## set-valued sufficient statistic

Posted in Books, Kids, Statistics on June 18, 2022 by xi'an

While the classical definition of a statistic is one of a real valued random variable or vector, less usual situations call for broader definitions… For instance, in a homework problem from Mark Schervish’s Theory of Statistics, a sample from the uniform distribution on a ball of unknown centre θ and radius ς is associated with the convex hull of said sample as “sufficient statistic”, even though this object is a set. Similarly, if the radius ς is known, the set made of the intersection of all the balls of radius ς centred at the observations is sufficient, in that the likelihood is constant for θ inside and zero outside. As discussed in this X validated question, this set does not by itself define an optimal estimator of the centre θ, whereas Pitman’s best location equivariant estimator does, namely as the centre of this sufficient set. However, this centre is not sufficient as a statistic and is not necessarily the MVUE, even when unbiased.
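As a quick numerical illustration of the known-radius case, here is a minimal sketch (not taken from the X validated question) that simulates a uniform sample on a ball and checks membership of a candidate centre in the intersection of the balls. The function names `sample_uniform_ball`, `in_sufficient_set`, and `likelihood` are hypothetical helpers introduced for this example only.

```python
import numpy as np

def sample_uniform_ball(n, centre, radius, rng):
    """Draw n points uniformly from the ball with the given centre and radius."""
    d = len(centre)
    # Uniform directions on the sphere, obtained by normalising Gaussian vectors.
    directions = rng.standard_normal((n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Radial part: radius * U^(1/d) gives a uniform distribution over the volume.
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / d)
    return np.asarray(centre) + radii * directions

def in_sufficient_set(theta, sample, radius):
    """True when theta lies in the intersection of the balls B(x_i, radius)."""
    return bool(np.all(np.linalg.norm(sample - theta, axis=1) <= radius))

def likelihood(theta, sample, radius, ball_volume):
    """Uniform-ball likelihood: constant on the sufficient set, zero outside."""
    n = len(sample)
    return ball_volume ** (-n) if in_sufficient_set(theta, sample, radius) else 0.0

rng = np.random.default_rng(0)
x = sample_uniform_ball(50, [0.0, 0.0], 1.0, rng)
# The likelihood only depends on the sample through the set-valued statistic.
print(likelihood(np.array([0.0, 0.0]), x, 1.0, np.pi))
print(likelihood(np.array([5.0, 0.0]), x, 1.0, np.pi))
```

The likelihood takes a single positive value on the set and vanishes elsewhere, which is all that sufficiency of the set requires here.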

## conditioning an algorithm

Posted in Statistics on June 25, 2021 by xi'an

A question of interest on X validated: given a (possibly black-box) algorithm simulating from a joint distribution with density [wrt a continuous measure] p(z,y), (how) is it possible to simulate from the conditional p(y|z⁰)? This reminded me of a recent paper by Lindqvist et al. on conditional Monte Carlo, which focuses on the simulation of a sample X given the value of a sufficient statistic, T(X)=t, revolving around pivotal quantities and inversions à la fiducial statistics, following an earlier Biometrika paper by Lindqvist & Taraldsen, in 2005. The idea is to write

$X=\chi(U,\theta)\qquad T(X)=\tau(U,\theta)$

where U has a known distribution that does not depend on θ, to solve τ(u,θ)=t in θ for a given pair (u,t) with solution θ(u,t) and to generate u conditional on this solution. But this requires getting “under the hood” of the algorithm to such an extent that it no longer answers the original question, or else opens the door to other solutions using the expression for the joint density p(z,y)… In a purely black box situation, ABC appears as the natural if approximate solution.
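For the purely black box case, the ABC route can be sketched as follows: run the joint simulator repeatedly and keep the y's whose companion z falls within a tolerance of z⁰. This is only a minimal rejection version under stated assumptions: the simulator `simulate_joint` is a hypothetical bivariate Normal stand-in for the black box, and the tolerance `eps`, the function name `abc_conditional`, and the cap `max_draws` are all illustrative choices.

```python
import numpy as np

def simulate_joint(rng, rho=0.8):
    """Hypothetical black box: one draw (z, y) from a bivariate Normal."""
    z = rng.standard_normal()
    y = rho * z + np.sqrt(1.0 - rho**2) * rng.standard_normal()
    return z, y

def abc_conditional(simulator, z0, eps, n_keep, rng, max_draws=1_000_000):
    """Rejection ABC: keep y whenever the simulated z is within eps of z0.

    The output is an approximate sample from p(y | z0), exact only as eps -> 0.
    Fewer than n_keep values are returned if max_draws is exhausted.
    """
    kept = []
    for _ in range(max_draws):
        z, y = simulator(rng)
        if abs(z - z0) <= eps:
            kept.append(y)
            if len(kept) == n_keep:
                break
    return np.array(kept)

rng = np.random.default_rng(1)
ys = abc_conditional(simulate_joint, z0=1.0, eps=0.05, n_keep=500, rng=rng)
# In this toy case the exact conditional is Normal(rho * z0, 1 - rho^2).
print(ys.mean(), ys.std())
```

The smaller the tolerance, the closer the output is to the conditional, at the price of a lower acceptance rate, which is the usual ABC trade-off.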

## conjugate priors and sufficient statistics

Posted in Statistics on March 29, 2021 by xi'an

An X validated question rekindled my interest in the connection between sufficiency and conjugacy, by asking whether or not there was an equivalence between the existence of a (finite dimension) conjugate family of priors and the existence of a fixed (in n, the sample size) dimension sufficient statistic. The question bears on settings outside exponential families, meaning that the support of the sampling distribution needs to vary with the parameter.

While the existence of a sufficient statistic T of fixed dimension d whatever the (large enough) sample size n seems to clearly imply the existence of a (finite dimension) conjugate family of priors, or rather of a family associated with each possible dominating (prior) measure,

$\mathfrak F=\{ \tilde \pi(\theta)\propto \tilde {f_n}(t_n(x_{1:n})|\theta) \pi_0(\theta)\,;\ n\in \mathbb N, x_{1:n}\in\mathfrak X^n\}$

the reverse statement is a wee bit more delicate to prove, due to the varying supports of the sampling or prior distributions. Unless some conjugate prior in the assumed family has an unrestricted support, the argument seems to limit sufficiency to a particular subset of the parameter set. I think that the result remains correct in general but could not rigorously wrap up the proof.
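As a concrete instance of the direct implication (a standard textbook case, not the one raised in the X validated question), take a sample from the Uniform distribution on (0,θ), for which the maximum $t_n=\max_i x_i$ is sufficient whatever n, with likelihood

$\tilde{f_n}(t_n|\theta)\propto\theta^{-n}\,\mathbb I_{\theta\ge t_n}$

Choosing the Lebesgue measure as $\pi_0$, the associated family is made of the Pareto-type densities (proper for n≥2)

$\tilde\pi(\theta)\propto\theta^{-n}\,\mathbb I_{\theta\ge t}$

and observing m further values with maximum s returns a member of the same family,

$\tilde\pi(\theta|x_{n+1:n+m})\propto\theta^{-(n+m)}\,\mathbb I_{\theta\ge\max(t,s)}$

Every member of this family has a support restricted to a half-line [t,∞), which is exactly the kind of varying support that complicates the reverse statement.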

## a most unusual definition of sufficiency

Posted in Books, Kids, Statistics on January 13, 2021 by xi'an

A most unusual definition (?) of sufficiency came up on X validated this morn, as stated in Koller and Friedman’s Probabilistic Graphical Models. But as reported, it is quite restrictive, apparently limited to the natural statistic of an exponential family with conditionally Uniform ancillary (since the likelihood functions are equal rather than proportional). Even more strangely, with this formulation, the Normal sample size n [typo on the last line of the question] appears as a component of the sufficient statistic (Example 17.4), despite not being random.

## factorisation theorem on densities

Posted in Statistics on December 23, 2020 by xi'an

Another occurrence, while building my final math stat exam for my (quarantined!) third year students, of a question on X validated that led me to write down more precisely an argument for the decomposition of densities in exponential families. Albeit the decomposition is somewhat moot (and lost on the initiator of the question since this person later posted an answer ignoring measures), as it all depends on the choice of the dominating measures over X, T(X), and the slices {x; T(x)=t}. The fact that the slice does depend on t requires the measure to accept a potential dependence on t, in which case the conditional density wrt this measure can as well be constant.
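To make the dependence on the dominating measures concrete, here is a sketch for a generic exponential family, where the notations h, ψ, μ and ν are assumed for illustration rather than taken from the X validated question. If X has density

$f(x|\theta)=h(x)\exp\{\theta\cdot T(x)-\psi(\theta)\}$

wrt a measure μ over X, then T(X) has density

$g(t|\theta)=\exp\{\theta\cdot t-\psi(\theta)\}$

wrt the image measure

$\nu(A)=\int_{T^{-1}(A)} h(x)\,\text{d}\mu(x)$

and the conditional distribution of X given T(X)=t, carried by the slice {x; T(x)=t}, is free of θ. Taking this conditional distribution itself as the dominating measure on the slice, which hence depends on t, the conditional density is identically equal to one.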