## likelihood inference with no MLE

Posted in Books, R, Statistics on July 29, 2021 by xi'an

“In a regular full discrete exponential family, the MLE for the canonical parameter does not exist when the observed value of the canonical statistic lies on the boundary of its convex support.”

Daniel Eck and Charlie Geyer just published an interesting and intriguing paper on running efficient inference for discrete exponential families when the MLE does not exist. As for instance in the case of a complete separation between 0’s and 1’s in a logistic regression model. Or, more generally, when the estimated Fisher information matrix is singular. Not to mention the Bayesian version, which remains a form of likelihood inference. The construction is based on an MLE that exists in an extended model, a notion I had not heard of previously. This model is defined as a limit of likelihood values

$\lim_{n\to\infty} \ell(\theta_n|x) = \sup_\theta \ell(\theta|x) := h(x)$

called the MLE distribution. Which remains a mystery to me, to some extent. Especially when this distribution is completely degenerate. The examples provided within the paper alas do not help, as they mostly serve as illustrations for the associated rcdd R package. Intriguing, indeed!
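To make the separation case concrete, here is a minimal numerical sketch (mine, not taken from the paper) of a one-parameter logistic regression on a hypothetical, perfectly separated toy sample. The data, the no-intercept model, and the function name `log_likelihood` are all assumptions made for illustration. Along the sequence of slopes, the log-likelihood keeps increasing towards its supremum 0 without ever reaching it, so no finite MLE exists, and the limiting distribution puts all its mass on the observed responses.

```python
import numpy as np

def log_likelihood(beta, x, y):
    """Log-likelihood of a logistic regression with one slope and no intercept.

    Uses log p(y|x) = -log(1 + exp(-s * beta * x)) with s = +1 if y = 1, -1 if y = 0,
    evaluated stably through np.logaddexp.
    """
    s = 2 * y - 1
    return float(-np.sum(np.logaddexp(0.0, -s * beta * x)))

# Hypothetical toy data with complete separation:
# every negative x has y = 0, every positive x has y = 1.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

# A sequence theta_n diverging to infinity along the separating direction.
betas = [1.0, 5.0, 25.0, 125.0]
values = [log_likelihood(b, x, y) for b in betas]

for b, v in zip(betas, values):
    print(f"beta = {b:6.1f}   log-likelihood = {v:.3e}")
```

The printed values increase monotonically and stay below zero, which is the numerical signature of a supremum that is only attained in the limit.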

## conjugate priors and sufficient statistics

Posted in Statistics on March 29, 2021 by xi'an

An X validated question rekindled my interest in the connection between sufficiency and conjugacy, by asking whether or not there was an equivalence between the existence of a (finite dimension) conjugate family of priors and the existence of a fixed (in n, the sample size) dimension sufficient statistic. Outside exponential families, meaning that the support of the sampling distribution needs to vary with the parameter.

While the existence of a sufficient statistic T of fixed dimension d, whatever the (large enough) sample size n, seems to clearly imply the existence of a (finite dimension) conjugate family of priors, or rather of a family associated with each possible dominating (prior) measure,

$\mathfrak F=\{ \tilde \pi(\theta)\propto \tilde {f_n}(t_n(x_{1:n})|\theta) \pi_0(\theta)\,;\ n\in \mathbb N, x_{1:n}\in\mathfrak X^n\}$

the reverse statement is a wee bit more delicate to prove, due to the varying supports of the sampling or prior distributions. Unless some conjugate prior in the assumed family has an unrestricted support, the argument seems to limit sufficiency to a particular subset of the parameter set. I think that the result remains correct in general but could not rigorously wrap up the proof.
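As an illustration of the easy direction, consider the standard textbook case of a Uniform(0, θ) sample, which sits outside exponential families since its support varies with θ. This example is my choice, not part of the original question. The pair (n, max x) is sufficient, and a Pareto(shape, scale) prior with density proportional to θ^{-(shape+1)} on θ ≥ scale is conjugate. The sketch below, with the hypothetical names `Pareto` and `update`, shows that the posterior depends on the sample only through that two-dimensional statistic. Note that every prior in this family has a restricted support, which is exactly the situation that makes the reverse statement delicate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pareto:
    """Pareto prior on theta: density proportional to theta^-(shape+1) for theta >= scale."""
    shape: float
    scale: float

def update(prior, xs):
    """Posterior for theta after observing xs, assumed i.i.d. Uniform(0, theta).

    The likelihood is theta^-n on theta >= max(xs), so the posterior is again
    Pareto, with shape increased by n and scale raised to the sample maximum.
    """
    return Pareto(prior.shape + len(xs), max([prior.scale, *xs]))

prior = Pareto(shape=1.0, scale=1.0)
sample = [0.3, 1.7, 0.9]

batch = update(prior, sample)

# Updating one observation at a time gives the same posterior.
sequential = prior
for obs in sample:
    sequential = update(sequential, [obs])

# A different sample with the same size and the same maximum also gives the same posterior.
other = update(prior, [1.7, 0.1, 0.1])

print(batch, sequential, other)
```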

## factorisation theorem on densities

Posted in Statistics on December 23, 2020 by xi'an

Another occurrence, while building my final math stat exam for my (quarantined!) third year students, of a question on X validated that led me to write down more precisely an argument for the decomposition of densities in exponential families. The decomposition is somewhat moot, though (and lost on the initiator of the question, since this person later posted an answer ignoring measures), as it all depends on the choice of the dominating measures over X, T(X), and the slices {x; T(x)=t}. The fact that the slice does depend on t requires the measure to accept a potential dependence on t, in which case the conditional density wrt this measure may as well be constant.

## double if not exponential

Posted in Books, Kids, Statistics, University life on December 10, 2020 by xi'an

In one of my last quizzes for the year, as the course is about to finish, I asked whether the mean or the median was the MLE for a double exponential sample of odd size, without checking the derivation of the result, as I was under the impression it was a straightforward one. Despite being outside exponential families. As my students found it impossible to solve within the allocated 5 minutes, I had a look, could not find an immediate argument (!), and used instead this nice American Statistician note by Robert Norton, based on the derivative being the number of observations smaller than θ minus the number of observations larger than θ. This leads to the result, as well as to the useful counter-example of a range of MLE solutions when the number of observations is even.
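The counting argument can be checked numerically. The sketch below is my own illustration, assuming a unit scale and two made-up samples, and the names `abs_dev` and `slope` are hypothetical. Maximising the double exponential likelihood amounts to minimising the sum of absolute deviations, whose derivative away from the data points is the number of observations below θ minus the number above.

```python
import numpy as np

def abs_dev(theta, x):
    """Sum of absolute deviations, i.e. the negative log-likelihood up to a constant."""
    return float(np.sum(np.abs(x - theta)))

def slope(theta, x):
    """Derivative of abs_dev at theta (theta not a data point):
    number of observations below theta minus number above."""
    return int(np.sum(x < theta) - np.sum(x > theta))

# Made-up samples of odd and even size.
odd = np.array([0.2, 1.5, 3.1, 4.0, 9.7])
even = np.array([0.2, 1.5, 3.1, 4.0])

# Odd size: the slope changes sign exactly at the median, which is the unique MLE.
print(slope(3.0, odd), slope(3.2, odd))
print(abs_dev(np.median(odd), odd), abs_dev(np.mean(odd), odd))

# Even size: the slope is zero between the two middle observations,
# so every theta in [1.5, 3.1] is an MLE.
print(slope(2.0, even))
print(abs_dev(1.5, even), abs_dev(2.0, even), abs_dev(3.1, even))
```

For the odd sample the slope switches from negative to positive at the median, while for the even sample it vanishes on the whole interval between the two central order statistics.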

## arbitrary non-constant function [nonsensical]

Posted in Statistics on November 6, 2020 by xi'an