## noninformative priors for mixtures

“A novel formulation of the mixture model is introduced, which includes the prior constraint that each Gaussian component is always assigned a minimal number of data points. This enables noninformative improper priors such as the Jeffreys prior to be placed on the component parameters. We demonstrate difficulties involved in specifying a prior for the standard Gaussian mixture model, and show how the new model can be used to overcome these. MCMC methods are given for efficient sampling from the posterior of this model.”C. Stoneking

**F**ollowing in the theme of the Jeffreys’ post of two weeks ago, I spotted today a newly arXived paper about using improper priors for mixtures…and surviving it! It is entitled “Bayesian inference of Gaussian mixture models with noninformative priors” and written by Colin Stoneking at ETH Zürich. As mentioned in the previous post, one specificity of our 1990-1994 paper on mixture with Jean Diebolt was to allow for improper priors by imposing at least two observations per component. The above abstract thus puzzled me until I found on page 3 that the paper was indeed related to ours (and Larry’s 2000 validation)! Actually, I should not complain about citations of my earlier works on mixtures as they cover seven different papers, but the bibliography is somewhat missing the paper we wrote with George Casella and Marty Wells in *Statistical Methodology* in 2004 (this was actually the very first paper of this new journal!), where we show that conjugate priors allow for the integration of the weights, resulting in a close-form expression for the distribution of the partition vector. (This was also extended in the chapter “Exact Bayesian Analysis of Mixtures” I wrote with Kerrie Mengersen in our book Mixtures: Estimation and Applications.)

“There is no well-founded, general method to choose the parameters of a given prior to make it weakly informative for Gaussian mixtures.”C. Stoneking

**T**he first part of the paper shows why looking for weakly informative priors is doomed to fail in this mixture setting: there is no stabilisation as hyperparameters get towards the border (between proper-ness and improper-ness), and on the opposite the frequency of appearances of empty components grows steadily to 100%… The second part gets to the reassessment of our 1990 exclusion trick, first considering that it is not producing a true posterior, then criticising Larry’s 2000 analysis as building a data-dependent “prior”, and at last proposing a reformulation where the exclusion of the empty components and those with one allocated observation becomes part of the “prior” (albeit a prior on the allocation vector). In fine, the posterior thus constructed remains the same as ours, with a message that if we start our model as the likelihood *of the sample* excluding empty or single-observation terms, we can produce a proper Bayesian analysis. (Except for a missing if minor renormalisation.) This leads me to wonder about the conclusion that inference about the (unknown) number of components in the mixture being impossible from this perspective. For instance, we could define fractional Bayes factors à la O’Hagan (1995) this way, i.e. starting from the restricted likelihood and taking a fraction of the likelihood to make the posterior proper, then using the remaining fraction to compute a Bayes factor. (Fractional Bayes factors do not work for the regular likelihood of a Gaussian mixture, irrespective of the sample size.)

April 7, 2016 at 2:35 am

[…] in the sense of ordinal data and of zero-inflated and over-dispersed models, rather than in Gaussian mixture models . Maybe because Stan cannot handle discrete missing variables. (Chapter 14 deals with continuous […]