Archive for location-scale parameterisation

weakly informative reparameterisations

Posted in Books, pictures, R, Statistics, University life with tags , , , , , , , , , on February 14, 2018 by xi'an

Our paper, weakly informative reparameterisations of location-scale mixtures, with Kaniav Kamary and Kate Lee, got accepted by JCGS! Great news, which comes in perfect timing for Kaniav as she is currently applying for positions. The paper proposes a unidimensional mixture Bayesian modelling based on the first and second moment constraints, since these turn the remainder of the parameter space into a compact. While we had already developed an associated R package, Ultimixt, the current editorial policy of JCGS imposes the R code used to produce all results to be attached to the submission and it took us a few more weeks than it should have to produce a directly executable code, due to internal library incompatibilities. (For this entry, I was looking for a link to our special JCGS issue with my picture of Edinburgh but realised I did not have this picture.)

mixtures are slices of an orange

Posted in Kids, R, Statistics with tags , , , , , , , , , , , , , , , , on January 11, 2016 by xi'an

licenceDataTempering_mu_pAfter presenting this work in both London and Lenzerheide, Kaniav Kamary, Kate Lee and I arXived and submitted our paper on a new parametrisation of location-scale mixtures. Although it took a long while to finalise the paper, given that we came with the original and central idea about a year ago, I remain quite excited by this new representation of mixtures, because the use of a global location-scale (hyper-)parameter doubling as the mean-standard deviation for the mixture itself implies that all the other parameters of this mixture model [beside the weights] belong to the intersection of a unit hypersphere with an hyperplane. [Hence the title above I regretted not using for the poster at MCMskv!]fitted_density_galaxy_data_500iters2This realisation that using a (meaningful) hyperparameter (μ,σ) leads to a compact parameter space for the component parameters is important for inference in such mixture models in that the hyperparameter (μ,σ) is easily estimated from the entire sample, while the other parameters can be studied using a non-informative prior like the Uniform prior on the ensuing compact space. This non-informative prior for mixtures is something I have been seeking for many years, hence my on-going excitement! In the mid-1990‘s, we looked at a Russian doll type parametrisation with Kerrie Mengersen that used the “first” component as defining the location-scale reference for the entire mixture. And expressing each new component as a local perturbation of the previous one. While this is a similar idea than the current one, it falls short of leading to a natural non-informative prior, forcing us to devise a proper prior on the variance that was a mixture of a Uniform U(0,1) and of an inverse Uniform 1/U(0,1). Because of the lack of compactness of the parameter space. Here, fixing both mean and variance (or even just the variance) binds the mixture parameter to an ellipse conditional on the weights. A space that can be turned into the unit sphere via a natural reparameterisation. Furthermore, the intersection with the hyperplane leads to a closed form spherical reparameterisation. Yay!

While I do not wish to get into the debate about the [non-]existence of “non-informative” priors at this stage, I think being able to using the invariant reference prior π(μ,σ)=1/σ is quite neat here because the inference on the mixture parameters should be location and scale equivariant. The choice of the prior on the remaining parameters is of lesser importance, the Uniform over the compact being one example, although we did not study in depth this impact, being satisfied with the outputs produced from the default (Uniform) choice.

From a computational perspective, the new parametrisation can be easily turned into the old parametrisation, hence leads to a closed-form likelihood. This implies a Metropolis-within-Gibbs strategy can be easily implemented, as we did in the derived Ultimixt R package. (Which programming I was not involved in, solely suggesting the name Ultimixt from ultimate mixture parametrisation, a former title that we eventually dropped off for the paper.)

Discussing the paper at MCMskv was very helpful in that I got very positive feedback about the approach and superior arguments to justify the approach and its appeal. And to think about several extensions outside location scale families, if not in higher dimensions which remain a practical challenge (in the sense of designing a parametrisation of the covariance matrices in terms of the global covariance matrix).

mixtures as exponential families

Posted in Kids, Statistics with tags , , , , , , , on December 8, 2015 by xi'an

Something I had not realised earlier and that came to me when answering a question on X validated about the scale parameter of a Gamma distribution. Following an earlier characterisation by Dennis Lindley, Ferguson has written a famous paper characterising location, scale and location-scale families within exponential families. For instance, a one-parameter location exponential family is necessarily the logarithm of a power of a Gamma distribution. What I found surprising is the equivalent for one-parameter scale exponential families: they are necessarily mixtures of positive and negative powers of Gamma distributions. This is surprising because a mixture does not seem to fit within the exponential family representation… I first thought Ferguson was using a different type of mixtures. Or of exponential family. But, after checking the details, it appears that the mixture involves a component on ℜ⁺ and another component on ℜ⁻ with a potential third component as a Dirac mass at zero. Hence, it only nominally writes as a mixture and does not offer the same challenges as a regular mixture. Not label switching. No latent variable. Having mutually exclusive supports solves all those problems and even allows for an indicator function to permeate through the exponential function… (Recall that the special mixture processed in Rubio and Steel also enjoys this feature.)

no country for odd means

Posted in Books, Kids, Statistics, University life with tags , , , , , , on November 16, 2015 by xi'an

This morning, Clara Grazian and I arXived a paper about Jeffreys priors for mixtures. This is a part of Clara’s PhD dissertation between Roma and Paris, on which she has worked for the past year. Jeffreys priors cannot be computed analytically for mixtures, which is such a drag that it led us to devise the delayed acceptance algorithm. However, the main message from this detailed study of Jeffreys priors is that they mostly do not work for Gaussian mixture models, in that the posterior is almost invariably improper! This is a definite death knell for Jeffreys priors in this setting, meaning that alternative reference priors, like the one we advocated with Kerrie Mengersen and Mike Titterington, or the similar solution in Roeder and Wasserman, have to be used. [Disclaimer: the title has little to do with the paper, except that posterior means are off for mixtures…]