Archive for mixtures

my life as a mixture [BAYSM 2014, Wien]

Posted in Books, Kids, Mountains, pictures, Statistics, Travel, University life on September 12, 2014 by xi'an

Next week I am giving a talk at BAYSM in Vienna. BAYSM is the Bayesian Young Statisticians Meeting, so one may wonder why I am speaking there, but Chris Holmes, Mike West, and I got invited as more… erm… senior speakers! So I decided to give a definitely senior talk on a thread pursued throughout my career so far, namely mixtures. Plus, it also relates to work by the other senior speakers. Here is the abstract for the talk:

Mixtures of distributions are fascinating objects for statisticians in that they both constitute a straightforward extension of standard distributions and offer a complex benchmark for evaluating statistical procedures, with a likelihood that is computable in linear time yet enjoys an exponential number of local modes (and sometimes infinitely many modes). This fruitful playground appeals in particular to Bayesians as it constitutes an easily understood challenge to the use of improper priors and of objective Bayes solutions. This talk will review some ancient and some more recent works of mine on mixtures of distributions, from the 1990 Gibbs sampler to the 2000 label switching, and to later studies of Bayes factor approximations, nested sampling performances, improper priors, improved importance samplers, ABC, and an inverse perspective on the Bayesian approach to testing of hypotheses.
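
As a small aside on the linear-time claim in the abstract (my own toy illustration, not part of the talk): each of the n likelihood factors is a sum over the k components, so a Gaussian mixture log-likelihood evaluates in O(nk) operations:

mix_loglik=function(x,w,mu,sg)
  # O(nk): one k-term sum per observation
  sum(log(sapply(x,function(xi) sum(w*dnorm(xi,mu,sg)))))
# e.g., mix_loglik(rnorm(100),w=c(.5,.5),mu=c(-1,1),sg=c(1,1))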

I am very grateful to the scientific committee for this invitation, as it will give me the opportunity to meet the new generation, learn from them, and in addition discover Vienna, where I have never been, despite several visits to Austria, including to its highest summit, the Großglockner. I will also give a seminar in Linz the day before, at the Institut für Angewandte Statistik.

a day for comments

Posted in Mountains, Statistics, Travel, University life on April 21, 2014 by xi'an

As I was flying over Skye (with [maybe] a first, if hazy, perspective on the Cuillin ridge!) to Iceland, three long sets of replies to some of my posts appeared on the ‘Og:

Thanks to them for taking the time to answer my musings…


MCMC for sampling from mixture models

Posted in Kids, Statistics, University life on April 17, 2014 by xi'an

Randal Douc, Florian Maire, and Jimmy Olsson recently arXived a paper on the use of Markov chain Monte Carlo methods for sampling from mixture models, which relies on Carlin and Chib's (1995) pseudo-priors to simulate from a mixture distribution (and not from the posterior distribution associated with a mixture sampling model). As reported earlier, I took part in Florian Maire's thesis defence, and this approach had already puzzled me at the time. In short, a mixture structure

\pi(z)\propto\sum_{m=1}^k \tilde\pi(m,z)

gives rise to as many auxiliary variables as there are components, minus one: namely, if a simulation z is generated from a given component i of the mixture, one can create pseudo-simulations u from all the other components, using pseudo-priors à la Carlin and Chib. A Gibbs sampler based on this augmented state-space can then be implemented: (a) simulate a new component index m given (z,u); (b) simulate a new value of (z,u) given m. One version (MCC) of the algorithm simulates z given m from the proper conditional posterior by a Metropolis step, while another one (FCC) only simulates the u's. The paper shows that MCC has a smaller asymptotic variance than FCC. I however fail to understand why a Carlin and Chib construction is necessary in a mixture context: it seems (from the introduction) that the motivation is that a regular Gibbs sampler [simulating z by a Metropolis-Hastings proposal, then m] has difficulties moving between components when those components are well-separated. This is correct but slightly moot, as each component of the mixture can be simulated separately and in advance in z, which leads to a natural construction of (a) the pseudo-priors used in the paper, (b) approximations to the weights of the mixture, and (c) a global independent mixture proposal, which can be used in an independent Metropolis-Hastings sampler that [it seems to me] alleviates the need to simulate the component index m. Both examples used in the paper, a toy two-component two-dimensional Gaussian mixture and another toy two-component one-dimensional Gaussian mixture observed with noise (and in absolute value), do not help in perceiving a definitive need for this Carlin and Chib version, especially when considering the construction of the pseudo-priors.
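
Here is a rough R sketch of this alternative on a toy one-dimensional target, where the mixture weights, means, and inflated proposal scales are all illustrative assumptions rather than anything taken from the paper:

# toy target: a two-component Gaussian mixture with well-separated modes
w=c(.3,.7); mu=c(-5,5); sg=c(1,1)
ltarget=function(z) log(w[1]*dnorm(z,mu[1],sg[1])+w[2]*dnorm(z,mu[2],sg[2]))
# global independent proposal: the same mixture with inflated scales,
# standing in for per-component approximations simulated in advance
lprop=function(z) log(w[1]*dnorm(z,mu[1],2*sg[1])+w[2]*dnorm(z,mu[2],2*sg[2]))
rprop=function(n){m=sample(1:2,n,replace=TRUE,prob=w);rnorm(n,mu[m],2*sg[m])}
T=1e4; z=numeric(T); z[1]=rprop(1)
for (t in 2:T){
  prop=rprop(1)
  # independent Metropolis-Hastings acceptance step: no component index m
  if (log(runif(1))<ltarget(prop)-ltarget(z[t-1])+lprop(z[t-1])-lprop(prop)){
    z[t]=prop}else{z[t]=z[t-1]}
}

Since the proposal already covers both modes, the chain jumps freely between components, which is the point of the objection above.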

On the use of marginal posteriors in marginal likelihood estimation via importance-sampling

Posted in R, Statistics, University life on November 20, 2013 by xi'an

Perrakis, Ntzoufras, and Tsionas just arXived a paper on marginal likelihood (evidence) approximation (with the above title). The idea behind the paper is to base importance sampling for the evidence on simulations from the product of the (block) marginal posterior distributions. Those simulations can be directly derived from an MCMC output by randomly permuting the components. The only critical issue is to find good approximations to the marginal posterior densities. This is handled in the paper either by normal approximations or by Rao-Blackwell estimates, the latter being rather costly since one importance weight involves B×L computations, where B is the number of blocks and L the number of samples used in the Rao-Blackwell estimates. The time factor does not seem to be included in the comparison studies run by the authors, although it would seem necessary when comparing scenarii.
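
For concreteness, here is a rough R sketch of the estimator in a two-block normal-approximation case; the MCMC output theta (a T×2 matrix) and the unnormalised log posterior log_joint() are hypothetical names of my own, not the authors' code:

# simulate from the product of marginal posteriors by permuting each
# block of the MCMC output independently
T=nrow(theta)
prod_sample=cbind(theta[sample(T),1],theta[sample(T),2])
# normal approximations to the two marginal posterior densities
lm1=dnorm(prod_sample[,1],mean(theta[,1]),sd(theta[,1]),log=TRUE)
lm2=dnorm(prod_sample[,2],mean(theta[,2]),sd(theta[,2]),log=TRUE)
# importance weights: joint target over product of approximate marginals
lw=apply(prod_sample,1,log_joint)-lm1-lm2
# log evidence by log-sum-exp averaging of the weights
log_evidence=max(lw)+log(mean(exp(lw-max(lw))))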

After a standard regression example (that did not include Chib's solution in the comparison), the paper considers 2- and 3-component mixtures. The discussion centres around label switching (of course) and the deficiencies of Chib's solution against the current method and Neal's reference. The study does not include averaging Chib's solution over permutations as in Berkoff et al. (2003) and Marin et al. (2005), an approach that does eliminate the bias, especially for a small number of components. Instead, the authors stick to the log(k!) correction, despite it being known to be quite unreliable (depending on the amount of overlap between modes). The final example is Diggle et al.'s (1995) longitudinal Poisson regression with random effects on epileptic patients. The appeal of this model is the unavailability of the integrated likelihood, which implies either estimating it by Rao-Blackwellisation or including the 58 latent variables in the analysis. (There is no comparison with other methods.)
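
For the record, a hedged sketch of that permutation-averaged correction, where log_lik_star, log_prior_star, and log_post_dens (the Rao-Blackwell posterior density estimate at theta* under a relabelling p) are hypothetical stand-ins for the quantities in Chib's identity:

library(combinat)                  # for permn()
k=3                                # number of components (toy value)
perms=permn(1:k)                   # all k! relabellings of the components
ldens=sapply(perms,function(p) log_post_dens(theta_star,p))
# average the k! density estimates by log-sum-exp, instead of adding log(k!)
lpost=max(ldens)+log(mean(exp(ldens-max(ldens))))
log_marginal=log_lik_star+log_prior_star-lpost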

As a side note, among the many references provided by this paper, I found no trace of Skilling's nested sampling or of safe harmonic means (as exposed in our own survey on the topic).

open problem

Posted in R, Statistics on October 24, 2013 by xi'an

On the plane back from Warwick, I was reading an arXived ABC paper by Umberto Picchini and Julie Forman, “Accelerating inference for diffusions observed with measurement error and large sample sizes using Approximate Bayesian Computation: A case study”, and came upon this open problem:

“A closed-form expression for generating percentiles from a finite-components Gaussian mixture is unavailable.” (p.5)

which means solving

\alpha\Phi(x)+(1-\alpha)\Phi((x-\mu)/\sigma) = \beta

is not possible in closed form. (Of course, it could also be argued that the equation Φ(x)=β is unavailable in closed form, i.e., that the analytic solution x=Φ⁻¹(β) is merely formal…) While I can think of several numerical approaches, a few minutes with a sheet of paper left me convinced that indeed this is not solvable (hence not an open problem, contrary to the title of the post!).

Just for R practice (and my R course students!), here is a basic R function solving the equation by bisection:


mixquant=function(beta,alpha,mu,sigma,prec=1e-8){
  onmal=1-alpha
  # initial bounds: the mixture quantile lies between the component quantiles
  omb=min(qnorm(beta),mu+sigma*qnorm(beta))
  omB=max(qnorm(beta),mu+sigma*qnorm(beta))
  # iterations: bisect until the bracket is tighter than prec
  while ((omB-omb)>prec){
    mid=(omb+omB)/2
    if (alpha*pnorm(mid)+onmal*pnorm((mid-mu)/sigma)<beta){
      omb=mid}else{omB=mid}
  }
  return((omb+omB)/2)
}
# e.g., mixquant(.95,alpha=.3,mu=4,sigma=2)

Dear Sir, I am unable to understand…

Posted in Statistics, University life on January 30, 2013 by xi'an

Here is an email I received a few days ago, similar to many other emails I/we receive on a regular basis:

I am working on Markov Chain Monte Carlo methods as part of my Masters project. I have to estimate mean, variance from a Gaussian mixture using metropolis method.  I came across your paper ‘Bayesian Modelling and Inference on Mixtures of Distributions’. I am unable to understand how to obtain the new sample for mean, variance etc… I am using uniform distribution as proposal distribution. Should it be random numbers for the proposal distribution.
I have been working and trying to understand this for a long time. I would be grateful for any help.

While I felt sorry for the Master's student, I consider it the responsibility of his/her advisor to give her/him proper directions for understanding the paper. (Given the contents of the email, it sounds as if the student would require proper training in both Bayesian statistics [uniform priors on unbounded parameters?] and simulation [the question about random numbers does not make sense]…) This is what I replied to the student, hopefully in a positive tone.

estimating the measure and hence the constant

Posted in pictures, Running, Statistics, University life on December 6, 2012 by xi'an

[Dawn in Providence, Nov. 30, 2012]

As mentioned in my post about the final day of the ICERM workshop, Xiao-Li Meng addresses this issue of “estimating the constant” in his talk. It is even his central theme. Here are his (2011) slides, as he sent them to me (with permission to post them!):

He therefore points out in slide #5 that the likelihood cannot be expressed in terms of the normalising constant, since this is not a free parameter. Right! His explanation for the approximation of the unknown constant is then to replace the known but intractable dominating measure (in the sense that the integral against it cannot be computed) with a discrete (or non-parametric) measure supported by the sample. Because the measure is defined up to a constant, this leads to sample weights being proportional to the inverse density. Of course, this representation of the problem is open to criticism: why focus only on measures supported by the sample? The fact that it is the MLE is used as an argument in Xiao-Li's talk, but this can alternatively be seen as a drawback: I remember reviewing Dankmar Böhning's Computer-Assisted Analysis of Mixtures and being horrified when discovering this feature! I am currently more agnostic, since this appears as an alternative version of empirical likelihood. There are still questions about the measure estimation principle: for instance, when handling several samples from several distributions, why should they all contribute to a single estimate of μ rather than to a product of measures? (Maybe because their models are all dominated by the same measure μ.) Now, getting back to my earlier remark, and as a possible answer to Larry's question, there could well be a Bayesian version of the above, avoiding the rough empirical likelihood via Gaussian or Dirichlet process prior modelling.
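
As an aside, in the multi-sample setting this measure-estimation principle recovers (as I understand it) bridge sampling estimators; here is a minimal R sketch of Meng and Wong's (1996) iterative bridge estimator on a toy pair of densities of my own choosing:

n1=1e4; n2=1e4
q1=function(z) dnorm(z)                       # normalised, constant c1=1
q2=function(z) exp(-abs(z))                   # unnormalised Laplace, c2=2
z1=rnorm(n1)                                  # draws from q1/c1
z2=sample(c(-1,1),n2,replace=TRUE)*rexp(n2)   # draws from q2/c2
rat=1                                         # running estimate of c1/c2
for (it in 1:50){
  num=mean(q1(z2)/(n1*q1(z2)+n2*rat*q2(z2)))
  den=mean(q2(z1)/(n1*q1(z1)+n2*rat*q2(z1)))
  rat=num/den
}
c2hat=1/rat                                   # should be close to 2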

