MCMC for sampling from mixture models
Randal Douc, Florian Maire, and Jimmy Olsson recently arXived a paper on the use of Markov chain Monte Carlo methods for sampling from mixture models, which resorts to Carlin and Chib (1995) pseudo-priors to simulate from a mixture distribution (and not from the posterior distribution associated with a mixture sampling model). As reported earlier, I sat on Florian Maire's thesis defence committee, and this approach had already puzzled me at the time. In short, a mixture structure
gives rise to as many auxiliary variables as there are components, minus one: namely, if a simulation z is generated from a given component i of the mixture, one can create pseudo-simulations u from all the other components, using pseudo-priors à la Carlin and Chib. A Gibbs sampler based on this augmented state space can then be implemented: (a) simulate a new component index m given (z,u); (b) simulate a new value of (z,u) given m. One version (MCC) of the algorithm simulates z given m from the proper conditional posterior by a Metropolis step, while another one (FCC) only simulates the u's. The paper shows that MCC has a smaller asymptotic variance than FCC (a minimal sketch of the scheme is given below).

I however fail to understand why a Carlin and Chib strategy is necessary in a mixture context: it seems (from the introduction) that the motivation is that a regular Gibbs sampler [simulating z by a Metropolis-Hastings proposal, then m] has difficulties moving between components when those components are well separated. This is correct but slightly moot, as each component of the mixture can be simulated separately and in advance in z, which leads to a natural construction of (a) the pseudo-priors used in the paper, (b) approximations to the weights of the mixture, and (c) a global mixture independent proposal, which can be used within an independence Metropolis-Hastings step that [it seems to me] alleviates the need to simulate the component index m. Neither of the two examples used in the paper, a toy two-component two-dimensional Gaussian mixture and another toy two-component one-dimensional Gaussian mixture observed with noise (and in absolute value), helps in perceiving a definitive need for this Carlin and Chib version, especially when considering the construction of the pseudo-priors.
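To fix ideas, here is a minimal sketch of the augmented Gibbs sampler described above, on a toy two-component two-dimensional Gaussian mixture in the spirit of the paper's first example. All names and numerical settings (weights, means, pseudo, the 1.5 inflation of the pseudo-priors) are my own choices, not the paper's, and the exact draw from the selected component stands in for the Metropolis step of the MCC version.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(0)

# Toy target: a two-component two-dimensional Gaussian mixture
# (weights, means and covariances are illustrative choices).
weights = np.array([0.3, 0.7])
means = [np.array([-4.0, 0.0]), np.array([4.0, 0.0])]
covs = [np.eye(2), np.eye(2)]
comps = [mvn(means[i], covs[i]) for i in range(2)]

# Pseudo-priors à la Carlin and Chib: here simply inflated Gaussian
# approximations of each component (an assumption of this sketch).
pseudo = [mvn(means[i], 1.5 * covs[i]) for i in range(2)]

def sweep(m, x):
    """One sweep of the augmented Gibbs sampler: x[m] plays the role
    of the live draw z, the x[j], j != m, that of the pseudo-draws u."""
    # (a) update the component index m given (z, u): the extended
    # target w_i pi_i(x_i) prod_{j != i} q_j(x_j) gives
    # P(m = i | x) proportional to w_i pi_i(x_i) / q_i(x_i).
    logp = np.array([np.log(weights[i]) + comps[i].logpdf(x[i])
                     - pseudo[i].logpdf(x[i]) for i in range(2)])
    probs = np.exp(logp - logp.max())
    m = rng.choice(2, p=probs / probs.sum())
    # (b) refresh (z, u) given m: z is drawn exactly from component m
    # (standing in for the Metropolis step of the MCC version), the
    # u's from their pseudo-priors (the only moves in the FCC version).
    x[m] = comps[m].rvs(random_state=rng)
    for j in range(2):
        if j != m:
            x[j] = pseudo[j].rvs(random_state=rng)
    return m, x

m = 0
x = [pseudo[i].rvs(random_state=rng) for i in range(2)]
draws = []
for _ in range(5000):
    m, x = sweep(m, x)
    draws.append(x[m])  # the live draw targets the mixture
```

Under the extended target, the marginal law of the live draw x[m] is exactly the mixture, which is why keeping x[m] at each sweep yields the desired samples.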
April 18, 2014 at 9:50 am
Thanks, Christian, for considering our work in your successful blog! Although at first glance the Carlin and Chib approach might seem quite a heavy algorithm for dealing with multimodal targets, it is actually rather efficient with only a few pseudo-priors, as noted in the examples. Moreover, I am not sure that the idea you suggest, i.e., to use a mixture proposal in an MH framework, would be a good challenger, since you would then fix, once and for all, the coefficients of that mixture, without any updating. To get better proposals, one might of course be tempted to propose according to a mixture of these pseudo-priors with classical normalised importance sampling weights; however, the resulting density is obtained through a marginalisation at the selected point, and in general the intractability of this quantity implies that the MH acceptance probability cannot even be calculated.

I understand that the use of pseudo-priors might seem quite wasteful, since we eventually keep only one of the corresponding draws, but this drawback has to be balanced against the nice feature that it allows more jumps between modes. It may also be quite striking to note that Particle Gibbs likewise proposes N-1 trajectories and then, after putting them in a set already containing the current trajectory, selects only one of them according to a Gibbs update on an extended target. In this sense it shares many similarities with the CC algorithm, since in the latter case N-1 draws from the pseudo-priors are generated and then put in a set with the current point; the selection of the new point is then obtained through a Gibbs update on an extended target, just as for Particle Gibbs!
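For concreteness, here is a minimal sketch of the fixed-coefficient alternative under discussion, continuing the toy setup (weights, comps, pseudo, rng) of the earlier sketch: an independence Metropolis-Hastings sampler whose proposal is the mixture of pseudo-priors with frozen coefficients w_hat (a name and value of my own choosing, not from the paper). Its proposal density is tractable precisely because w_hat is fixed once and for all, which is the limitation stressed in the comment above.

```python
# Reuses weights, comps, pseudo and rng from the sketch above.
w_hat = np.array([0.5, 0.5])  # fixed, possibly misestimated, coefficients

def log_target(z):
    lp = [np.log(weights[i]) + comps[i].logpdf(z) for i in range(2)]
    return np.logaddexp(lp[0], lp[1])

def log_proposal(z):
    lp = [np.log(w_hat[i]) + pseudo[i].logpdf(z) for i in range(2)]
    return np.logaddexp(lp[0], lp[1])

def imh_step(z):
    i = rng.choice(2, p=w_hat)               # pick a proposal component
    z_new = pseudo[i].rvs(random_state=rng)  # propose from it
    # standard independence MH ratio: pi(z') q(z) / [pi(z) q(z')]
    log_alpha = (log_target(z_new) + log_proposal(z)
                 - log_target(z) - log_proposal(z_new))
    return z_new if np.log(rng.uniform()) < log_alpha else z
```

Replacing w_hat with adaptively reweighted coefficients would change the effective proposal density at every step, and evaluating that density would require the marginalisation the comment points to as intractable.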