Particle Gibbs for conjugate mixture posteriors
Alexandre Bouchard-Coté, Arnaud Doucet, and Andrew Roth have arXived a paper “Particle Gibbs Split-Merge Sampling for Bayesian Inference in Mixture Models” that proposes an efficient algorithm to explore the posterior distribution of a mixture, when interpreted as a clustering model. (To clarify the previous sentence, this is a regular plain vanilla mixture model for which they explore the latent variable representation.)
I like very much the paper because it relates to an earlier paper of mine with George Casella and Marty Wells, paper we wrote right after a memorable JSM in Baltimore (during what may have been my last visit to Cornell University as George left for Florida the following year). The starting point of this approach to mixture estimation is that the (true) parameters of a mixture can be (exactly) integrated out when using conjugate priors and a completion step. Namely, the marginal posterior distribution of the latent variables given the data is available in closed form. The latent variables being the component allocations for the observations. The joint posterior is then a product of the prior on the parameters times the prior on the latents times a product of simple (e.g., Gaussian) terms. This equivalently means the marginal likelihoods conditional on the allocations are available in closed form. Looking directly at those marginal likelihoods, a prior distribution on the allocations can be introduced (e.g., the Pitman-Yor process or the finite Dirichlet prior) and, together, they define a closed form target. Albeit complex. As often on a finite state space. In our paper with George and Marty, we proposed using importance sampling to explore the set, using for instance marginal distributions for the allocations, which are uniform in the case of exchangeable priors, but this is not very efficient, as exhibited by our experiments where very few partitions would get most of the weight.
Even a Gibbs sampler on subsets of those indicators restricted to two components cannot be managed directly. The paper thus examines a specially designed particle Gibbs solution that implements a split and merge move on two clusters at a time. Merging or splitting the subset. With intermediate target distributions, SMC style. While this is quite an involved mechanism, that could be deemed as excessive for the problem at hand, as well as inducing extra computing time, experiments in the paper demonstrate the mostly big improvement in efficiency brought by this algorithm.