## relabelling in Bayesian mixtures by pivotal units

Posted in Statistics with tags , , , , on September 14, 2017 by xi'an

Yet another paper on relabelling for mixtures, when one would think everything and more has already be said and written on the topic… This one appeared in Statistics and Computing last August and I only became aware of it through ResearchGate which sent me an unsolicited email that this paper quoted one of my own papers. As well as Bayesian Essentials.

The current paper by Egidi, Pappadà, Pauli and Torelli starts from the remark that the similarity matrix of the probabilities for pairs of observations to be in the same component is invariant to label switching. A property we also used in our 2000 JASA paper. But here the authors assume it is possible to find pivots, that is, as many observations as there are components such that any pair of them is never in the same component with posterior probability one. These pivots are then used for the relabelling, as they define a preferential relabelling at each iteration. Now, this is not always possible since there are presumably iterations with empty components and there is rarely a zero probability that enough pairs never meet. The resolution of this quandary is then to remove the iterations for which this happens, a subsampling that changes the nature of the MCMC chain and may jeopardise its Markovian validation. The authors however suggest using alternative and computationally cheaper solutions to identify the pivots. (Which confuses me as to which solution they adopt.)

The next part of the paper compares this approach with seven other solutions found in the literature, from Matthew Stephens’ (2000) to our permutation reordering. Which does pretty well in terms of MSE in the simulation study (see the massive Table 3) while being much cheaper to implement than the proposed pivotal relabelling (Table 4). And which, contrary to the authors’ objection, does not require the precise computation of the MAP since, as indicated in our paper, the relative maximum based on the MCMC iterations can be used as a proxy. I am thus less than convinced at the improvement brought by this alternative…

## repulsive mixtures

Posted in Books, Statistics with tags , , , , , , , , on April 10, 2017 by xi'an

Fangzheng Xie and Yanxun Xu arXived today a paper on Bayesian repulsive modelling for mixtures. Not that Bayesian modelling is repulsive in any psychological sense, but rather that the components of the mixture are repulsive one against another. The device towards this repulsiveness is to add a penalty term to the original prior such that close means are penalised. (In the spirit of the sugar loaf with water drops represented on the cover of Bayesian Choice that we used in our pinball sampler, repulsiveness being there on the particles of a simulated sample and not on components.) Which means a prior assumption that close covariance matrices are of lesser importance. An interrogation I have has is was why empty components are not excluded as well, but this does not make too much sense in the Dirichlet process formulation of the current paper. And in the finite mixture version the Dirichlet prior on the weights has coefficients less than one.

The paper establishes consistency results for such repulsive priors, both for estimating the distribution itself and the number of components, K, under a collection of assumptions on the distribution, prior, and repulsiveness factors. While I have no mathematical issue with such results, I always wonder at their relevance for a given finite sample from a finite mixture in that they give an impression that the number of components is a perfectly estimable quantity, which it is not (in my opinion!) because of the fluid nature of mixture components and therefore the inevitable impact of prior modelling. (As Larry Wasserman would pound in, mixtures like tequila are evil and should likewise be avoided!)

The implementation of this modelling goes through a “block-collapsed” Gibbs sampler that exploits the latent variable representation (as in our early mixture paper with Jean Diebolt). Which includes the Old Faithful data as an illustration (for which a submission of ours was recently rejected for using too old datasets). And use the logarithm of the conditional predictive ordinate as  an assessment tool, which is a posterior predictive estimated by MCMC, using the data a second time for the fit.

## SMC on a sequence of increasing dimension targets

Posted in Statistics with tags , , , , , , , , , on February 15, 2017 by xi'an

Richard Everitt and co-authors have arXived a preliminary version of a paper entitled Sequential Bayesian inference for mixture models and the coalescent using sequential Monte Carlo samplers with transformations. The central notion is an SMC version of the Carlin & Chib (1995) completion in the comparison of models in different dimensions. Namely to create auxiliary variables for each model in such a way that the dimension of the completed models are all the same. (Reversible jump MCMC à la Peter Green (1995) can also be interpreted this way, even though only relevant bits of the completion are used in the transitions.) I find the paper and the topic most interesting if only because it relates to earlier papers of us on population Monte Carlo. It also brought to my awareness the paper by Karagiannis and Andrieu (2013) on annealed reversible jump MCMC that I had missed at the time it appeared. The current paper exploits this annealed expansion in the devising of the moves. (Sequential Monte Carlo on a sequence of models with increasing dimension has been studied in the past.)

The way the SMC is described in the paper, namely, reweight-subsample-move, does not strike me as the most efficient as I would try to instead move-reweight-subsample, using a relevant move that incorporate the new model and hence enhance the chances of not rejecting.

One central application of the paper is mixture models with an unknown number of components. The SMC approach applied to this problem means creating a new component at each iteration t and moving the existing particles after adding the parameters of the new component. Since using the prior for this new part is unlikely to be at all efficient, a split move as in Richardson and Green (1997) can be considered, which brings back the dreaded Jacobian of RJMCMC into the picture! Here comes an interesting caveat of the method, namely that the split move forces a choice of the split component of the mixture. However, this does not appear as a strong difficulty, solved in the paper by auxiliary [index] variables, but possibly better solved by a mixture representation of the proposal, as in our PMC [population Monte Carlo] papers. Which also develop a family of SMC algorithms, incidentally. We found there that using a mixture representation of the proposal achieves a provable variance reduction.

“This puts a requirement on TSMC that the single transition it makes must be successful.”

As pointed by the authors, the transformation SMC they develop faces the drawback that a given model is only explored once in the algorithm, when moving to the next model. On principle, there would be nothing wrong in including regret steps, retracing earlier models in the light of the current one, since each step is an importance sampling step valid on its own right. But SMC also offers a natural albeit potentially high-varianced approximation to the marginal likelihood, which is quite appealing when comparing with an MCMC outcome. However, it would have been nice to see a comparison with alternative estimates of the marginal in the case of mixtures of distributions. I also wonder at the comparative performances of a dual approach that would be sequential in the number of observations as well, as in Chopin (2004) or our first population Monte Carlo paper (Cappé et al., 2005), since subsamples lead to tempered versions of the target and hence facilitate moves between models, being associated with flatter likelihoods.

## zurück in Wien

Posted in Books, pictures, Statistics, Travel, University life, Wines with tags , , , , , , , , on December 7, 2015 by xi'an

Back in Vienna after a little bit more than a year! The opportunity was a working meeting on a CRC Handbook of mixture analysis, Sylvia Früwirth-Schnatter, Gilles Celeux and myself are editing together, along with about twenty authors, half of which also came to Vienna for the weekend. Great opportunity to all work together, towards a more coherent and comprehensive volume, as well as to enjoy the earliest stages of the Viennese winter. Very mild winter so far. I also gave a seminar Friday morning, thinking until I saw the attached poster that I was going to speak on mixtures for testing..! Except for a few seconds of uncertainty on the second version of the random forest approach, I still managed to survive the switch (in a fabulous seminar room, overlooking the Prater…) The two days meeting was very rewarding, with major changes in the contents and the goals of many chapters, including those I am contributing to.

## off to New York

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , on March 29, 2015 by xi'an

I am off to New York City for two days, giving a seminar at Columbia tomorrow and visiting Andrew Gelman there. My talk will be about testing as mixture estimation, with slides similar to the Nice ones below if slightly upgraded and augmented during the flight to JFK. Looking at the past seminar speakers, I noticed we were three speakers from Paris in the last fortnight, with Ismael Castillo and Paul Doukhan (in the Applied Probability seminar) preceding me. Is there a significant bias there?!

## a Nice talk

Posted in Books, Statistics, Travel, University life with tags , , , , , , , on February 20, 2015 by xi'an

Today, I give a talk on our testing paper in Nice, in a workshop run in connection with our Calibration ANR grant:

The slides are directly extracted from the paper but it still took me quite a while to translate the paper into those, during the early hours of our Czech break this week.

One added perk of travelling to Nice is the flight there, as it parallels the entire French Alps, a terrific view in nice weather!

## relabelling mixtures (#2)

Posted in Statistics, Travel, University life with tags , , , , , , on February 5, 2015 by xi'an

Following the previous post, I went and had  a (long) look at Puolamäki and Kaski’s paper. I must acknowledge that, despite having several runs through the paper, I still have trouble with the approach… From what I understand, the authors use a Bernoulli mixture pseudo-model to reallocate the observations to components.  That is, given an MCMC output with simulated allocations variables (a.k.a., hidden or latent variables), they create a (TxK)xn matrix of component binary indicators e.g., for a three component mixture,

0 1 0 0 1 0…
1 0 0 0 0 0…
0 0 1 1 0 1…
0 1 0 0 1 1…

and estimate a probability to be in component j for each of the n observations, according to the (pseudo-)likelihood

$\prod_{r=1}^R \sum_{j=1}^K \prod_{i=1}^n \beta_{i,j}^{z_{i,r}}(1-\beta_{i,j})^{1-z_{i,r}}$

It took me a few days, between morning runs and those wee hours when I cannot get back to sleep (!), to make some sense of this Bernoulli modelling. The allocation vectors are used together to estimate the probabilities of being “in” component j together. However the data—which is the outcome of an MCMC simulation and de facto does not originate from that Bernoulli mixture—does not seem appropriate, both because it is produced by an MCMC simulation and is made of blocks of highly correlated rows [which sum up to one]. The Bernoulli likelihood above also defines a new model, with many more parameters than in the original mixture model. And I fail to see why perfect, partial or inexistent label switching [in the MCMC sequence] is not going to impact the estimation of the Bernoulli mixture. And why an argument based on a fixed parameter value (Theorem 3) extends to an MCMC outcome where parameters themselves are subjected to some degree of label switching. Bemused, I remain…