Label switching is an issue in mixture estimation (and in other latent variable models) because mixture models are ill-posed models where part of the parameter is not identifiable. Indeed, the density of a mixture being a sum of terms

$$\sum_{i=1}^k \omega_i f(x|\theta_i),$$

the parameter (vector) of the ω’s and of the θ’s is at best identifiable up to an arbitrary permutation of the components of the above sum. In other words, “component #1 of the mixture” is not a meaningful concept and hence cannot be estimated.
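To make the invariance concrete, here is a minimal sketch in Python (my choice of language, not something from the original discussion) that evaluates a three-component Gaussian mixture under all 3! relabellings of its components. The weights, means, and standard deviations are arbitrary values picked for illustration.

```python
import math
from itertools import permutations


def normal_pdf(x, mean, sd):
    """Density of a N(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(x, weights, means, sds):
    """Mixture density: sum over components of weight * component density."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


# Arbitrary illustrative parameter values (assumptions, not from any paper).
weights = [0.2, 0.5, 0.3]
means = [-1.0, 0.0, 2.5]
sds = [0.5, 1.0, 0.8]
x = 0.7

# Evaluate the same density under every permutation of the component labels.
values = []
for perm in permutations(range(len(weights))):
    values.append(
        mixture_density(
            x,
            [weights[i] for i in perm],
            [means[i] for i in perm],
            [sds[i] for i in perm],
        )
    )

# The spread is zero up to floating-point rounding: the data cannot tell
# the k! labellings apart.
spread = max(values) - min(values)
print(len(values), spread)
```

Since the likelihood takes the same value at all k! relabelled parameter vectors, an exchangeable prior produces a posterior with the same symmetry.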

This problem has been known for quite a while, long before EM and MCMC algorithms for mixtures, but it is only since mixtures have become truly estimable by Bayesian approaches that the debate on this issue has grown. In the very early days, Jean Diebolt and I proposed ordering the components in a unique way to give them a meaning. For instance, “component #1” would then be the component with the smallest mean or the smallest weight, and so on… Later, in one of my favourite X papers, with Gilles Celeux and Merrilee Hurn, we exposed the convergence issues related to the non-identifiability of mixture models, namely that the posterior distributions were almost always multimodal, with a multiple of k! symmetric modes in the case of exchangeable priors, and therefore that Markov chains would have trouble visiting all those modes in a symmetric manner, despite the symmetry being guaranteed by the shape of the posterior. And we concluded with the slightly provocative statement that hardly any Markov chain inferring about mixture models had ever converged! In parallel, time-wise, Matthew Stephens had completed a thesis at Oxford on the same topic and proposed solutions for relabelling MCMC simulations in order to identify a single mode and hence produce meaningful estimators. Giving another meaning to the notion of “component #1”.

And then the topic began to attract more and more researchers, being both simple to describe and frustrating in its lack of a definitive answer, from both simulation and inference perspectives. Rodriguez’s and Walker’s paper provides a survey of the label switching strategies in the Bayesian processing of mixtures, but its innovative part is in deriving a relabelling strategy, which consists of finding the optimal permutation (at each iteration of the Markov chain) by minimising a loss function inspired by k-means clustering, and which is connected with both Stephens’ and our [JASA, 2000] loss functions. The performances of this new version are shown to be roughly comparable with those of other relabelling strategies, in the case of Gaussian mixtures. (Making me wonder if the choice of the loss function is not favourable to Gaussian mixtures.) And somewhat faster than Stephens’ Kullback-Leibler loss approach.
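The general idea of relabelling by loss minimisation can be sketched as follows. This is a deliberately simplified stand-in, not the loss of Rodriguez and Walker nor Stephens’ Kullback-Leibler loss: at each iteration it picks the permutation of the component means closest, in squared distance, to a fixed reference vector. The “MCMC output” is fake, simulated with random label switches, and all names and values are assumptions of mine.

```python
import random
from itertools import permutations


def relabel(draws, reference):
    """Permute each draw so that it is closest (squared distance) to reference."""
    k = len(reference)
    relabelled = []
    for draw in draws:
        best = min(
            permutations(range(k)),
            key=lambda p: sum((draw[p[j]] - reference[j]) ** 2 for j in range(k)),
        )
        relabelled.append([draw[i] for i in best])
    return relabelled


def column_means(draws):
    """Ergodic average of each labelled component over the draws."""
    n = len(draws)
    return [sum(d[j] for d in draws) / n for j in range(len(draws[0]))]


# Fake posterior draws of three component means with random label switching.
rng = random.Random(1)
true_means = [-2.0, 0.0, 3.0]
draws = []
for _ in range(500):
    draw = [m + rng.gauss(0.0, 0.2) for m in true_means]
    rng.shuffle(draw)  # labels permute arbitrarily between iterations
    draws.append(draw)

raw_avg = column_means(draws)  # averages over switched labels
fixed = relabel(draws, reference=sorted(draws[0]))
fixed_avg = column_means(fixed)  # averages after relabelling
print(raw_avg)
print(fixed_avg)
```

Before relabelling, the three ergodic averages all drift towards the same overall mean; after relabelling, they separate again. The exhaustive search over k! permutations is only viable for small k.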

“Hence, in an MCMC algorithm, the indices of the parameters can permute multiple times between iterations. As a result, we cannot identify the hidden groups that make [all] ergodic averages to estimate characteristics of the components useless.”

One section of the paper puzzles me, although it does not impact the methodology and the conclusions. In Section 2.1 (p.27), the authors consider the quantity

which is the marginal probability of allocating observation i to cluster or component j. Under an exchangeable prior, this quantity is uniformly equal to 1/k for all observations i and all components j, by virtue of the invariance under permutation of the indices… So at best this can serve as a control variate. Later in Section 2.2 (p.28), the above sentence does signal a problem with those averages but it seems to attribute it to MCMC behaviour rather than to the invariance of the posterior (or to the non-identifiability of the components per se). Finally, the paper mentions that “given the allocations, the likelihood is invariant under permutations of the parameters and the allocations” (p.28), which is not correct, since eqn. (8)

does not hold when the two permutations σ and τ give different images of $z_i$…

Filed under: Books, Statistics, University life Tagged: component of a mixture, convergence, finite mixtures, identifiability, ill-posed problem, invariance, label switching, loss function, MCMC algorithms, missing data, multimodality, relabelling

*[As a trivial aside, I had first used *fresh from the press(es)* as the bracketed comment, before I realised the meaning was not necessarily the same in English and in French.]*

Filed under: Books, Statistics, University life Tagged: ABC model choice, Approximate Bayesian computation, JRSSB, Royal Statistical Society, Series B, statistical methodology, summary statistics

Bonsoir Professeur

Je « suis » votre cours du mercredi dont le formalisme mathématique me fait froid partout

Avec beaucoup de difficulté je vous envoie mes exercices du premier chapitre de votre livre.

which translates as

Good evening Professor,

I “follow” your Wednesday class, whose mathematical formalism makes me cold all over. With much hardship, I send you my exercises from the first chapter of your book.

I know that winter is coming, but, still, making students shudder from mathematical cold is not my primary goal when teaching Monte Carlo methods!

Filed under: Books, Kids, Statistics, University life Tagged: computational statistics, ENSAE, Master program, MCMC algorithms, Monte Carlo Statistical Methods, statistical computing, Université Paris Dauphine, Winter is coming

When compared with our recent update of the arXived paper, there is not much difference in content, as it is mostly an issue of fitting the PNAS publication canons. (Which makes the paper less readable in the posted version [in my opinion!] as it needs to fit the main document within the compulsory six pages, relegating part of the experiments and of the explanations to the Supplementary Information section.)

Filed under: pictures, R, Statistics, University life Tagged: 1000 Genomes Project, ABC, ABC model choice, machine learning, model posterior probabilities, posterior predictive, random forests, summary statistics

“Specifically, the models with subsets of covariates are found by maximizing the similarity of their predictions to this reference as proposed by Dupuis and Robert [12]. Notably, this approach does not require specifying priors for the submodels and one can instead focus on building a good reference model. Dupuis and Robert (2003) suggest choosing the size of the covariate subset based on an acceptable loss of explanatory power compared to the reference model. We examine using cross-validation based estimates of predictive performance as an alternative.” T. Peltola et al.

The paper also connects with the Bayesian Lasso literature, concluding that the horseshoe prior is more informative than the Laplace prior. It applies the selection approach to identify biomarkers with predictive performances in a study of diabetic patients. The authors rank models according to their (log) predictive density at the observed data, using cross-validation to avoid exploiting the data twice. On the MCMC front, the paper implements the NUTS version of HMC with STAN.

Filed under: Mountains, pictures, Statistics, Travel, University life Tagged: Aki Vehtari, Bayesian lasso, Dan Simpson, embedded models, Hamiltonian Monte Carlo, horseshoe prior, Kullback-Leibler divergence, MCMC, Norway, NUTS, predictive power, prior projection, STAN, variable selection, zero measure set

Filed under: Statistics, University life Tagged: data cloning, document analysis, map, Monte Carlo Statistical Methods, parallel MCMC, SAME, simulated annealing, simulation, stochastic optimisation, variational Bayes methods

There is a new blog I wanted to tell you about which excerpts one interesting or classic paper or book a day from the mathematical sciences. We plan on daily posting across the range of mathematical fields and at any level, but about 20-30% of the posts in queue are from statistics. The goal is to entice people to read the great works of old. The first post today was from an old paper by Fisher applying Group Theory to the design of experiments.

Interesting concept, which will hopefully generate comments to put the quoted passage into context. Somewhat connected to my Reading Statistical Classics posts. Which ~~incidentally if sadly will not take place this year since only two students registered.~~ should take place in the end since more students registered! (I am unsure about the references behind the title of that blog, besides Spielberg’s Raiders of the Lost Ark and Norman’s Marauders of Gor… I just hope Statistics does not qualify as a lost science!)

Filed under: Books, Statistics, University life Tagged: blogging, classics, graduate course, marauders of Gor, R.A. Fisher, Raiders of the Lost Ark, reading list

“I thought you were a sceptic, said Lesley. I thought you were scientific”

The first volume in this detective+magic series, *Rivers of London*, sets the universe of this mix of traditional Metropolitan Police work and of urban magic, the title being about the deities of the rivers of London, including a Mother and a Father Thames… I usually dislike any story mixing modern life and fantasy but this is a definite exception! What I enjoy in this book is primarily the language, which is so uniquely English (to the point of having the U.S. edition edited!, if the author’s blog is to be believed). And the fact that it is so much about London, its history and inhabitants. But mostly about London, as an entity on its own. Even though my experience of London is limited to a few boroughs, there are many passages where I can relate to the location and this obviously makes the story much more appealing. The style is witty, ironic and full of understatements, a true pleasure.

“The tube is a good place for this sort of conceptual breakthrough because, unless you’ve got something to read, there’s bugger all else to do.”

The story itself is rather fun, with at least three levels of plots and two types of magic. It centres around two freshly hired London constables, one of them discovering magical abilities and being drafted to the supernatural section of the Metropolitan Police. And making all the monologues in the book. The supernatural section is made of a single Inspector, plus a few side characters, but with enough fancy details to give it life. In particular, Isaac Newton is credited with having started the section, called The Folly. Which is also the name of Ben Aaronovitch’s webpage.

“There was a poster (…) that said: `Keep Calm and Carry On’, which I thought was good advice.”

This quote is unintentionally funny in that it takes place in a cellar holding material from World War II. Except that the now invasive red and white poster was never distributed during the war… On the contrary, it was pulped to save paper and the fact that a few copies survived is a sort of (minor) miracle. Hence a double anachronism in that it did not belong to a WWII room and that Peter Grant should have seen its modern avatars all over London.

“Have you ever been to London? Don’t worry, it’s basically just like the country. Only with more people.”

The last part of the book is darker and feels less well-written, maybe simply because of the darker side and of the accumulation of events, while the central character gets rather too central and too much of an unexpected hero who saves the day. There is in particular a part where he seems to forget about his friend Lesley, who is in deep trouble at the time, and this does not seem to make much sense. But, except for this lapse (maybe due to my quick reading of the book over the week in Warwick), the flow and pace are great, with this constant undertone of satire and wit from the central character. I am definitely looking forward to reading tomes 2 and 3 in the series (having already read tome 4 in Austria!, which was a mistake as there were spoilers about earlier volumes).

Filed under: Books, Kids, Travel Tagged: Ben Aaronnovitch, book review, cockney slang, ghosts, Isaac Newton, Keep calm posters, London, magics, Metropolitan Police, Peter Grant series, Thames, Warwick

Rasmus set a prior on the total number m of socks, a negative Binomial Neg(15,1/3) distribution, and another prior on the proportion of socks that come by pairs, a Beta B(15,2) distribution, then simulated pseudo-data by picking eleven socks at random, and at last applied ABC (in Rubin’s 1984 sense) by waiting for the observed event, i.e. only orphans and no pair [of socks]. Brilliant!
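For readers who want to play with it, the scheme just described can be sketched as below. This is my own Python reading of the description, not Rasmus’ code, and it rests on two assumptions: Neg(15,1/3) is taken as the number of failures before 15 successes of probability 1/3, and the Beta draw is applied to the maximal number of pairs to get the actual number of pairs.

```python
import random


def neg_binomial(rng, size, prob):
    """Failures before `size` successes of probability `prob` (assumed parameterisation)."""
    failures = 0
    successes = 0
    while successes < size:
        if rng.random() < prob:
            successes += 1
        else:
            failures += 1
    return failures


def pick_socks(rng, n_socks, prop_pairs, n_picked=11):
    """Build a drawer with paired and orphan socks, then pick socks at random."""
    n_pairs = round((n_socks // 2) * prop_pairs)
    n_odd = n_socks - 2 * n_pairs
    # Paired socks share a label; each orphan sock has its own label.
    socks = [i for i in range(n_pairs) for _ in range(2)]
    socks += list(range(n_pairs, n_pairs + n_odd))
    return rng.sample(socks, min(n_picked, n_socks))


def abc_socks(n_sims=20000, seed=0):
    """Exact ABC: keep prior draws of m that reproduce the observed event."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        m = neg_binomial(rng, 15, 1.0 / 3.0)
        prop_pairs = rng.betavariate(15, 2)
        picked = pick_socks(rng, m, prop_pairs)
        # Observed event: eleven socks picked, no two of them forming a pair.
        if len(picked) == 11 and len(set(picked)) == 11:
            accepted.append(m)
    return accepted


accepted = abc_socks()
print(len(accepted), sorted(accepted)[len(accepted) // 2])
```

The accepted values of m form a sample from the posterior on the number of socks, here summarised by its median.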

The overall simplicity of the problem set me wondering about an alternative solution using the likelihood. Cannot be that hard, can it?! After a few computations rejected by comparing them with experimental frequencies, I put the problem on hold until I was back home and with access to my Feller volume 1, one of the few [math] books I keep at home… As I was convinced one of the exercises in Chapter II would cover this case. After checking, I found a partial solution, namely Exercise 26:

A closet contains n pairs of shoes. If 2r shoes are chosen at random (with 2r<n), what is the probability that there will be (a) no complete pair, (b) exactly one complete pair, (c) exactly two complete pairs among them?

This is not exactly a solution, but rather a problem; however, it leads to the value

$$\binom{n}{j}\binom{n-j}{2r-2j}2^{2r-2j}\Big/\binom{2n}{2r}$$

as the probability of obtaining j pairs among those 2r shoes. Which also works for an odd number t of shoes:

$$\binom{n}{j}\binom{n-j}{t-2j}2^{t-2j}\Big/\binom{2n}{t}$$

as I checked against my large simulations. So I solved Exercise 26 in Feller volume 1 (!), but not Rasmus’ problem, since there are those orphan socks on top of the pairs. If one draws 11 socks out of m socks made of f orphans and g pairs, with f+2g=m, the number k of socks from the orphan group is a hypergeometric H(11,m,f) rv and the probability to observe 11 orphan socks total (either from the orphan or from the paired groups) is thus the marginal over all possible values of k:

$$\sum_{k=0}^{11}\frac{\binom{f}{k}\binom{2g}{11-k}}{\binom{m}{11}}\times\frac{2^{11-k}\binom{g}{11-k}}{\binom{2g}{11-k}}$$

so it could be argued that we are facing a closed-form likelihood problem. Even though it presumably took me longer to achieve this formula than for Rasmus to run his exact ABC code!
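The counting argument can be checked numerically. The sketch below (in Python, with function names of my own choosing) codes both probabilities and compares the second one with a brute-force enumeration of all possible draws from a small drawer. In the marginal over k, the hypergeometric term and the no-pair term share a factor that cancels, which is the simplified form used in the code.

```python
from itertools import combinations
from math import comb


def prob_j_pairs(n, t, j):
    """Probability of exactly j complete pairs among t shoes drawn from n pairs."""
    if j > n or t - 2 * j < 0:
        return 0.0
    # Choose the j complete pairs, then t-2j further pairs contributing one
    # shoe each (left or right), over all ways of drawing t shoes.
    return comb(n, j) * comb(n - j, t - 2 * j) * 2 ** (t - 2 * j) / comb(2 * n, t)


def prob_no_pair(f, g, t=11):
    """Probability that t socks drawn from f orphans and g pairs contain no pair."""
    m = f + 2 * g
    if t > m:
        return 0.0
    # k socks come from the orphans, t-k from distinct pairs (one sock each).
    total = sum(comb(f, k) * comb(g, t - k) * 2 ** (t - k) for k in range(t + 1))
    return total / comb(m, t)


def brute_force_no_pair(f, g, t):
    """Same probability by enumerating every draw (small drawers only)."""
    socks = [i for i in range(g) for _ in range(2)] + list(range(g, g + f))
    draws = list(combinations(range(len(socks)), t))
    hits = sum(1 for d in draws if len({socks[i] for i in d}) == t)
    return hits / len(draws)


print(prob_no_pair(4, 13))  # e.g. 30 socks: 4 orphans and 13 pairs
print(prob_no_pair(2, 3, t=3), brute_force_no_pair(2, 3, 3))
```

Seen as a function of (f,g), `prob_no_pair` is the likelihood of the observed event, which could be combined with any prior on the pair of counts.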

Filed under: Books, Kids, R, Statistics, University life Tagged: ABC, capture-recapture, combinatorics, subjective prior, William Feller

ln ~/mother.bib

but they eventually and inexplicably end up with a life of their own!) So I decided a Spring clean-up was in order and installed BibTool on my Linux machine to gather all those versions into a new encompassing all-inclusive bib reference. I did not take advantage of the many possibilities of the program, written by Gerd Neugebauer, but it certainly solved my problem: once I realised I had to set the options

check.double = on
check.double.delete = on
pass.comments = off

all I had to do was to call

bibtool -s -i ../*/*.bib -o mother.bib
bibtool -d -i mother.bib -o mother.bib
bibtool -s -i mother.bib -o mother.bib

to merge all bib files and then to get rid of the duplicated entries in mother.bib (the -d option commented out the duplicates and the second call with -s removed them). And to remove the duplicated definitions in the preamble of the file. This took me very little time in the RER train from Paris-Dauphine (where I taught this morning, having a hard time making the students envision the empirical cdf as an average of Dirac masses!) to Roissy airport, in contrast with my pedestrian replacement of all stray siblings of the mother bib with new proper hard links, one by one. I am sure there is a bash command that could have done it in one line, but I spent instead my flight to Birmingham switching all existing bib files, one by one…
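For the record, the one-by-one replacement could be automated along the following lines. This is only a sketch in Python rather than bash, with a hypothetical function name, and it assumes that every .bib file under the chosen root should become a hard link to the mother file and that everything lives on the same filesystem (hard links cannot cross filesystems). The demo runs in a temporary directory so that nothing real gets overwritten.

```python
import os
import tempfile
from pathlib import Path


def relink_bib_files(root, mother, pattern="*.bib"):
    """Replace every matching file under root by a hard link to mother."""
    mother = Path(mother)
    replaced = []
    for path in sorted(Path(root).rglob(pattern)):
        # Skip the mother file itself and files already hard-linked to it.
        if not path.is_file() or path.samefile(mother):
            continue
        # Link under a temporary name first, then swap it in place of the copy.
        tmp = path.with_name(path.name + ".relink-tmp")
        os.link(mother, tmp)
        os.replace(tmp, path)
        replaced.append(path)
    return replaced


def demo():
    """Run the relinking on a throwaway directory tree and check the result."""
    with tempfile.TemporaryDirectory() as tmpdir:
        root = Path(tmpdir)
        mother = root / "mother.bib"
        mother.write_text("@book{feller, title={An Introduction}}\n")
        stray = root / "paper1" / "refs.bib"
        stray.parent.mkdir()
        stray.write_text("@book{old, title={Stale copy}}\n")
        replaced = relink_bib_files(root, mother)
        return (
            replaced == [stray]
            and stray.samefile(mother)
            and stray.read_text() == mother.read_text()
        )


print(demo())
```

The stale copies are overwritten without any backup, so a dry run that only lists the candidate files first would be a sensible precaution on real directories.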

Filed under: Books, Linux, Travel, University life Tagged: bash, BibTeX, BibTool, Birmingham, Charles de Gaulle, LaTeX, link, Linux, RER B, Roissy, University of Warwick