## AMOR at 5000ft in a water tank…

Posted in Mountains, pictures, Statistics, University life on November 22, 2012 by xi'an

On Monday, I attended the thesis defence of Rémi Bardenet in Orsay as a member (referee) of his thesis committee. While this was a thesis in computer science, defended in the Linear Accelerator Lab in Orsay, it was clearly rooted in computational statistics, hence justifying my presence on the committee. The justification (!) for the splashy headline of this post is that Rémi’s work was motivated by the Pierre-Auger experiment on ultra-high-energy cosmic rays, where particles are detected through a network of 1600 water tanks spread over the Argentinian Pampa Amarilla on an area the size of Rhode Island (where I am incidentally going next week).

The part of Rémi’s thesis presented during the defence concentrated on his AMOR algorithm, arXived in a paper written with Olivier Cappé and Gersende Fort. AMOR stands for adaptive Metropolis online relabelling and combines adaptive MCMC techniques with relabelling strategies to fight label-switching (e.g., in mixtures). I have been interested in mixtures for eons (starting in 1987 in Ottawa with applying Titterington, Smith, and Makov to chest radiographs) and in label switching for ages (starting at the COMPSTAT conference in Bristol in 1998). Rémi’s approach to the label switching problem follows the relabelling path, namely a projection of the original parameter space into a smaller subspace (that is also a quotient space) to avoid permutation invariance and lack of identifiability. (In the survey I wrote with Kate Lee, Jean-Michel Marin and Kerrie Mengersen, we suggest using the mode as a pivot to determine which permutation to use on the components of the mixture.) The paper suggests using a Euclidean distance to a mean determined adaptively, $\mu_t$, with a quadratic form $\Sigma_t$ also determined on-the-go, minimising $(P\theta-\mu_t)^T\Sigma_t(P\theta-\mu_t)$ over all permutations $P$ at each step of the algorithm. The intuition behind the method is that the posterior over the restricted space should look like a roughly elliptically symmetric distribution, or at least like a unimodal distribution, rather than borrowing bits and pieces from different modes. While I appreciate the technical tour de force represented by the proof of convergence of the AMOR algorithm, I remain somewhat sceptical about the approach and voiced the following objections during the defence: first, the assumption that the posterior becomes unimodal under an appropriate restriction is not necessarily realistic. Secondary modes often pop up with real data (as in the counter-example we used in our paper with Alessandra Iacobucci and Jean-Michel Marin).
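
To make the relabelling step concrete, here is a minimal Python sketch of the minimisation over permutations described above. It is only an illustration under assumptions of mine, not the code of the AMOR paper: the function names `quadratic_form` and `relabel`, the toy values, and the brute-force enumeration of all permutations are hypothetical choices, and the adaptive updating of the mean and of the quadratic form is left out entirely. The argument `sigma` is simply whatever matrix defines the quadratic form.

```python
from itertools import permutations


def quadratic_form(v, mu, sigma):
    """Return (v - mu)^T sigma (v - mu) for plain Python lists."""
    d = [vi - mi for vi, mi in zip(v, mu)]
    n = len(d)
    return sum(d[i] * sigma[i][j] * d[j] for i in range(n) for j in range(n))


def relabel(theta, mu, sigma):
    """Pick the permutation of the components minimising the quadratic form.

    theta: list of per-component parameter tuples, e.g. [(mean,), (mean,)]
    mu:    flattened reference vector (assumed current adaptive mean)
    sigma: matrix of the quadratic form, given as a list of rows
    """
    best = None
    # Brute force over all k! permutations: fine for small k only.
    for perm in permutations(range(len(theta))):
        flat = [x for k in perm for x in theta[k]]
        cost = quadratic_form(flat, mu, sigma)
        if best is None or cost < best[0]:
            best = (cost, perm, flat)
    return best


# Toy example (made-up numbers): two components, each with a single mean.
theta = [(2.1,), (-1.9,)]          # current draw, labels swapped
mu = [-2.0, 2.0]                   # assumed running mean
sigma = [[1.0, 0.0], [0.0, 1.0]]   # identity: squared Euclidean distance
cost, perm, relabelled = relabel(theta, mu, sigma)
```

In this toy case the swapped draw is mapped back next to the reference mean, which is exactly the behaviour the objections in this post are aimed at: the chain is forced to stay within one copy of the permutation-symmetric posterior.
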
Next, the whole point of fighting multiple modes and non-identifiability, i.e. fighting label switching, is to be able to fall back on posterior means as Bayes estimators. As stressed in our JASA paper with Gilles Celeux and Merrilee Hurn, there is no reason for doing so and there are several reasons for not doing so:

• it breaks down under model misspecification, i.e., when the number of components is not correct
• it does not improve the speed of convergence but, on the contrary, restricts the space visited by the Markov chain
• it may fall victim to the fatal attraction of secondary modes by fitting too small an ellipse around one of those modes
• it ultimately depends on the parameterisation of the model
• there is no reason for using posterior means in mixture problems: posterior modes or cluster centres can be used instead

I am therefore much more in favour of producing a posterior distribution that is as label switching as possible (since the true posterior is completely symmetric in this respect). Post-processing the resulting sample can be done by using off-the-shelf clustering in the component space, derived from the point process representation used by Matthew Stephens in his thesis and subsequent papers. It also allows for a direct estimation of the number of components.
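
As a rough illustration of this post-processing idea, the sketch below pools the component means of a freely label-switching sample and clusters them in the component space. The choice of a one-dimensional k-means is mine (the post only speaks of off-the-shelf clustering), and the function `kmeans_1d` and the draws are hypothetical. The number of clusters is fixed at two here, so the estimation of the number of components is not covered.

```python
def kmeans_1d(points, centres, n_iter=20):
    """Plain Lloyd iterations on scalar points, from given starting centres."""
    centres = list(centres)
    for _ in range(n_iter):
        groups = [[] for _ in centres]
        for x in points:
            # Assign each point to its nearest centre.
            j = min(range(len(centres)), key=lambda c: abs(x - centres[c]))
            groups[j].append(x)
        # Recompute each centre; keep the old one if its group is empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres


# Made-up MCMC output: each draw holds two component means, and the
# labels switch freely from one draw to the next.
draws = [(-2.1, 1.9), (2.0, -1.8), (-1.9, 2.2), (2.1, -2.0)]

# Forget the labels: every component of every draw is one point.
pooled = [m for draw in draws for m in draw]

centres = sorted(kmeans_1d(pooled, centres=[min(pooled), max(pooled)]))
```

The cluster centres then play the role of component estimates, without ever asking the sampler to stay within one labelling.
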

In any case, this was a defence worth attending that led me to think afresh about the label switching problem, with directions worth exploring next month while Kate Lee is visiting from Auckland. Rémi Bardenet is now headed for a postdoc in Oxford, a perfect location to discuss label switching further and to engage in new computational statistics research!

## Number of components in a mixture

Posted in Books, R, Statistics, University life on August 6, 2011 by xi'an

I got a paper (unavailable online) to referee about testing for the order (i.e. the number of components) of a normal mixture. Although this is an easily stated problem, namely estimating k in

$\sum_{i=1}^k p_{ik} \mathcal{N}(\mu_{ik},\sigma^2_{ik}),$

I came to the conclusion that it is a kind of ill-posed problem. Without a clear definition of what a component is, i.e. without a well-articulated prior distribution, I remain unconvinced that k can be estimated at all. Indeed, how can we distinguish between a k-component mixture and a (k+1)-component mixture with an extremely small (in the sense of the component weight) additional component? Solutions ending up with a convenient chi-square test thus sound unrealistic to me… I am not implying the maths are wrong in any way, simply that the meaning of the test and the nature of the null hypothesis are unclear from a practical and methodological perspective. In the case of normal (but also Laplace) mixtures, the difficulty is compounded by the fact that the likelihood function is unbounded, thus wide open to over-fitting (at least in a non-Bayesian setting). Since Ghosh and Sen (1985), authors have come up with various penalisation functions, but I remain openly atheistic about the approach! (I do not know whether or not this is related to the summer season, but I have received an unusual number of papers to referee lately, e.g., handling three papers last Friday, one on Saturday, and yet another one on Monday morning. Interestingly, about half of them are from non-statistical journals!)
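
The unboundedness of the normal mixture likelihood mentioned above can be checked numerically. The following Python sketch, with made-up data and hypothetical function names, centres one component of a two-component mixture on the first observation and lets its standard deviation shrink: the log-likelihood keeps increasing, which is the over-fitting route open in a non-Bayesian setting.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def mixture_loglik(data, weights, mus, sigmas):
    """Log-likelihood of a finite normal mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, s)
                     for w, m, s in zip(weights, mus, sigmas)))
        for x in data
    )


data = [-1.2, -0.4, 0.1, 0.7, 1.5]   # made-up observations

# Component 1 is a fixed N(0, 1); component 2 sits exactly on data[0]
# and its standard deviation goes to zero.
shrinking_sds = (1.0, 0.1, 0.01, 0.001)
logliks = [
    mixture_loglik(data, [0.5, 0.5], [0.0, data[0]], [1.0, s])
    for s in shrinking_sds
]

for s, ll in zip(shrinking_sds, logliks):
    print(f"sd of second component = {s:g}: log-likelihood = {ll:.3f}")
```

The first component keeps every other observation at a positive density, so nothing stops the spike on the first observation from driving the likelihood to infinity.
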

## Improving convergence of Data Augmentation algorithms

Posted in Mountains, Statistics, University life on June 7, 2011 by xi'an

Following an earlier submission to Statistical Science, we have now resubmitted and arXived the new version of our paper “Improving the convergence properties of the Data Augmentation algorithm with an application to Bayesian mixture modelling”, written with Jim Hobert (University of Florida) and Vivek Roy (Iowa State University). Given that both referees were quite positive about the earlier version, the changes are truly minor and overwhelmingly stylistic. Again, I am very glad to be part of this paper because of the results but also because it relates to a problem I discussed at length with Richard Tweedie when I visited him in Colorado in 1993… (The above picture of Richard, along with Gareth Roberts and Anto Mira, was taken during the TMR Workshop on Computational and Spatial Statistics I organised in Aussois in 1998, with unfortunately no remaining webpage! A pre-MCMC’ski of sorts, when I had not yet started skiing!)

## Mixture book cover proposal

Posted in Books, Mountains, pictures, Statistics on January 24, 2011 by xi'an

Here is a proposal for the cover of the book on Mixture Estimation and Applications we are editing, with a stylised rendering of Buachaille Etive Mor… I am not sure it sufficiently fits the topic of the book, other than being taken a few hours and 107 miles from the start of the meeting at ICMS (!), so the cover may end up with a more scientific picture. However, this is my favourite sight when approaching Glencoe, Buachaille rising by itself from Etive Mor, especially with abundant snow and crisp cold sunny skies as I got last March… Other than those aesthetic considerations, the book is very close to production. I spent Tuesday working on the index and the proofs should be sent to the authors pretty soon.

## Statistical Inference

Posted in Books, Statistics, University life on November 16, 2010 by xi'an

Following the publication of several papers on the topic of integrated evidence (about competing models), Murray Aitkin has now published a book entitled Statistical Inference and I have now finished reading it. While I appreciate the effort made by Murray Aitkin to place his theory within a coherent Bayesian framework, I remain unconvinced of the said coherence, for reasons exposed below.

The main chapters of the book are Chapter 2 about the “Integrated Bayes/likelihood approach” and Chapter 4 about the “Unified analysis of finite populations”, Chapter 7 also containing a new proposal about “Goodness of fit and model diagnostics”. Chapter 1 is a nice introduction to frequentist, likelihood and Bayesian approaches to inference and the four remaining chapters are applications of Murray Aitkin’s principles to various models. The style of the book is quite pleasant although slightly discursive in what I (a Frenchman!) would qualify as an English style in that it often relies on intuition to develop concepts. I also think that the argument of being close to the frequentist decision (aka the p-value) too often serves as a justification in the book (see, e.g., page 43 “the p-value has a direct interpretation as a posterior probability”). As an aside, Murray Aitkin is a strong believer in plotting cdfs rather than densities to provide information about a distribution and hence cdf plots abound throughout the book. (I counted 82 pictures of them.) While the book contains a helpful array of examples and datasets, the captions of the (many) figures are too terse for my taste: the figures are certainly not self-contained and even with the help of the main text they do not always make complete sense.