**H**ere are the slides of my talk today at the BAYSM’14 conference in Vienna. Mostly an overview of some of my papers on mixtures, with the most recent stuff…

## Archive for mixtures

## my life as a mixture [BAYSM 2014, Wien]

Posted in Books, Kids, Mountains, pictures, Statistics, Travel, University life with tags Austria, Bayes factor, Bayesian tests of hypotheses, BAYSM, Gibbs sampling, importance sampling, label switching, Linz, mixtures, Vienna, WU Wirtschaftsuniversität Wien on September 12, 2014 by xi'an

**N**ext week I am giving a talk at BAYSM in Vienna. BAYSM is the Bayesian *Young* Statisticians meeting so one may wonder why, but with Chris Holmes and Mike West, we got invited as more… erm… senior speakers! So I decided to give a definitely *senior* talk on a thread pursued throughout my career so far, namely mixtures. Plus it also relates to works of the other senior speakers. Here is the abstract for the talk:

Mixtures of distributions are fascinating objects for statisticians in that they both constitute a straightforward extension of standard distributions and offer a complex benchmark for evaluating statistical procedures, with a likelihood both computable in linear time and enjoying an exponential number of local modes (and sometimes infinite modes). This fruitful playground appeals in particular to Bayesians as it constitutes an easily understood challenge to the use of improper priors and of objective Bayes solutions. This talk will review some ancient and some more recent works of mine on mixtures of distributions, from the 1990 Gibbs sampler to the 2000 label switching and to later studies of Bayes factor approximations, nested sampling performances, improper priors, improved importance samplers, ABC, and an inverse perspective on the Bayesian approach to testing of hypotheses.

**I** am very grateful to the scientific committee for this invitation, as it will give me the opportunity to meet the new generation, learn from them and in addition discover Vienna where I have never been, despite several visits to Austria. Including its top, the Großglockner. I will also give a seminar in Linz the day before, in the Institut für Angewandte Statistik.

## a day for comments

Posted in Mountains, Statistics, Travel, University life with tags AISTATS 2014, Bayesian variable selection, Brad Carlin, Cuillin ridge, Gaussian mixture, Gibbs sampler, hierarchical models, Iceland, ICML, Langevin MCMC algorithm, MCMC, Metropolis-Hastings algorithms, mixtures, model complexity, penalisation, reference priors, Reykjavik, RJMCMC, Russian doll, Scotland, sequential Monte Carlo, Sid Chib, Skye, speedup, spike-and-slab prior, variable dimension models on April 21, 2014 by xi'an

**A**s I was flying over Skye (with [maybe] a first if hazy perspective on the Cuillin ridge!) to Iceland, four long sets of replies to some of my posts appeared on the ‘Og:

- Dan Simpson replied to my comments of last Tuesday about his PC construction;
- Arnaud Doucet spelled out some issues about his adaptive subsampling paper;
- Amandine Schreck clarified why I had missed some points in her Bayesian variable selection paper;
- Randal Douc defended the efficiency of using the Carlin and Chib (1995) method for mixture simulation.

Thanks to them for taking the time to answer my musings…

## MCMC for sampling from mixture models

Posted in Kids, Statistics, University life with tags Carlin, Gaussian mixture, mixtures on April 17, 2014 by xi'an

**R**andal Douc, Florian Maire, and Jimmy Olsson recently arXived a paper on the use of Markov chain Monte Carlo methods for the sampling of mixture models, which contains the recourse to Carlin and Chib (1995) pseudo-priors to simulate from a mixture distribution *(and not from the posterior distribution associated with a mixture sampling model)*. As reported earlier, I was in the thesis defence of Florian Maire and this approach had already puzzled me at the time. In short, a mixture structure gives rise to as many auxiliary variables as there are components, minus one: namely, if a simulation *z* is generated from a given component *i* of the mixture, one can create pseudo-simulations *u* from all the other components, using pseudo-priors à la Carlin and Chib. A Gibbs sampler based on this augmented state-space can then be implemented: (a) simulate a new component index *m* given (*z,u*); (b) simulate a new value of (*z,u*) given *m*. One version (MCC) of the algorithm simulates *z* given *m* from the proper conditional posterior by a Metropolis step, while another one (FCC) only simulates the *u*'s. The paper shows that MCC has a smaller asymptotic variance than FCC. I however fail to understand why a Carlin and Chib approach is necessary in a mixture context: it seems (from the introduction) that the motivation is that a regular Gibbs sampler [simulating *z* by a Metropolis-Hastings proposal then *m*] has difficulties moving between components when those components are well-separated. This is correct but slightly moot, as each component of the mixture can be simulated separately and in advance in *z*, which leads to a natural construction of (a) the pseudo-priors used in the paper, (b) approximations to the weights of the mixture, and (c) a global mixture independent proposal, which can be used in an independent Metropolis-Hastings mixture proposal that [seems to me to] alleviate(s) the need to simulate the component index *m*. Both examples used in the paper, a toy two-component two-dimensional Gaussian mixture and another toy two-component one-dimensional Gaussian mixture observed with noise (and in absolute value), do not help in perceiving the definitive need for this Carlin and Chib version. Especially when considering the construction of the pseudo-priors.
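To make the independent Metropolis-Hastings alternative mentioned above more concrete, here is a minimal R sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the target is a hypothetical two-component one-dimensional Gaussian mixture with well-separated components, and the independent proposal is a mixture centred on the same components with deliberately rough weights and inflated scales, standing in for the approximations one would get by exploring each component separately and in advance.

```r
set.seed(42)

# hypothetical target: two well-separated Gaussian components
w  <- c(0.3, 0.7)
mu <- c(-5, 5)
sg <- c(1, 0.5)
target <- function(z) w[1] * dnorm(z, mu[1], sg[1]) + w[2] * dnorm(z, mu[2], sg[2])

# independent mixture proposal: rough weights, inflated scales (assumed values)
pw <- c(0.5, 0.5)
ps <- 1.5 * sg
rprop <- function(n) {
  k <- sample(1:2, n, replace = TRUE, prob = pw)
  rnorm(n, mu[k], ps[k])
}
dprop <- function(z) pw[1] * dnorm(z, mu[1], ps[1]) + pw[2] * dnorm(z, mu[2], ps[2])

# independent Metropolis-Hastings: no component index is ever simulated
indep_mh <- function(niter, z0 = 0) {
  z <- numeric(niter)
  cur <- z0
  prop <- rprop(niter)
  u <- runif(niter)
  for (t in 1:niter) {
    ratio <- target(prop[t]) * dprop(cur) / (target(cur) * dprop(prop[t]))
    if (u[t] < ratio) cur <- prop[t]
    z[t] <- cur
  }
  z
}

out <- indep_mh(1e4)
mean(out > 0)  # share of draws near the second component, to compare with w[2]
```

Since the proposal already covers both components, the chain jumps between them without any augmentation, which is the point made in the paragraph above. This toy case obviously says nothing about higher dimensions or about components that are costly to explore.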

## On the use of marginal posteriors in marginal likelihood estimation via importance-sampling

Posted in R, Statistics, University life with tags Bayes factor, Chib's approximation, evidence, harmonic mean estimator, label switching, latent variable, marginal likelihood, MCMC, mixtures, Monte Carlo Statistical Methods, nested sampling, Poisson regression, Rao-Blackwellisation, simulation on November 20, 2013 by xi'an

**P**errakis, Ntzoufras, and Tsionas just arXived a paper on marginal likelihood (evidence) approximation (with the above title). The idea behind the paper is to base importance sampling for the evidence on simulations from the product of the (block) marginal posterior distributions. Those simulations can be directly derived from an MCMC output by randomly permuting the components. The only critical issue is to find good approximations to the marginal posterior densities. This is handled in the paper either by normal approximations or by Rao-Blackwell estimates, the latter being rather costly since one importance weight involves B×L computations, where B is the number of blocks and L the number of samples used in the Rao-Blackwell estimates. The time factor does not seem to be included in the comparison studies run by the authors, although it would seem necessary when comparing scenarios.
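As an illustration of the principle (and not of the authors' implementation), here is a hedged R sketch on a toy model where the evidence is available in closed form. The assumptions are all mine: a Gaussian linear regression with known unit variance and a N(0, τ²) prior on each coefficient, exact posterior draws standing in for an MCMC output, two blocks made of the two coefficients, and normal approximations to the block marginal posteriors.

```r
set.seed(1)

# toy regression (hypothetical data), known unit variance, N(0, tau^2) priors
n <- 30
x <- rnorm(n)
X <- cbind(1, x)
tau <- 2
y <- drop(X %*% c(1, -0.5) + rnorm(n))

# exact Gaussian posterior draws, standing in for an MCMC output
V <- solve(crossprod(X) + diag(2) / tau^2)
m <- drop(V %*% crossprod(X, y))
N <- 1e4
theta <- sweep(matrix(rnorm(2 * N), N, 2) %*% chol(V), 2, m, "+")

# product of the block marginals: permute each block independently
prod_marg <- cbind(sample(theta[, 1]), sample(theta[, 2]))

# normal approximations to the two marginal posterior densities
mh <- colMeans(theta)
sh <- apply(theta, 2, sd)
log_g <- dnorm(prod_marg[, 1], mh[1], sh[1], log = TRUE) +
  dnorm(prod_marg[, 2], mh[2], sh[2], log = TRUE)

# log-likelihood and log-prior at the permuted draws
log_lik <- colSums(dnorm(y - X %*% t(prod_marg), log = TRUE))
log_prior <- rowSums(dnorm(prod_marg, 0, tau, log = TRUE))

# importance sampling estimate of the log-evidence (log-sum-exp for stability)
log_w <- log_lik + log_prior - log_g
log_evid_hat <- max(log_w) + log(mean(exp(log_w - max(log_w))))

# closed-form log-evidence: y ~ N(0, I + tau^2 X X')
Sigma <- diag(n) + tau^2 * tcrossprod(X)
log_evid <- -0.5 * (n * log(2 * pi) +
  as.numeric(determinant(Sigma, logarithm = TRUE)$modulus) +
  drop(crossprod(y, solve(Sigma, y))))

c(estimate = log_evid_hat, exact = log_evid)
```

With only two weakly correlated blocks the product of marginals is close to the joint posterior, so this is an easy case. Replacing the normal approximations with Rao-Blackwell estimates would multiply the cost of each weight as described above.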

**A**fter a standard regression example (that did not include Chib’s solution in the comparison), the paper considers 2- and 3-component mixtures. The discussion centres around label switching (of course) and the deficiencies of Chib’s solution against the current method and Neal’s reference. The study does not include averaging Chib’s solution over permutations as in Berkhof et al. (2003) and Marin et al. (2005), an approach that does eliminate the bias. Especially for a small number of components. Instead, the authors stick to the log(k!) correction, despite its being known to be quite unreliable (depending on the amount of overlap between modes). The final example is the Diggle et al. (1995) longitudinal Poisson regression with random effects on epileptic patients. The appeal of this model is the unavailability of the integrated likelihood, which implies either estimating it by Rao-Blackwellisation or including the 58 latent variables in the analysis. (There is no comparison with other methods.)

**A**s a side note, among the many references provided by this paper, I did not find any trace of Skilling’s nested sampling or of safe harmonic means (as exposed in our own survey on the topic).

## open problem

Posted in R, Statistics with tags cdf, fixed-point equation, mixtures, numerical resolution, quantiles, R on October 24, 2013 by xi'an

**O**n the plane back from Warwick, I was reading an arXived ABC paper by Umberto Picchini and Julie Forman, “Accelerating inference for diffusions observed with measurement error and large sample sizes using Approximate Bayesian Computation: A case study” and came upon this open problem:

“A closed-form expression for generating percentiles from a finite-components Gaussian mixture is unavailable.” (p.5)

which means solving

*αΦ(x)+(1−α)Φ((x−μ)/σ)=β*

is not possible in closed form. (Of course it could also be argued that the equation *Φ(x)=β* is unavailable in closed form, i.e. that the analytic solution *x=Φ⁻¹(β)* is formal…) While I can think of several numerical approaches, a few minutes with a sheet of paper left me convinced that indeed this is not solvable (hence not an open problem, contrary to the title of the post!).

**J**ust for R practice (and my R course students!), here is a basic R code:

    mixant=function(alpha=0.5,beta=0.95,mu,sig=1,prec=1/10^4){
      onmal=1-alpha
      qbeta=qnorm(beta)
      # initial bounds
      omb=min(qbeta,mu+sig*qbeta)
      omB=max(qbeta,mu+sig*qbeta)
      if (beta<alpha){
        omB=min(omB,qnorm(beta/alpha))
      }else{
        omb=max(omb,mu+sig*qnorm((beta-alpha)/onmal))}
      if (beta<onmal){
        omB=min(omB,mu+sig*qnorm(beta/onmal))
      }else{
        omb=max(omb,qnorm((beta-onmal)/alpha))}
      # iterations
      for (t in 1:5){
        ranj=seq(omb,omB,len=17)
        cfs=alpha*pnorm(ranj)+onmal*pnorm((ranj-mu)/sig)
        omb=max(ranj[cfs<=beta])
        omB=min(ranj[cfs>=beta])
        if ((omB-omb)<prec) break}
      return(.5*(omb+omB))}
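As a possible cross-check of the grid refinement above, here is a short sketch relying on R's `uniroot` instead. The function name `mixq` and the widening of the bracket by one unit on each side are my own choices, not part of the original code; the bracket itself uses the fact that the mixture quantile sits between the two component quantiles.

```r
# quantile of alpha*N(0,1) + (1-alpha)*N(mu,sig^2) by root finding
mixq <- function(alpha = 0.5, beta = 0.95, mu, sig = 1, tol = 1e-10) {
  cdf <- function(x) alpha * pnorm(x) + (1 - alpha) * pnorm((x - mu) / sig)
  # the mixture quantile lies between the component quantiles;
  # widen by 1 on each side so that the bracket is never degenerate
  lo <- min(qnorm(beta), mu + sig * qnorm(beta)) - 1
  hi <- max(qnorm(beta), mu + sig * qnorm(beta)) + 1
  uniroot(function(x) cdf(x) - beta, lower = lo, upper = hi, tol = tol)$root
}

q <- mixq(alpha = 0.5, beta = 0.95, mu = 2, sig = 1)
# check: the mixture cdf at q should return beta
0.5 * pnorm(q) + 0.5 * pnorm(q - 2)
```

Both versions remain numerical resolutions of the same equation, so neither contradicts the absence of a closed form.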

## Dear Sir, I am unable to understand…

Posted in Statistics, University life with tags Bayesian statistics, Master program, MCMC algorithms, Metropolis-Hastings algorithms, mixtures, prior, random numbers on January 30, 2013 by xi'an

**H**ere is an email I received a few days ago, similar to many other emails I/we receive on a regular basis:

I am working on Markov Chain Monte Carlo methods as part of my Masters project. I have to estimate mean, variance from a Gaussian mixture using metropolis method. I came across your paper ‘Bayesian Modelling and Inference on Mixtures of Distributions’. I am unable to understand how to obtain the new sample for mean, variance etc… I am using uniform distribution as proposal distribution. Should it be random numbers for the proposal distribution.

I have been working and trying to understand this for a long time. I would be grateful for any help.

While I felt sorry for the Master's student, I consider it the responsibility of his/her advisor to give her/him the proper directions for understanding the paper. (Given the contents of the email, it sounds as if the student would require proper training in both Bayesian statistics [uniform priors on unbounded parameters?] and simulation [the question about random numbers does not make sense]…) This is what I replied to the student, hopefully in a positive tone.