## Dan Simpson’s seminar at CREST

Posted in Kids, Mountains, Statistics, Travel, University life on April 18, 2014 by xi'an

Daniel Simpson gave a seminar at CREST yesterday on his recently arXived paper, “Penalising model component complexity: A principled, practical approach to constructing priors”, written with Thiago Martins, Andrea Riebler, Håvard Rue, and Sigrunn Sørbye. A paper he should also have presented in Banff last month, had he not lost his passport in København airport… I have already commented at length on this exciting paper (hopefully to become a discussion paper in a top journal!), so I am just pointing out two things that came to my mind during the energetic talk delivered by Dan to our group. The first thing is that those penalised complexity (PC) priors of theirs rely on some choices in the ordering of the relevance, complexity, nuisance level, etc., of the parameters, just like reference priors. While Dan already wrote a paper on Russian roulette, there is also a Russian doll principle at work behind (or within) PC priors. Each shell of the Russian doll corresponds to a further level of complexity, and the order of those levels needs to be decided by the modeller… Not very realistic in a hierarchical model with several types of parameters having only local meaning.

My second point is that the construction of those “politically correct” (PC) priors reflects another Russian doll structure, namely one of embedded models, hence would and should lead to a natural multiple testing methodology. Except that Dan rejected this notion during his talk, by being opposed to testing per se. (A good topic for one of my summer projects, if nothing more, then!)

## MCMC for sampling from mixture models

Posted in Kids, Statistics, University life on April 17, 2014 by xi'an

Randal Douc, Florian Maire, and Jimmy Olsson recently arXived a paper on the use of Markov chain Monte Carlo methods for the sampling of mixture models, which resorts to Carlin and Chib (1995) pseudo-priors to simulate from a mixture distribution (and not from the posterior distribution associated with a mixture sampling model). As reported earlier, I was at the thesis defence of Florian Maire and this approach had already puzzled me at the time. In short, a mixture structure

$\pi(z)\propto\sum_{m=1}^k \tilde\pi(m,z)$

gives rise to as many auxiliary variables as there are components, minus one: namely, if a simulation z is generated from a given component i of the mixture, one can create pseudo-simulations u from all the other components, using pseudo-priors à la Carlin and Chib. A Gibbs sampler based on this augmented state-space can then be implemented: (a) simulate a new component index m given (z,u); (b) simulate a new value of (z,u) given m. One version (MCC) of the algorithm simulates z given m from the proper conditional posterior by a Metropolis step, while another one (FCC) only simulates the u's. The paper shows that MCC has a smaller asymptotic variance than FCC. I however fail to understand why a Carlin and Chib step is necessary in a mixture context: it seems (from the introduction) that the motivation is that a regular Gibbs sampler [simulating z by a Metropolis-Hastings proposal then m] has difficulties moving between components when those components are well-separated. This is correct but slightly moot, as each component of the mixture can be simulated separately and in advance in z, which leads to a natural construction of (a) the pseudo-priors used in the paper, (b) approximations to the weights of the mixture, and (c) a global mixture independent proposal, which can be used in an independent Metropolis-Hastings step that [seems to me to] alleviate(s) the need to simulate the component index m. Both examples used in the paper, a toy two-component two-dimensional Gaussian mixture and another toy two-component one-dimensional Gaussian mixture observed with noise (and in absolute value), do not help in perceiving the definitive need for this Carlin and Chib version. Especially when considering the construction of the pseudo-priors.
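To make the alternative in (c) concrete, here is a minimal sketch of an independent Metropolis-Hastings sampler whose proposal is itself a mixture built from rough approximations of each component. Everything numerical in it is an assumption made for illustration: the one-dimensional toy target, the proposal weights, means and scales (which would in practice come from separate pilot runs on each component), and the function names. It is not the algorithm of the paper, only an illustration of why the component index m need not be simulated.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a N(mu, sigma^2) distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def target_unnorm(z):
    """Hypothetical toy target: two well-separated components, weights 0.3 and 0.7."""
    return 0.3 * normal_pdf(z, -5.0, 1.0) + 0.7 * normal_pdf(z, 5.0, 0.5)


# Assumed proposal: one (weight, mean, sd) triple per component, deliberately
# imperfect (equal weights, inflated scales) to mimic pilot-run approximations.
PROPOSAL = [(0.5, -5.0, 1.5), (0.5, 5.0, 1.0)]


def proposal_pdf(z):
    return sum(w * normal_pdf(z, mu, sd) for w, mu, sd in PROPOSAL)


def proposal_draw(rng):
    """Pick a component by its weight, then draw from that component."""
    u = rng.random()
    acc = 0.0
    for w, mu, sd in PROPOSAL:
        acc += w
        if u <= acc:
            return rng.gauss(mu, sd)
    return rng.gauss(PROPOSAL[-1][1], PROPOSAL[-1][2])


def independent_mh(n_iter, seed=0):
    """Independent Metropolis-Hastings: proposals do not depend on the current z."""
    rng = random.Random(seed)
    z = proposal_draw(rng)
    chain = []
    for _ in range(n_iter):
        z_new = proposal_draw(rng)
        # Ratio of importance weights target/proposal at the new and current values.
        ratio = (target_unnorm(z_new) * proposal_pdf(z)) / (
            target_unnorm(z) * proposal_pdf(z_new)
        )
        if rng.random() < ratio:
            z = z_new
        chain.append(z)
    return chain
```

Because the proposal covers both modes with heavier tails than the target, the chain jumps between components at every accepted move, and the share of draws in each mode estimates the mixture weights without any explicit index variable.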

## Journées MAS2014, Toulouse, Aug. 27-29

Posted in Kids, pictures, Travel, University life, Wines on April 16, 2014 by xi'an

For those interested in visiting Toulouse at the end of the summer for a French-speaking conference in Probability and Statistics, the Modélisation-Aléatoire-Statistique branch of SMAI (the French version of SIAM) is holding its yearly conference. The main theme this year is “High dimension phenomena”, but a broad range of French research in Probability and Statistics will be represented. The program contains in particular:

• Six plenary lectures and three talks by the recent winners of the “Prix Jacques Neveu” award [including Pierre Jacob!],
• 22 parallel sessions, from probability theory to applied statistics and machine learning,
• A poster session for students.

More details are available on the conference website (in French). (The organizing committee is made up of Aurélien Garivier, Sébastien Gerchinovitz, Aldéric Joulin, Clément Pellegrini, and Laurent Risser.)

## MCqMC 2014 [closeup]

Posted in pictures, Running, Statistics, Travel, University life, Wines on April 16, 2014 by xi'an

As mentioned earlier, this was my very first MCqMC conference and I really enjoyed it, even though (or because) there were many topics that did not fall within my areas of interest. (By comparison, WSC is a series of conferences too remote from those areas for my taste, as I realised in Berlin where we hardly attended any talks and hardly anyone attended my session!) Here I appreciated the exposure to different mathematical visions on Monte Carlo, without being swamped by applications as at WSC… Obviously, our own Bayesian computational community was much less represented than at, say, MCMSki! Nonetheless, I learned a lot during this conference, for instance from Peter Glynn's fantastic talk, and I came back home with new problems and useful references [as well as a two-hour delay in the train ride from Brussels]. I also obviously enjoyed the college-town atmosphere of Leuven, the many historical landmarks and the easily-found running routes out of the town. I am thus quite eager to attend the next MCqMC 2016 meeting (in Stanford, an added bonus!) and even vaguely toying with the idea of organising MCqMC 2018 in Monaco (depending on the return for ISBA 2016 and ISBA 2018). In any case, thanks to the scientific committee for the invitation to give a plenary lecture in Leuven and to the local committee for a perfect organisation of the meeting.

Posted in pictures, Statistics, Travel on April 15, 2014 by xi'an

“At equilibrium, we thus should not expect gains of several orders of magnitude.”

As was signaled to me several times during the MCqMC conference in Leuven, Rémi Bardenet, Arnaud Doucet and Chris Holmes (all from Oxford) just wrote a short paper for the proceedings of ICML on a way to speed up Metropolis-Hastings by reducing the number of terms one computes in the likelihood ratio involved in the acceptance probability, i.e.

$\prod_{i=1}^n\frac{L(\theta^\prime|x_i)}{L(\theta|x_i)}.$

The observations appearing in this likelihood ratio are a random subsample from the original sample. Even though this leads to an unbiased estimator of the true log-likelihood sum, this approach is not justified on a pseudo-marginal basis à la Andrieu-Roberts (2009). (Writing this on the train back to Paris, I am not convinced the pseudo-marginal argument is in fact applicable to this proposal, as the likelihood itself is not estimated in an unbiased manner…)

In the paper, the quality of the approximation is evaluated by Hoeffding-like inequalities, which serve as the basis for a stopping rule on the number of terms eventually evaluated in the random subsample. In fine, the method uses a sequential procedure to determine whether enough terms have been used to take the decision, and the probability of taking the same decision as with the whole sample is bounded from below. The sequential nature of the algorithm requires either recomputing the vector of likelihood terms for the previous value of the parameter or storing all of them to derive the partial ratios. While the authors address the issue of self-evaluating whether or not this complication is worth the effort, I wonder (from my train seat) why they focus so much on recovering the same decision as with the complete likelihood ratio and the same uniform. It would suffice to get the same distribution for the decision (an alternative that is easier to propose than to create, of course). I also (idly) wonder if a Gibbs version would be manageable, i.e. by changing only some terms in the likelihood ratio at each iteration, in which case the method could be exact… (I found the above quote quite relevant as, in an alternative technique we are constructing with Marco Banterle, the speedup is particularly visible in the warmup stage.) Hence another direction in this recent flow of papers attempting to speed up MCMC methods against the incoming tsunami of “Big Data” problems.
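The sequential stopping rule can be sketched as follows, under simplifying assumptions of mine rather than the paper's exact construction: the per-observation log-likelihood ratio terms are bounded with a known range `value_range`, a plain Hoeffding bound (which remains valid when sampling without replacement) replaces whatever sharper inequality the authors use, and the error budget `delta` is split evenly over the checkpoints by a union bound. The threshold `psi` stands for the quantity the average log-likelihood ratio is compared with, i.e. the log of the uniform times the prior and proposal terms, divided by n. All names are hypothetical.

```python
import math
import random


def subsampled_decision(log_ratio_term, n, psi, value_range,
                        delta=0.01, batch=50, rng=None):
    """Decide whether the average of the n terms log_ratio_term(i) exceeds psi,
    evaluating as few terms as possible.

    Returns (decision, number_of_terms_evaluated).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    rng = rng or random.Random()
    order = list(range(n))
    rng.shuffle(order)                    # subsample without replacement
    n_checks = math.ceil(n / batch)
    delta_t = delta / n_checks            # union bound over the checkpoints
    total, t = 0.0, 0
    while True:
        stop = min(t + batch, n)
        for i in order[t:stop]:
            total += log_ratio_term(i)    # terms are only computed when needed
        t = stop
        mean_t = total / t
        # Hoeffding half-width for the mean of t terms of range value_range.
        c_t = value_range * math.sqrt(math.log(2.0 / delta_t) / (2.0 * t))
        if abs(mean_t - psi) > c_t or t == n:
            return mean_t > psi, t
```

When the running mean is far from `psi`, the rule stops after the first batch; when the two are close, it falls back on the whole sample, which is consistent with the quote above about not expecting gains of several orders of magnitude at equilibrium. By construction, the decision agrees with the full-sample decision with probability at least 1 − delta under the stated boundedness assumption.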

## Leuven snapshot [#7]

Posted in pictures, Running, Travel, University life on April 15, 2014 by xi'an

## Valparaiso under fire

Posted in pictures, Travel on April 14, 2014 by xi'an