On the last day of the IFCAM workshop in Bangalore, Marc Lavielle from INRIA presented a talk on mixed effects where he illustrated his original computer language Monolix. And mentioned that his CRC Press book on Mixed Effects Models for the Population Approach was out! (Appropriately listed as out on a 14th of July on amazon!) He actually demonstrated the abilities of Monolix live and on diabets data provided by an earlier speaker from Kolkata, which was a perfect way to start initiating a collaboration! Nice cover (which is all I saw from the book at this stage!) that maybe will induce candidates to write a review for CHANCE. Estimation of those mixed effect models relies on stochastic EM algorithms developed by Marc Lavielle and Éric Moulines in the 90’s, as well as MCMC methods.
Archive for MCMC
Today, I took part in the thesis defence of Amandine Shreck at Telecom-ParisTech. I had commented a while ago on the Langevin algorithm for discontinuous targets she developed with co-authors from that school towards variable selection. The thesis also contains material on the equi-energy sampler that is worth mentioning. The algorithm relates to the Wang-Landau algorithm last discussed here for the seminars of Pierre and Luke in Paris, last month. The algorithm aims at facilitating the moves around the target density by favouring moves from one energy level to the next. As explained to me by Pierre once again after his seminar, the division of the space according to the target values is a way to avoid creating artificial partitions over the sampling space. A sort of Lebesgue version of Monte Carlo integration. The energy bands
require the choice of a sequence of bounds on the density, values that are hardly available prior to the simulation of the target. The paper corresponding to this part of the thesis (and published in our special issue of TOMACS last year) thus considers the extension when the bounds are defined on the go, in a adaptive way. This could be achieved based on earlier simulations, using some quantiles of the observed values of the target but this is a costly solution which requires to keep an ordered sample of the density values. (Is it that costly?!) Thus the authors prefer to determine the energy levels in a cheaper adaptive manner. Namely, through a Robbins-Monro/stochastic approximation type update of the bounds,
My questions related with this part of the thesis were about the actual gain if any in computing time versus efficiency, the limitations in terms of curse of dimension and storage, the connections with the Wang-Landau algorithm and pseudo-marginal approximations, and the (degree of) likelihood of an universal and automatised adaptive equi-energy sampler.
Following yesterday’s post on Rao’s, Liu’s, and Dunson’s paper on a new approach to intractable normalising constants, and taking advantage of being in Warwick, I tested the method on a toy model, namely the posterior associated with n Student’s t observations with unknown location parameter μ and a flat prior,
which is “naturally” bounded by a Cauchy density with scale √ν. The constant M is then easily derived and running the new algorithm follows from a normal random walk proposal targeting the augmented likelihood (R code below).
As shown by the above graph, the completion-by-rejection scheme produces a similar outcome (tomato) as the one based on the sole observations (steelblue). With a similar acceptance rate. However, the computing time is much much degraded:
> system.time(g8()) user system elapsed 53.751 0.056 54.103 > system.time(g9()) user system elapsed 1.156 0.000 1.161
when compared with the no-completion version. Here is the entire R code that produced both MCMC samples: Continue reading
My last day at this ICMS workshop on molecular simulation started [with a double loop of Arthur's Seat thankfully avoiding the heavy rains of the previous night and then] Chris Chipot‘s magistral entry to molecular simulation for proteins with impressive slides and simulation movies, even though I could not follow the details to really understand the simulation challenges therein, just catching a few connections with earlier talks. A typical example of a cross-disciplinary gap, where the other discipline always seems to be stressing the ‘wrong” aspects. Although this is perfectly unrealistic, it would immensely to prepare talks in pairs for such interdisciplinary workshops! Then Gersende Fort presented results about convergence and efficiency for the Wang-Landau algorithm. The idea is to find the optimal rate for updating the weights of the elements of the partition towards reaching the flat histogram in minimal time. Showing massive gains on toy examples. The next talk went back to molecular biology with Jérôme Hénin‘s presentation on improved adaptive biased sampling. With an exciting notion of orthogonality aiming at finding the slowest directions in the target and putting the computational effort. He also discussed the tension between long single simulations and short repeated ones, echoing a long-going debate in the MCMC community. (He also had a slide with a picture of my first 1983 Apple IIe computer!) Then Antonietta Mira gave a broad perspective on delayed rejection and zero variance estimates. With impressive variance reductions (although some physicists then asked for reduction of order 10¹⁰!). Johannes Zimmer gave a beautiful maths talk on the connection between particle and diffusion limits (PDEs) and Wasserstein geometry and large deviations. (I did not get most of the talk, but it was nonetheless beautiful!) Bert Kappen concluded the day (and the workshop for me) by a nice introduction to control theory. Making connection between optimal control and optimal importance sampling. Which made me idly think of the following problem: what if control cannot be completely… controlled and hence involves a stochastic part? Presumably of little interest as the control would then be on the parameters of the distribution of the control.
“The alanine dipeptide is the fruit fly of molecular simulation.”
The example of this alanine dipeptide molecule was so recurrent during the talks that it justified the above quote by Michael Allen. Not that I am more proficient in the point of studying this protein or using it as a benchmark. Or in identifying the specifics of the challenges of molecular dynamics simulation. Not a criticism of the ICMS workshop obviously, but rather of my congenital difficulty with continuous time processes!!! So I do not return from Edinburgh with a new research collaborative project in molecular dynamics (if with more traditional prospects), albeit with the perception that a minimal effort could bring me to breach the vocabulary barrier. And maybe consider ABC ventures in those (new) domains. (Although I fear my talk on ABC did not impact most of the audience!)
The third day [morn] at our ICMS workshop was dedicated to path sampling. And rare events. Much more into [my taste] Monte Carlo territory. The first talk by Rosalind Allen looked at reweighting trajectories that are not in an equilibrium or are missing the Boltzmann [normalizing] constant. Although the derivation against a calibration parameter looked like the primary goal rather than the tool for constant estimation. Again papers in J. Chem. Phys.! And a potential link with ABC raised by Antonietta Mira… Then Jonathan Weare discussed stratification. With a nice trick of expressing the normalising constants of the different terms in the partition as solution(s) of a Markov system
Because the stochastic matrix M is easier (?) to approximate. Valleau’s and Torrie’s umbrella sampling was a constant reference in this morning of talks. Arnaud Guyader’s talk was in the continuation of Toni Lelièvre’s introduction, which helped a lot in my better understanding of the concepts. Rephrasing things in more statistical terms. Like the distinction between equilibrium and paths. Or bias being importance sampling. Frédéric Cérou actually gave a sort of second part to Arnaud’s talk, using importance splitting algorithms. Presenting an algorithm for simulating rare events that sounded like an opposite nested sampling, where the goal is to get down the target, rather than up. Pushing particles away from a current level of the target function with probability ½. Michela Ottobre completed the series with an entry into diffusion limits in the Roberts-Gelman-Gilks spirit when the Markov chain is not yet stationary. In the transient phase thus.
The last “tutorial” talk at this ICMS workshop ["at the interface between mathematical statistics and molecular simulation"] was given by Tony Lelièvre on adaptive bias schemes in Langevin algorithms and on the parallel replica algorithm. This was both very interesting because of the potential for connections with my “brand” of MCMC techniques and rather frustrating as I felt the intuition behind the physical concepts like free energy and metastability was almost within my reach! The most manageable time in Tony’s talk was the illustration of the concepts through a mixture posterior example. Example that I need to (re)read further to grasp the general idea. (And maybe the book on Free Energy Computations Tony wrote with Mathias Rousset et Gabriel Stoltz.) A definitely worthwhile talk that I hope will get posted on line by ICMS. The other talks of the day were mostly of a free energy nature, some using optimised bias in the Langevin diffusion (except for Pierre Jacob who presented his non-negative unbiased estimation impossibility result).
The first talks of the day at this ICMS workshop ["at the interface between mathematical statistics and molecular simulation"] were actually lectures introducing molecular simulation to statisticians by Michael Allen from Warwick and computational statistics to physicists by Omiros Papaspiliopoulos. Allen’s lecture was quite pedagogical, even though I had to quiz wikipedia for physics terms and notions. Like a force being the gradient of a potential function. He gave a physical meaning to Langevin’ equation. As well as references from the Journal of Chemical Physics that were more recent than 1953. He mentioned alternatives to Langevin’s equation too and I idly wondered at the possibility of using those alternatives as other tools for improved MCMC simulation. Although introducing friction may not be the most promising way to speed up the thing… He later introduced what statisticians call Langevin’ algorithm (MALA) as smart Monte Carlo (Rossky et al., …1978!!!). Recovering Hamiltonian and hybrid Monte Carlo algorithms as a fusion of molecular dynamics, Verlet algorithm, and Metropolis acceptance step! As well as reminding us of the physics roots of umbrella sampling and the Wang-Landau algorithm.
Omiros Papaspiliopoulos also gave a very pedagogical entry to the convergence of MCMC samplers which focussed on the L² approach to convergence. This reminded me of the very first papers published on the convergence of the Gibbs sampler, like the
1990 1992 JCGS paper by Schervish and Carlin. Or the 1991 1996 Annals of Statistics by Amit. (Funny that I located both papers much earlier than when they actually appeared!) One surprising fact was that the convergence of all reversible ergodic kernels is necessarily geometric. There is no classification of kernels in this topology, the only ranking being through the respective spectral gaps. A good refresher for most of the audience, statisticians included.
The following talks of Day 1 were by Christophe Andrieu, who kept with the spirit of a highly pedagogical entry, covering particle filters, SMC, particle Gibbs and pseudo-marginals, and who hit the right tone I think given the heterogeneous audience. And by Ben Leimkuhler about particle simulation for very large molecular structures. Closing the day by focussing on Langevin dynamics. What I understood from the talk was an improved entry into the resolution of some SPDEs. Gaining two orders when compared with Euler-Marayama. But missed the meaning of the friction coefficient γ converging to infinity in the title…