Archive for controlled MCMC

mixture modelling for testing hypotheses

Posted in Books, Statistics, University life with tags , , , , , , , , , , on January 4, 2019 by xi'an

After a fairly long delay (since the first version was posted and submitted in December 2014), we eventually revised and resubmitted our paper with Kaniav Kamary [who has now graduated], Kerrie Mengersen, and Judith Rousseau on the final day of 2018. The main reason for this massive delay is mine’s, as I got fairly depressed by the general tone of the dozen of reviews we received after submitting the paper as a Read Paper in the Journal of the Royal Statistical Society. Despite a rather opposite reaction from the community (an admittedly biased sample!) including two dozens of citations in other papers. (There seems to be a pattern in my submissions of Read Papers, witness our earlier and unsuccessful attempt with Christophe Andrieu in the early 2000’s with the paper on controlled MCMC, leading to 121 citations so far according to G scholar.) Anyway, thanks to my co-authors keeping up the fight!, we started working on a revision including stronger convergence results, managing to show that the approach leads to an optimal separation rate, contrary to the Bayes factor which has an extra √log(n) factor. This may sound paradoxical since, while the Bayes factor  converges to 0 under the alternative model exponentially quickly, the convergence rate of the mixture weight α to 1 is of order 1/√n, but this does not mean that the separation rate of the procedure based on the mixture model is worse than that of the Bayes factor. On the contrary, while it is well known that the Bayes factor leads to a separation rate of order √log(n) in parametric models, we show that our approach can lead to a testing procedure with a better separation rate of order 1/√n. We also studied a non-parametric setting where the null is a specified family of distributions (e.g., Gaussians) and the alternative is a Dirichlet process mixture. Establishing that the posterior distribution concentrates around the null at the rate √log(n)/√n. We thus resubmitted the paper for publication, although not as a Read Paper, with hopefully more luck this time!

high-dimensional stochastic simulation and optimisation in image processing [day #1]

Posted in pictures, Statistics, Travel, Uncategorized, University life, Wines with tags , , , , , , , , , , , on August 29, 2014 by xi'an

Even though I flew through Birmingham (and had to endure the fundamental randomness of trains in Britain), I managed to reach the “High-dimensional Stochastic Simulation and Optimisation in Image Processing” conference location (in Goldney Hall Orangery) in due time to attend the (second) talk by Christophe Andrieu. He started with an explanation of the notion of controlled Markov chain, which reminded me of our early and famous-if-unpublished paper on controlled MCMC. (The label “controlled” was inspired by Peter Green who pointed out to us the different meanings of controlled in French [meaning checked or monitored] and in English . We use it here in the English sense, obviously.) The main focus of the talk was on the stability of controlled Markov chains. With of course connections with out controlled MCMC of old, for instance the case of the coerced acceptance probability. Which happened to be not that stable! With the central tool being Lyapounov functions. (Making me wonder whether or not it would make sense to envision the meta-problem of adaptively estimating the adequate Lyapounov function from the MCMC outcome.)

As I had difficulties following the details of the convex optimisation talks in the afternoon, I eloped to work on my own and returned to the posters & wine session, where the small number of posters allowed for the proper amount of interaction with the speakers! Talking about the relevance of variational Bayes approximations and of possible tools to assess it, about the use of new metrics for MALA and of possible extensions to Hamiltonian Monte Carlo, about Bayesian modellings of fMRI and of possible applications of ABC in this framework. (No memorable wine to make the ‘Og!) Then a quick if reasonably hot curry and it was already bed-time after a rather long and well-filled day!z

advanced Markov chain Monte Carlo methods

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , on December 5, 2011 by xi'an

This book, Advanced Markov Chain Monte Carlo Methods: Learning from Past Samples, by Faming Liang, Chuanhai Liu, and Raymond Carroll, appeared last year and has been sitting on my desk all this time, patiently (?) waiting for a review. When I received it, I took a brief look at it (further than the cool cover!) and then decided I needed more than that to write a useful review! Here are my impressions  on Advanced Markov Chain Monte Carlo Methods after a deeper read. (I have not read any other review in the main statistical journals so far.)

The title, Advanced Markov Chain Monte Carlo Methods, is a clear warning on the level of the book: “advanced”, it certainly is!!! By page 85, the general description of MCMC simulation methods is completed, including perfect sampling and reversible jump MCMC, and the authors engage into a detailed description of highly specialised topics of their choice: Auxiliary variables (Chap. 4), Population-based MCMC (Chap. 5), Dynamic weighting (Chap. 6), Stochastic approximation Monte Carlo (Chap. 7), and MCMC with adaptive proposals (Chap. 8).  The book is clearly inspired by the numerous papers the authors have written in those area, especially Faming Liang. (The uneven distribution of the number of citations per year with peaks in 2000 and 2009 reflects this strong connection.) While the book attempts at broadening the spectrum by including introductory sections, and discussing other papers, it remains nonetheless that this centred focus of the book reduces its potential readership to graduate students and researchers who could directly work on the original papers. I would thus hesitate in teaching my graduate students from this book, given that they only attend a single course on Monte Carlo methods. Continue reading

Andrew gone NUTS!

Posted in pictures, R, Statistics, University life with tags , , , , , , , , , , on November 24, 2011 by xi'an

Matthew Hoffman and Andrew Gelman have posted a paper on arXiv entitled “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo” and developing an improvement on the Hamiltonian Monte Carlo algorithm called NUTS (!). Here is the abstract:

Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC’s performance is highly sensitive to two user-specified parameters: a step size ε and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS perform at least as efficiently as and sometimes more efficiently than a well tuned standard HMC method, without requiring user intervention or costly tuning runs. We also derive a method for adapting the step size parameter ε on the fly based on primal-dual averaging. NUTS can thus be used with no hand-tuning at all. NUTS is also suitable for applications such as BUGS-style automatic inference engines that require efficient “turnkey” sampling algorithms.

Now, my suspicious and pessimistic nature always makes me wary of universality claims! I completely acknowledge the difficulty in picking the number of leapfrog steps L in the Hamiltonian algorithm, since the theory behind does not tell anything useful about L. And the paper is quite convincing in its description of the NUTS algorithm, being moreover beautifully written.  As indicated in the paper, the “doubling” solution adopted by NUTS is reminding me of Radford Neal’s procedure in his Annals of Statistics paper on slice sampling. (The NUTS algorithm also relies on a slice sampling step.) However, besides its ability to work as an automatic Hamiltonian methodology, I wonder about the computing time (and the “unacceptably large amount of memory”, p.12) required by the doubling procedure: 2j is growing fast with j! (If my intuition is right, the computing time should increase rather quickly with the dimension. And I do not get the argument within the paper that the costly part is the gradient computation: it seems to me the gradient must be computed for all of the 2j points.) The authors also mention delayed rejection à la Tierney and Mira (1999) and the scheme reminded me a wee of the pinball sampler we devised a while ago with Kerrie Mengersen. Choosing the discretisation step ε is more “traditional”, using the stochastic approximation approach we set in our unpublished-yet-often-quoted tech report with Christophe Andrieu. (I do not think I got the crux of the “dual averaging” for optimal calibration on p.11) The illustration through four benchmarks [incl. a stochastic volatility model!] is quite convincing as well, with (unsurprisingly) great graphical tools. A final grumble: that the code is “only” available in the proprietary language Matlab! Now, I bet some kind of Rao-Blackwellisation is feasible with all the intermediate simulations!

Adaptive ABC

Posted in Statistics, University life with tags , , , , , , on November 9, 2011 by xi'an

Maxime Lenormand, Franck Jabot and Guillaume Deffuant have just posted on arXiv a paper about a refinement of the ABC-PMC algorithm we developed with Marc Beaumont, Jean-Marie Cornuet, and Jean-Michel Marin. The authors state in their introduction that ABC-PMC

presents two shortcomings which are particularly problematic for costly to simulate complex models. First, the sequence of tolerance levels ε1,…,εT has to be provided to the ABC algorithm. In practice, this implies to do preliminary simulations of the model, a step which is computationally costly for complex models. Furthermore, a badly chosen sequence of tolerance levels may inflate the number of simulations required to reach a given precision as we will see below. A second shortcoming of the PMC-ABC algorithm is that it lacks a criterion to decide whether it has converged. The final tolerance level εT may be too large for the ABC approach to satisfactorily approximate the posterior distribution of the model. Inversely, a larger εT may be sufficient to obtain a good approximation of the posterior distribution, hence sparing a number of model simulations.

shortcomings which I thought were addressed by the ABC-SMC algorithm of Pierre Del Moral, Arnaud Doucet and Ajay Jasra [not referenced in the current paper], the similar algorithm of Chris Drovandi and Tony Pettitt, and our recent paper with  Jean-Michel Marin, Pierre Pudlo and Mohammed Sedki [presented at ABC in London, submitted to Statistics and Computing a few months ago, but alas not available on-line for unmentionable reasons linked to the growing dysfunctionality of one co-author…!]. It is correct that we did not address the choice of the εt‘s in the original paper, even though we already used an on-line selection as a quantile of the current sample of distances. In essence, given the fundamentally non-parametric nature of ABC, the tolerances εt should always be determined from the simulated samples, as regular bandwidths.

The paper essentially proposes the same scheme as in Del Moral et al., before applying it to the toy example of Sisson et al. (PNAS, 2007) and to a more complex job dynamic model in central France. Were I to referee this paper, I would thus suggest that the authors incorporate a comparison with both papers of Del Moral et al. and of Drovandi and Pettitt to highlight the methodological  novelty of their approach.

Robust adaptive Metropolis algorithm [arXiv:10114381]

Posted in R, Statistics with tags , , , , , , on November 23, 2010 by xi'an

Matti Vihola has posted a new paper on arXiv about adaptive (random walk) Metropolis-Hastings algorithms. The update in the (lower diagonal) scale matrix is

S_nS_n^\text{T}=S_{n-1}\left(\mathbf{I}_d-\eta_n[\alpha_n-\alpha^\star]\dfrac{U_nU_n^\text{T}}{||U_n||^2}\right)S^\text{T}_{n-1}

where

  • \alpha_n is the current acceptance probability and \alpha^\star the target acceptance rate;
  • U_n is the current random noise for the proposal, Y_n=X_{n-1}+S_{n-1}U_n;
  • \eta_n is a step size sequence decaying to zero.

The spirit of the adaptation is therefore a Robbins-Monro type adaptation of the covariance matrix in the random walk, with a target acceptance rate. It follows the lines Christophe Andrieu and I had drafted in our [most famous!] unpublished paper, Controlled MCMC for optimal sampling. The current paper shows that the fixed point for S_n is proportional to the scale of the target if the latter is elliptically symmetric (but does not establish a sufficient condition for convergence). It concludes with a Law of Large Numbers for the empirical average of the f(X_n) under rather strong assumptions (on f, the target, and the matrices S_n). The simulations run on formalised examples show a clear improvement over the existing adaptive algorithms (see above) and the method is implemented within Matti Vihola’s Grapham software. I presume Matti will present this latest work during his invited talk at Adap’skiii.

Ps-Took me at least 15 minutes to spot the error in the above LaTeX formula, ending up with S^\text{T}_{n−1}: Copy-pasting from the pdf file had produced an unconventional minus sign in n−1 that was impossible to spot!

Adaptive SMC

Posted in Statistics with tags , , , , , , , on May 14, 2010 by xi'an

On Monday, Paul Fearnhead and Benjamin Taylor reposted on arXiv a paper about adaptive SMC. It is as well since I had missed the first posting on Friday. While the method has some similarities with our earlier work on population Monte Carlo methods with Olivier Cappé, Randal Douc, Arnaud Guillin and Jean-Michel Marin, there are quite novel and interesting features in this paper!  First, the paper is firmly set within a sequential setup, as in Chopin (2002, Biometrika) and Del Moral, Doucet and Jasra (2006, JRSS B). This means considering a sequence of targets corresponding to likelihoods with increasing datasets. We mentioned this case as a possible implementation of population Monte Carlo but never truly experimented with this. Fearnhead and Taylor do set their method within this framework, using a sequence of populations (or particle systems) aimed at this moving sequence of targets. The second major difference is that, while they also use a mixture of transition kernels as their proposal (or importance functions) and while they also aim at optimising the parameters of those transitions (parameters that I would like to dub cyberparameters to distinguish them from the parameters of the statistical model), they do not update those cyberparameters in a deterministic way, as we do. On the opposite, they build a non-parametric approximation \pi_t(h) to the distribution of those cyberparameters and simulate from those approximations at each step of the sequential algorithm, using a weight f(\theta^{(j)}_{t-1},\theta^{(j)}_t) that assesses the quality of the move from \theta^{(j)}_{t-1} to  \theta^{(j)}_{t}, based on the simulated h^{(j)}_t. I like very much this aspect of the paper, in that the cyberparameters are part of the dynamics in the stochastic algorithm, a point I tried to implement since the (never published) controlled MCMC paper with Christophe Andrieu. As we do in our paper now published in Statistics and Computing, they further establish that this method is asymptotically improving the efficiency criterion at each step of the sequential procedure. The paper concludes with an extensive simulation study where Fearnhead and Taylor show that their implementation outperforms random walk with adaptive steps. (I am not very happy with their mixture example in that they resort to an ordering on the means…)