## Archive for NUTS

## faster HMC [poster at CIRM]

Posted in Statistics with tags CIRM, eHMC, HMC, Jean Morlet Chair, Luminy, Monte Carlo Statistical Methods, NUTS, poster, Université Aix Marseille on November 26, 2018 by xi'an

## accelerating HMC by learning the leapfrog scale

Posted in Books, Statistics with tags eHMC, ESJD, ESS, Hamiltonian Monte Carlo, HMC, leapfrog integrator, mixing speed, NUTS, stochastic volatility on October 12, 2018 by xi'an

**I**n this new arXiv submission that was part of Changye Wu's thesis [defended last week], we try to reduce the high sensitivity of the HMC algorithm to its hand-tuned parameters, namely the step size ε of the discretisation scheme, the number of steps L of the integrator, and the covariance matrix of the auxiliary variables, by calibrating the number of steps of the leapfrog integrator towards avoiding both slow-mixing chains and wasteful computation costs. We do so by learning from the No-U-Turn Sampler (NUTS) of Hoffman and Gelman (2014), which already automatically tunes both the step size and the number of leapfrog steps.

The core idea behind NUTS is to pick the step size via primal-dual averaging in a burn-in (warmup, Andrew would say) phase and to build at each iteration a proposal based on following a locally longest path on a level set of the Hamiltonian. This is achieved by a recursive algorithm that, at each call to the leapfrog integrator, requires evaluating both the gradient of the target distribution and the Hamiltonian itself. Roughly speaking, an iteration of NUTS costs twice as much as regular HMC with the same number of calls to the integrator. Our approach is to learn from NUTS the scale of the leapfrog length and to use the resulting empirical distribution of the longest leapfrog path to randomly pick the value of L at each iteration of an HMC scheme. This obviously preserves the validity of the HMC algorithm.
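The second stage of this scheme can be sketched as follows: a standard HMC sampler whose number of leapfrog steps is redrawn at each iteration from an empirical distribution of path lengths. This is only a minimal illustration, not the actual eHMC implementation: the function names are mine, and the empirical lengths below are placeholders standing in for lengths that would have been recorded during a NUTS-style warmup phase.

```python
import numpy as np

def leapfrog(theta, p, grad_log_pi, eps, L):
    """Standard leapfrog integrator for Hamiltonian dynamics."""
    theta, p = theta.copy(), p.copy()
    p += 0.5 * eps * grad_log_pi(theta)          # initial momentum half-step
    for _ in range(L - 1):
        theta += eps * p
        p += eps * grad_log_pi(theta)
    theta += eps * p
    p += 0.5 * eps * grad_log_pi(theta)          # final momentum half-step
    return theta, p

def ehmc_sketch(log_pi, grad_log_pi, theta0, eps, emp_lengths, n_iter, rng):
    """HMC where the leapfrog length L is drawn at each iteration from
    an empirical distribution (here, a placeholder list emp_lengths)."""
    theta = np.asarray(theta0, dtype=float)
    chain = [theta.copy()]
    for _ in range(n_iter):
        L = int(rng.choice(emp_lengths))          # random leapfrog length
        p = rng.standard_normal(theta.shape)      # refresh momentum
        prop, p_prop = leapfrog(theta, p, grad_log_pi, eps, L)
        # Metropolis correction on the joint (theta, p) space
        log_acc = (log_pi(prop) - 0.5 * p_prop @ p_prop
                   - log_pi(theta) + 0.5 * p @ p)
        if np.log(rng.uniform()) < log_acc:
            theta = prop
        chain.append(theta.copy())
    return np.array(chain)

# toy target: standard bivariate Gaussian
log_pi = lambda th: -0.5 * th @ th
grad_log_pi = lambda th: -th
rng = np.random.default_rng(0)
emp_lengths = [5, 8, 8, 13, 21]   # hypothetical recorded path lengths
chain = ehmc_sketch(log_pi, grad_log_pi, np.zeros(2), 0.3,
                    emp_lengths, 2000, rng)
print(chain.mean(axis=0), chain.std(axis=0))
```

Because the length L is drawn independently of the current state, the resulting chain remains a valid HMC algorithm, as noted above.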

While a theoretical comparison of the convergence performances of NUTS and this eHMC proposal seems beyond our reach, we ran a series of experiments to evaluate these performances, using as criteria an ESS value calibrated by the evaluation cost of the logarithm of the target density function and of its gradient, as this is usually the most costly part of the algorithms, as well as a similarly calibrated expected squared jumping distance (ESJD). Above is one such illustration for a stochastic volatility model, the first axis representing the targeted acceptance probability in the Metropolis step. Some of the gains in either ESS or ESJD are by a factor of ten, which relates to our argument that NUTS somewhat wastes computation effort by using a proposal uniformly distributed over the candidate set, instead of one close to its end-points, which automatically reduces the distance between the current position and the proposal.
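The cost-calibrated criterion can be made concrete with a small sketch (the function name is mine, not from the paper): the mean squared jump between successive states of the chain, divided by the total number of gradient evaluations spent producing it, so that a sampler taking longer but cheaper paths is not unfairly favoured.

```python
import numpy as np

def esjd_per_grad(chain, n_grad_evals):
    """Expected squared jumping distance per gradient evaluation:
    mean squared Euclidean jump between successive states, divided
    by the total number of gradient calls used to produce the chain."""
    jumps = np.diff(np.asarray(chain, dtype=float), axis=0)
    return np.mean(np.sum(jumps**2, axis=1)) / n_grad_evals

# toy chain of three 2-d states, produced (say) with 10 gradient calls
toy_chain = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
score = esjd_per_grad(toy_chain, n_grad_evals=10)
print(score)  # two jumps of squared length 1, mean 1, divided by 10
```

The same normalisation applies to the ESS criterion: divide the effective sample size by the number of target-density and gradient evaluations rather than by the number of iterations.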

## accelerating MCMC

Posted in Statistics with tags acceleration of MCMC algorithms, coupling, Hamiltonian Monte Carlo, India, MCMC, Monte Carlo Statistical Methods, motorbike, NUTS, Rajasthan, review, survey, tempering, WIREs on May 29, 2017 by xi'an

**I** have recently [well, not so recently!] been asked to write a review paper on ways of accelerating MCMC algorithms for the [review] journal WIREs Computational Statistics and would welcome all suggestions towards that goal. Besides [and including more on]

- coupling strategies using different kernels and switching between them;
- tempering strategies using flatter or lower dimensional targets as intermediary steps, e.g., à la Neal;
- sequential Monte Carlo with particle systems targeting again flatter or lower dimensional targets and adapting proposals to this effect;
- Hamiltonian MCMC, again with connections to Radford (and more generally ways of avoiding rejections);
- adaptive MCMC, obviously;
- Rao-Blackwellisation, just as obviously (in the sense that increased precision in the resulting estimates means fewer simulations are needed).

## common derivation for Metropolis–Hastings and other MCMC algorithms

Posted in Books, pictures, Statistics, Travel, University life with tags auxiliary variables, directional sampling, Gibbs sampling, Hamiltonian Monte Carlo, Metropolis-Hastings algorithms, Metropolis-within-Gibbs algorithm, NUTS, pseudo-marginal MCMC, recursive proposals, RJMCMC, slice sampling, Sydney, UNSW on July 25, 2016 by xi'an

**K**hoa Tran and Robert Kohn from UNSW just arXived a paper on a comprehensive derivation of a large range of MCMC algorithms, beyond Metropolis-Hastings. The idea is to decompose the MCMC move into

- a random completion of the current value θ into V;
- a deterministic move T from (θ,V) to (ξ,W), where only ξ matters.

If this sounds like a new version of Peter Green's completion at the core of his 1995 RJMCMC algorithm, it is because it is essentially the same notion. Resorting to this completion allows for a standard form of the Metropolis-Hastings algorithm, which leads to the correct stationary distribution provided T is self-inverse. This representation covers Metropolis-Hastings algorithms, Gibbs sampling, Metropolis-within-Gibbs and auxiliary variable methods, slice sampling, recursive proposals, directional sampling, Langevin and Hamiltonian Monte Carlo, NUTS sampling, pseudo-marginal Metropolis-Hastings algorithms, and pseudo-marginal Hamiltonian Monte Carlo, as discussed by the authors. Given this representation of the Markov chain through a random transform, I wonder whether Peter Glynn's trick mentioned in the previous post on retrospective Monte Carlo applies in this generic setting (as it could considerably improve convergence…).
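The two-step decomposition above can be sketched on the simplest instance it covers. This is my own toy illustration, not the authors' code: complete θ with V ~ N(0, σ²), apply the deterministic self-inverse map T(θ, V) = (θ + V, −V), and accept with the Metropolis-Hastings ratio on the completed space, which here recovers the familiar random-walk sampler.

```python
import numpy as np

def involutive_mh(log_pi, theta0, sigma, n_iter, rng):
    """MH move built from a random completion V and a deterministic
    self-inverse map T(theta, V) = (theta + V, -V).  Applying T twice
    returns (theta, V), and |det dT| = 1, so the standard MH ratio
    on the completed space yields the correct stationary distribution."""
    theta = float(theta0)
    chain = [theta]
    log_phi = lambda v: -0.5 * (v / sigma) ** 2   # completion density, up to a constant
    for _ in range(n_iter):
        v = sigma * rng.standard_normal()          # random completion of theta
        xi, w = theta + v, -v                      # deterministic involution T
        # MH acceptance ratio on the joint (theta, V) space
        log_acc = log_pi(xi) + log_phi(w) - log_pi(theta) - log_phi(v)
        if np.log(rng.uniform()) < log_acc:
            theta = xi
        chain.append(theta)
    return np.array(chain)

# toy target: standard normal
rng = np.random.default_rng(1)
chain = involutive_mh(lambda t: -0.5 * t * t, 0.0, 2.5, 5000, rng)
print(chain.mean(), chain.std())
```

Since the completion density is symmetric, log φ(−v) = log φ(v) and the ratio collapses to the usual random-walk Metropolis ratio; HMC fits the same template with T the (momentum-flipped) leapfrog map.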

## Non-reversible Markov Chains for Monte Carlo sampling

Posted in pictures, Statistics, Travel, University life with tags ABC, Alan Turing Institute, CRiSM, Hamiltonian Monte Carlo, intractable likelihood, lifting, Monte Carlo Statistical Methods, non-reversible diffusion, NUTS, overdamped Langevin algorithm, random walk, University of Warwick, workshop on September 24, 2015 by xi'an

**T**his "week in Warwick" was not chosen at random, as I was aware there was a workshop on non-reversible MCMC going on. (Even though CRiSM sponsored so many workshops in September that almost any week would have worked for the above sentence!) It has always been kind of a mystery to me that non-reversibility could make a massive difference in practice, even though I am quite aware that it does. And I can grasp some of the theoretical arguments why it does. So it was quite rewarding to sit in this Warwick amphitheatre and learn about overdamped Langevin algorithms and other non-reversible diffusions, to see results where convergence times moved from n to √n, and to grasp some of the appeal of lifting, albeit in finite state spaces. Plus, the cartoon presentation of Hamiltonian Monte Carlo by Michael Betancourt was a great moment, not only because of the satellite bursting into flames on the screen but also because it gave a very welcome intuition about why reversibility was inefficient and HMC appealing. So I am grateful to my two colleagues, Joris Bierkens and Gareth Roberts, for organising this exciting workshop, with a most profitable scheduling favouring long and few talks. My next visit to Warwick will also coincide with a workshop on intractable likelihood, next November, this time as part of the new Alan Turing Institute programme.

## no U-turn sampler [guest slides]

Posted in Statistics with tags Bayes in Paris, detailed balance, Hamiltonian Monte Carlo, NUTS, Riemann manifold, U-turn on April 17, 2013 by xi'an

**Y**esterday at the "Bayes in Paris" reading group, my student Marco Banterle presented his analysis of the NUTS paper by Matthew Hoffman and Andrew Gelman that I discussed on the 'Og a while ago. Here are his slides, which could have kept us occupied for the whole afternoon, had Michael not started his course one hour later!