Andrew gone NUTS!
Matthew Hoffman and Andrew Gelman have posted a paper on arXiv entitled “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo” and developing an improvement on the Hamiltonian Monte Carlo algorithm called NUTS (!). Here is the abstract:
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC’s performance is highly sensitive to two user-specified parameters: a step size ε and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS perform at least as efficiently as and sometimes more efficiently than a well tuned standard HMC method, without requiring user intervention or costly tuning runs. We also derive a method for adapting the step size parameter ε on the fly based on primal-dual averaging. NUTS can thus be used with no hand-tuning at all. NUTS is also suitable for applications such as BUGS-style automatic inference engines that require efficient “turnkey” sampling algorithms.
Now, my suspicious and pessimistic nature always makes me wary of universality claims! I completely acknowledge the difficulty in picking the number of leapfrog steps L in the Hamiltonian algorithm, since the theory behind does not tell anything useful about L. And the paper is quite convincing in its description of the NUTS algorithm, being moreover beautifully written. As indicated in the paper, the “doubling” solution adopted by NUTS is reminding me of Radford Neal’s procedure in his Annals of Statistics paper on slice sampling. (The NUTS algorithm also relies on a slice sampling step.) However, besides its ability to work as an automatic Hamiltonian methodology, I wonder about the computing time (and the “unacceptably large amount of memory”, p.12) required by the doubling procedure: 2j is growing fast with j! (If my intuition is right, the computing time should increase rather quickly with the dimension. And I do not get the argument within the paper that the costly part is the gradient computation: it seems to me the gradient must be computed for all of the 2j points.) The authors also mention delayed rejection à la Tierney and Mira (1999) and the scheme reminded me a wee of the pinball sampler we devised a while ago with Kerrie Mengersen. Choosing the discretisation step ε is more “traditional”, using the stochastic approximation approach we set in our unpublished-yet-often-quoted tech report with Christophe Andrieu. (I do not think I got the crux of the “dual averaging” for optimal calibration on p.11) The illustration through four benchmarks [incl. a stochastic volatility model!] is quite convincing as well, with (unsurprisingly) great graphical tools. A final grumble: that the code is “only” available in the proprietary language Matlab! Now, I bet some kind of Rao-Blackwellisation is feasible with all the intermediate simulations!