Hamiltonian tails
“We demonstrate HMC’s sensitivity to these parameters by sampling from a bivariate Gaussian with correlation coefficient 0.99. We consider three settings (ε,L) = {(0.16; 40); (0.16; 50); (0.15; 50)}” Ziyu Wang, Shakir Mohamed, and Nando De Freitas. 2013
In an experiment with my PhD student Changye Wu (who wrote all the R code used below), we looked back at a strange feature of a 2013 ICML paper by Wang, Mohamed, and De Freitas, namely the rather poor performance of a Hamiltonian Monte Carlo (leapfrog) algorithm on a two-dimensional, strongly correlated Gaussian target, for very specific values of the parameters (ε,L) of the algorithm.
The Gaussian target associated with this sample stands right in the middle of the two clouds, as identified by Wang et al. And the leapfrog integration path for (ε,L)=(0.15,50)
keeps jumping between the two ridges (or tails), with no stop in the middle. Changing (ε,L) ever so slightly to (0.16,40) does not modify the path very much
but the HMC output is quite different since the cloud then sits right on top of the target
with no clear explanation except for a sort of periodicity in the leapfrog sequence associated with the velocity generated at the start of the code. Looking at the Hamiltonian values for (ε,L)=(0.15,50)
and for (ε,L)=(0.16,40)
does not help, except to point at a sequence located far in the tails of this Hamiltonian, which varies surprisingly when it is supposed to be constant. At first we thought the large value of ε was to blame, but much smaller values still return poor convergence performance, as below for (ε,L)=(0.01,450).
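For readers who want to reproduce the phenomenon, here is a minimal R sketch of the experiment (it is not Changye Wu's original code, and only the tuning constants quoted above are taken from the post): plain HMC with a fixed-length leapfrog integrator on the bivariate Gaussian target with correlation 0.99, recording the Hamiltonian value at each proposal so that the lack of conservation can be monitored directly.

```r
## minimal HMC sketch (not the original code of the experiment): bivariate
## Gaussian target with correlation 0.99, leapfrog with fixed (eps, L)
Sigma <- matrix(c(1, 0.99, 0.99, 1), 2, 2)
Omega <- solve(Sigma)                                   # precision matrix
U     <- function(x) 0.5 * drop(t(x) %*% Omega %*% x)   # potential = -log target
gradU <- function(x) drop(Omega %*% x)

hmc <- function(niter = 1e4, eps = 0.15, L = 50, x0 = c(0, 0)) {
  x   <- x0
  out <- matrix(NA, niter, 2)
  H   <- numeric(niter)                  # Hamiltonian at each proposal
  for (t in 1:niter) {
    p  <- rnorm(2)                       # fresh Gaussian momentum
    xp <- x; pp <- p
    pp <- pp - eps / 2 * gradU(xp)       # leapfrog: initial half step
    for (l in 1:L) {
      xp <- xp + eps * pp                # full step on position
      if (l < L) pp <- pp - eps * gradU(xp)
    }
    pp <- pp - eps / 2 * gradU(xp)       # final half step on momentum
    H0 <- U(x)  + 0.5 * sum(p^2)
    H1 <- U(xp) + 0.5 * sum(pp^2)
    if (log(runif(1)) < H0 - H1) x <- xp # Metropolis acceptance step
    out[t, ] <- x
    H[t] <- H1
  }
  list(x = out, H = H)
}

bad  <- hmc(eps = 0.15, L = 50)          # the problematic setting
good <- hmc(eps = 0.16, L = 40)          # the nearby setting discussed above
```

Plotting bad$x against the target contours, and bad$H against the iteration index, should make the comparison between the two settings straightforward.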
July 18, 2018 at 3:58 pm
Interesting! The first graph is striking.
I remember trying HMC with leapfrog integrator on the Rosenbrock “banana-shaped” distribution and it worked only after I reduced the step size to about 1/500, even though it’s in 2 dimensions and it’s not really highly correlated.
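For concreteness, one standard parameterisation of such a banana-shaped Rosenbrock target (not necessarily the exact variant used in that experiment) can be plugged straight into the sketch in the post, by replacing U and gradU with, e.g.,

```r
## one common Rosenbrock ("banana") potential and its gradient, as drop-in
## replacements for U and gradU in the HMC sketch above (illustrative scaling)
U_rb <- function(x)
  ((1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2) / 20
gradU_rb <- function(x)
  c((-2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2)) / 20,
    10 * (x[2] - x[1]^2))
```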
The recent survey by Bou-Rabee and Sanz-Serna (https://arxiv.org/abs/1711.05337) and the book on “Geometric Numerical Integration” by Hairer, Wanner and Lubich are full of relevant results, and descriptions of other integrators which might perform better than the standard leapfrog integrator.
July 19, 2018 at 7:14 pm
Yes, I was surprised as well, although this was actually pointed out in the original paper by Wang et al. Had to try it myself, I presume…
July 18, 2018 at 12:56 am
When sampling from a Gaussian distribution, you need to randomize the trajectory length (stepsize times number of steps) to avoid accidentally producing a near-periodic chain. See, for example, my example of sampling from a 100-dimensional Gaussian at the end of section 3 of my HMC review at http://www.cs.utoronto.ca/~radford/ham-mcmc.abstract.html
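In the sketch from the post, this amounts to drawing the step size and number of leapfrog steps afresh at each iteration rather than keeping (ε,L) fixed; one illustrative way (the jitter ranges below are arbitrary) is

```r
## inside the HMC loop, draw a fresh trajectory length at each iteration
## instead of using the fixed (eps, L); the ranges below are arbitrary
eps_t <- eps * runif(1, 0.8, 1.2)   # jitter the step size by +/- 20%
L_t   <- sample.int(L, 1)           # uniform number of leapfrog steps in 1..L
## ...then run the leapfrog integrator with (eps_t, L_t) in place of (eps, L)
```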
July 18, 2018 at 8:29 am
Thank you Radford, I was just surprised at how easily this happens!
July 17, 2018 at 4:50 pm
Radford Neal’s HMC paper discusses orbits, as does Michael Betancourt’s exhaustive HMC paper. It has more to do with integration time (epsilon * L in the blog post’s notation) than with step size, once the step size gets to the point where the leapfrog algorithm has reasonable error bounds.
Matt Hoffman burned an incredible quantity of compute cycles on our clusters to evaluate basic HMC vs. NUTS on some hard problems (a 250-dimensional normal with high correlation, a stochastic volatility time series, and hierarchical logistic regression, if I recall). Even optimally tuned HMC was inferior to NUTS in all of our experiments. We believe it’s not only the tuning of step size and integration time, but also hugely influenced by NUTS biasing the selection of the point on the Hamiltonian trajectory toward the last doubling, and hence likely farther away from the starting point. Betancourt found that NUTS lost a lot of steam when he reworked adaptation but (initially) forgot the biasing of draws along the trajectory.