## Hamiltonian tails

“We demonstrate HMC’s sensitivity to these parameters by sampling from a bivariate Gaussian with correlation coefficient 0.99. We consider three settings (ε,L) = {(0.16; 40); (0.16; 50); (0.15; 50)}” — Ziyu Wang, Shakir Mohamed, and Nando de Freitas (2013)

In an experiment with my PhD student Changye Wu (who wrote all the R code used below), we looked back at a strange feature in a 2013 ICML paper by Wang, Mohamed, and de Freitas: namely, the rather poor performance of a Hamiltonian Monte Carlo (leapfrog) algorithm on a two-dimensional, strongly correlated Gaussian target, for very specific values of the parameters (ε,L) of the algorithm.
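The original R code is not reproduced here, but the standard leapfrog HMC recipe behind the experiment can be sketched as follows in Python; the function name `hmc_gaussian` and all defaults are hypothetical, not from the paper.

```python
import numpy as np

def hmc_gaussian(eps, L, n_iter=5000, rho=0.99, seed=1):
    """Plain HMC with leapfrog integration, targeting a bivariate N(0, Sigma)
    with correlation rho. Illustrative sketch, not the original R code."""
    rng = np.random.default_rng(seed)
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    P = np.linalg.inv(Sigma)            # precision matrix
    grad_U = lambda q: P @ q            # gradient of the potential U(q) = q'Pq/2
    q = np.zeros(2)
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        p = rng.standard_normal(2)      # fresh momentum at each iteration
        qn, pn = q.copy(), p.copy()
        # leapfrog: half step in p, L-1 full (q, p) steps, final q and half p step
        pn -= 0.5 * eps * grad_U(qn)
        for _ in range(L - 1):
            qn += eps * pn
            pn -= eps * grad_U(qn)
        qn += eps * pn
        pn -= 0.5 * eps * grad_U(qn)
        # Metropolis correction on the Hamiltonian H(q, p) = U(q) + |p|^2 / 2
        dH = (qn @ P @ qn - q @ P @ q) / 2 + (pn @ pn - p @ p) / 2
        if rng.random() < np.exp(-dH):
            q = qn
        draws[i] = q
    return draws
```

Running this with (ε,L) = (0.15, 50) versus (0.16, 40) is what produces the contrasting clouds discussed below.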

The Gaussian target associated with this sample stands right in the middle of the two clouds identified by Wang et al. The leapfrog integration path for (ε,L)=(0.15,50) keeps jumping between the two ridges (or tails), with no stop in the middle. Changing (ε,L) ever so slightly to (0.16,40) does not modify the path very much, but the HMC output is quite different, since the cloud then sits right on top of the target, with no clear explanation except for a sort of periodicity in the leapfrog sequence associated with the velocity generated at the start of the code.

Looking at the Hamiltonian values for (ε,L)=(0.15,50) and for (ε,L)=(0.16,40) does not help, except to point at a sequence located far in the tails of this Hamiltonian, surprisingly varying when it is supposed to be constant. At first we thought the large value of ε was to blame, but much smaller values still return poor convergence performance, as below for (ε,L)=(0.01,450).
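The Hamiltonian check above amounts to tracking H(q,p) = U(q) + |p|²/2 along a single leapfrog trajectory; an exact integrator would keep it constant, so any drift or oscillation is discretization error. A hedged sketch (the function name `hamiltonian_track` is my own, not from the original code):

```python
import numpy as np

def hamiltonian_track(eps, L, rho=0.99, seed=2):
    """Record H(q, p) = U(q) + |p|^2 / 2 at every leapfrog step along one
    trajectory for the correlated bivariate Gaussian target. Sketch only."""
    rng = np.random.default_rng(seed)
    P = np.linalg.inv(np.array([[1.0, rho], [rho, 1.0]]))
    q = np.zeros(2)
    p = rng.standard_normal(2)
    H = [0.5 * q @ P @ q + 0.5 * p @ p]
    for _ in range(L):
        # one leapfrog step, written with symmetric half steps in p
        p = p - 0.5 * eps * (P @ q)
        q = q + eps * p
        p = p - 0.5 * eps * (P @ q)
        H.append(0.5 * q @ P @ q + 0.5 * p @ p)
    return np.array(H)
```

Comparing, say, `hamiltonian_track(0.15, 50)` with `hamiltonian_track(0.16, 40)` shows how far H wanders from its initial value for each setting.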

### 7 Responses to “Hamiltonian tails”

1. Pierre Jacob Says:

Interesting! The first graph is striking.

I remember trying HMC with leapfrog integrator on the Rosenbrock “banana-shaped” distribution and it worked only after I reduced the step size to about 1/500, even though it’s in 2 dimensions and it’s not really highly correlated.

The recent survey by Bou-Rabee and Sanz-Serna (https://arxiv.org/abs/1711.05337) and the book on “Geometric Numerical Integration” by Hairer, Wanner and Lubich are full of relevant results, and descriptions of other integrators which might perform better than the standard leapfrog integrator.

• Yes, I was surprised as well, although this was actually pointed out in the original paper by Wang et al. I had to try it myself, I presume…

2. Radford Neal Says:

When sampling from a Gaussian distribution, you need to randomize the trajectory length (step size times number of steps) to avoid accidentally producing a near-periodic chain. See, for example, my example of sampling from a 100-dimensional Gaussian at the end of section 3 of my HMC review at http://www.cs.utoronto.ca/~radford/ham-mcmc.abstract.html

• Thank you Radford, I was just surprised at how easily this happens!
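The randomization Neal suggests can be sketched as follows: draw the number of leapfrog steps uniformly at each iteration, so the trajectory length εL varies and near-periodic behaviour is broken. The function name `hmc_random_length` and its defaults are my own assumptions, not Neal's code.

```python
import numpy as np

def hmc_random_length(eps, L_max, n_iter=2000, rho=0.99, seed=3):
    """HMC on the correlated bivariate Gaussian where each iteration uses a
    number of leapfrog steps drawn uniformly on {1, ..., L_max}, randomizing
    the trajectory length eps * L. Illustrative sketch."""
    rng = np.random.default_rng(seed)
    P = np.linalg.inv(np.array([[1.0, rho], [rho, 1.0]]))
    U = lambda q: 0.5 * q @ P @ q
    q = np.zeros(2)
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        p = rng.standard_normal(2)
        L = int(rng.integers(1, L_max + 1))   # randomized number of steps
        qn, pn = q.copy(), p.copy()
        for _ in range(L):                    # leapfrog, symmetric half steps
            pn = pn - 0.5 * eps * (P @ qn)
            qn = qn + eps * pn
            pn = pn - 0.5 * eps * (P @ qn)
        dH = U(qn) - U(q) + 0.5 * (pn @ pn - p @ p)
        if rng.random() < np.exp(-dH):
            q = qn
        draws[i] = q
    return draws
```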

3. Radford Neal’s HMC paper discusses orbits, as does Michael Betancourt’s exhaustive HMC paper. It has more to do with integration time (epsilon * L in the blog post’s notation) than with step size, once the step size gets to the point where the leapfrog algorithm has reasonable error bounds.

Matt Hoffman burned an incredible quantity of compute cycles on our clusters to evaluate basic HMC vs. NUTS, and a simple summary on some hard problems (250-dimensional normal with high correlation, stochastic volatility time series, hierarchical logistic regression, if I recall). Even optimally tuned HMC was inferior to NUTS in all of our experiments. We believe it’s not only tuning step size and integration time, but also hugely influenced by NUTS’s biasing the selection of the point on the Hamiltonian trajectory to be in the last doubling and hence likely farther away from the starting point. Betancourt found that NUTS lost a lot of steam when he reworked adaptation but (initially) forgot the biasing of draws along the trajectory.