## Hamiltonian ABC

**O**n Monday, Ed Meeds, Robert Leenders, and Max Welling (from Amsterdam) arXived a paper entitled Hamiltonian ABC. Before looking at the paper in any detail, I got puzzled by this association of antagonistic terms, since ABC is intended for complex and mostly intractable likelihoods, while Hamiltonian Monte Carlo requires a lot from the target, in order to compute gradients and Hessians… *[Warning: some graphs on pages 13-14 may be harmful to your printer!]*

Somewhat obviously (ex-post!), the paper suggests to use Hamiltonian dynamics on ABC approximations of the likelihood. They compare a Gaussian kernel version

with the synthetic Gaussian likelihood version of Wood (2010)

where both mean and variance are estimated from the simulated data. If ε is taken as an external quantity and driven to zero, the second approach is much more stable. But… ε is never driven to zero in ABC, or fixed at ε=0.37: It is instead considered as a kernel bandwidth and hence estimated from the simulated data. Hence ε is commensurable with σ(θ). And this makes me wonder at the relevance of the conclusion that synthetic is better than kernel for Hamiltonian ABC. More globally, I wonder at the relevance of better simulating from a still approximate target when the true goal is to better approximate the genuine posterior.

Some of the paper covers separate issues like handling gradient by finite differences à la Spall *[if you can afford it!]* and incorporating the random generator as part of the Markov chain. And using S *common* random numbers in computing the gradients for all values of θ. (Although I am not certain all random generators can be represented as a deterministic transform of a parameter θ and of a fixed number of random uniforms. But the authors may consider a random number of random uniforms when they represent their random generators as deterministic transform of a parameter θ and of the random seed. I am also uncertain about the distinction between common, sticky, and persistent random numbers!)

March 21, 2015 at 9:31 am

Thanks for reading and posting about our paper. We have just a few comments.

Our main target is high-dimensional problems where it is indeed worth the computation to compute the gradient (a la Spall, but not finite difference, Spall’s 2SPSA algorithm) and use stochastic gradient versions of HMC (and btw, where we do not need to compute the Hessian). The goal is to get an indication of the direction of the gradient in high dimensions, with a small number of simulations.

We agree that epsilon should be estimated appropriately for each problem, perhaps as part of the Markov chain. For the purposes of comparing k-eps and SL gradient estimates, we chose to keep it fixed at a small fraction of the simulator noise at the MAP theta, hence 0.37. At this value the posterior approximation is good for ABC-MCMC and it mixes well. The point of this exercise was to point out that gradients from k-eps have large variance for reasonable epsilon even when we add more simulations. This is important for hamiltonian-ABC since large variances force us to turn down the step-size, reducing the benefit of using HMC in the first place. We agree that using a kernel density model with epsilon as the bandwidth may give us the desired unbiased and low-variance gradient estimates.

As for the random seeds, common, persistent or otherwise, we agree that this requires more clarification in the paper and also more investigation to determine the limitations, if any, of controlling the random generator of a black-box simulator. We do not know for certain if we can represent *all* random generators as deterministic functions of a fixed number of random uniforms. You are probably right thinking of it as a random number of random uniforms.

Thanks again for posting and apologies to your printer!