a conceptual introduction to HMC

Posted in Books, Statistics with tags , , , , , , , on September 5, 2017 by xi'an

“…it has proven a empirical success on an incredibly diverse set of target distributions encountered in applied problems.”

In January this year (!), Michael Betancourt posted on arXiv a detailed introduction to Hamiltonian Monte Carlo that recouped some talks of his I attended. Like the one in Boston two years ago. I have (re)read through this introduction to include an HMC section in my accelerating MCMC review for WIREs (which writing does not accelerate very much…)

“…this formal construction is often out of reach of theoretical and applied statisticians alike.”

With the relevant provision of Michael being a friend and former colleague at Warwick, I appreciate the paper at least as much as I appreciated the highly intuitive approach to HMC in his talks. It is not very mathematical and does not provide theoretical arguments for the defence of one solution versus another, but it (still) provides engaging reasons for using HMC.

“One way to ensure computational inefficiency is to waste computational resources evaluating the target density and relevant functions in regions of parameter space that have negligible contribution to the desired expectation.”

The paper starts by insisting on the probabilistic importance of the typical set, which amounts to a ring for Gaussian-like distributions. Meaning that in high dimensions the mode of the target is not a point that is particularly frequently visited.  I find this notion quite compelling and am at the same time [almost] flabbergasted that I have never heard of it before.

“we will consider only a single parameterization for computing expectations, but we must be careful to ensure that any such computation does not depend on the irrelevant details of that parameterization, such as the particular shape of the probability density function.”

I am not sure I get this sentence. Either it means that an expectation remains invariant under reparameterisation. Or something else and more profound that eludes me. In particular because Michael repeats later (p.25) that the canonical density does not depend on the parameterisation.

“Every choice of kinetic energy and integration time yields a new Hamiltonian transition that will interact differently with a given target distribution (…) when poorly-chosen, however, the performance can suffer dramatically.”

When discussing HMC, Michael tends to get a wee bit overboard with superlatives!, although he eventually points out the need for calibration as in the above quote. The explanation of the HMC move as a combination of uniform moves along isoclines of fixed energy level and of jumps between energy levels does not seem to translate into practical implementations, at least not as explained in the paper.  Simulating directly the energy distribution for a complex target distribution does not seem more feasible than moving up likelihood levels in nested sampling. (Unless I have forgotten something essential about HMC!) Similarly, when discussing symplectic integrators, the paper intuitively conveys the reason these integrators avoid Euler’s difficulties, even though one has to seek elsewhere for rigorous explanations. In the end I cannot but agree with the concluding statement that the geometry of the target distribution holds the key to devising more efficient Monte Carlo methods.

thermodynamic Monte Carlo

Posted in Books, Statistics, University life with tags , , , , on June 27, 2014 by xi'an

Michael Betancourt, my colleague from Warwick, arXived a month ago a paper about a differential geometry approach to relaxation. (In the Monte Carlo rather than the siesta sense of the term relaxation!) He is considering the best way to link a simple base measure ϖ to a measure of interest π by the sequence

$\pi_\beta(x) = \dfrac{e^{-\beta\Delta V(x)}\varpi(x)}{Z(\beta)}$

where Z(β) is the normalising constant (or partition function in the  thermodynamic translation). Most methods are highly dependent on how the sequence of β’s is chosen. A first nice result (for me) is that the Kullback-Leibler distance and the partition function are strongly related in that

$K(\pi_\beta,\pi_\eta) \approx (\eta-\beta) \dfrac{\text{d}Z}{\text{d}\beta}$

which means that the variation in the normalising constant is driving the variation in the Kullback-Leibler distance. The next section goes into differential geometry and the remains from my Master course in differential geometry alas are much too scattered for me to even remember some notions like that of a bundle… So, like Andrew, I have trouble making sense of the resulting algorithm, which updates the temperature β along with the position and speed. (It sounds like an extra and corresponding energy term is added to the original Hamiltonian function.) Even the Beta-Binomial

$k|p\sim\mathrm{B}(n,p)\,,\ p\sim\mathrm{Be}(a,b)$

example is somewhat too involved for me.  So I tried to write down the algorithm step by step in this special case. Which led to

1. update β into β-εδp’²
2. update p into p-εδp’
3. update p’ into p’+ε{(1-a)/p+(b-1)/(1-p)}
4. compute the average log-likelihood, λ* under the tempered version of the target (at temperature β)
5. update p’ into p’+2εβ{(1-a)/p+(b-1)/(1-p)}-ε[λ-λ*]p’
6. update p’ into p’+ε{(1-a)/p+(b-1)/(1-p)}
7. update β into β-εδp’²
8. update p into p-εδp’

where p’ denotes the momentum auxiliary variable associated with the kinetic energy. And λ is the current log-likelihood. (The parameter ε was equal to 0.005 and I could not find the value of δ.) The only costly step in the above list is the approximation of the log-likelihood average λ*. The above details make the algorithm quite clear but I am still missing the intuition behind…

geometry of Hamiltonian Monte Carlo

Posted in Statistics with tags , , , , , , , , , on February 6, 2012 by xi'an

The paper by Michael Betancourt and Leo Stein on the geometry of Hamiltonian Monte Carlo was one of the remaining ones on my pile of unread arXiv postings. I read it in the Paris métro yesterday (on my way to Jon Wellner’s point de vue), which is presumably the wrong place to learn about symplectic and differential geometry! In any case, I found the paper hard to relate to the implementation of Hamiltonian MCMC methods as illustrated by the Read Paper of Girolami and Calderhead (2011). Even though some notions looked familiar (e.g., “Any subsequent bias, however, can be avoided by treating the flow as a Metropolis proposal function“, and “while any reversible Hamiltonian defines both a probability distribution and respective Markov chain, practical applications demand the reverse construction“), it felt like there was a big gap between the representation adopted therein (“we have constructed the most general approach to Hamiltonian Monte Carlo“) and a potential implementation. Again, presumably wrong place and wrong time to try to read an abstract paper! Especially for a reader who never had a strongly intuitive connection with Physics.