Archive for adiabatic Monte Carlo

a conceptual introduction to HMC

Posted in Books, Statistics on September 5, 2017 by xi'an

“…it has proven an empirical success on an incredibly diverse set of target distributions encountered in applied problems.”

In January this year (!), Michael Betancourt posted on arXiv a detailed introduction to Hamiltonian Monte Carlo that overlapped with some talks of his I attended, like the one in Boston two years ago. I have (re)read through this introduction in order to include an HMC section in my accelerating MCMC review for WIREs (the writing of which is not accelerating very much…)

“…this formal construction is often out of reach of theoretical and applied statisticians alike.”

With the relevant proviso that Michael is a friend and former colleague at Warwick, I appreciate the paper at least as much as I appreciated the highly intuitive approach to HMC in his talks. It is not very mathematical and does not provide theoretical arguments in defence of one solution versus another, but it (still) provides engaging reasons for using HMC.

“One way to ensure computational inefficiency is to waste computational resources evaluating the target density and relevant functions in regions of parameter space that have negligible contribution to the desired expectation.”

The paper starts by insisting on the probabilistic importance of the typical set, which amounts to a thin shell (a ring in two dimensions) for Gaussian-like distributions, meaning that in high dimensions the mode of the target is not a point that is visited particularly often. I find this notion quite compelling and am at the same time [almost] flabbergasted that I had never heard of it before.
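As a quick numerical check of this concentration phenomenon (my own toy sketch in Python, not taken from the paper), drawing from a standard d-dimensional Gaussian shows that the distance to the mode piles up around √d, so the mode itself is essentially never visited once d is large:

import numpy as np

# toy illustration: draws from a standard d-dimensional Gaussian concentrate
# on a thin shell of radius about sqrt(d), far from the mode at the origin
rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    x = rng.standard_normal((10_000, d))
    r = np.linalg.norm(x, axis=1)
    print(f"d={d:4d}  mean radius={r.mean():7.2f}  sqrt(d)={np.sqrt(d):7.2f}  sd={r.std():.2f}")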

“we will consider only a single parameterization for computing expectations, but we must be careful to ensure that any such computation does not depend on the irrelevant details of that parameterization, such as the particular shape of the probability density function.”

I am not sure I get this sentence. Either it means that an expectation remains invariant under reparameterisation, or it means something more profound that eludes me, in particular because Michael repeats later (p.25) that the canonical density does not depend on the parameterisation.

“Every choice of kinetic energy and integration time yields a new Hamiltonian transition that will interact differently with a given target distribution (…) when poorly-chosen, however, the performance can suffer dramatically.”

When discussing HMC, Michael tends to go a wee bit overboard with superlatives!, although he eventually points out the need for calibration as in the above quote. The explanation of the HMC move as a combination of uniform moves along level sets of fixed energy and of jumps between energy levels does not seem to translate into practical implementations, at least not as explained in the paper. Simulating the energy distribution directly for a complex target distribution does not seem more feasible than moving up likelihood levels in nested sampling. (Unless I have forgotten something essential about HMC!) Similarly, when discussing symplectic integrators, the paper intuitively conveys the reason why these integrators avoid the difficulties of Euler's scheme, even though one has to look elsewhere for rigorous explanations. In the end I cannot but agree with the concluding statement that the geometry of the target distribution holds the key to devising more efficient Monte Carlo methods.
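For readers who, like me, prefer to see the mechanics written down, here is a minimal and purely illustrative Python sketch of a single HMC transition with a standard Gaussian kinetic energy and a leapfrog integrator; the function names and the tuning constants eps and n_steps are my own choices, and picking them is exactly the calibration issue raised in the quote above.

import numpy as np

def leapfrog(q, p, grad_logpi, eps, n_steps):
    # leapfrog discretisation of the Hamiltonian flow for
    # H(q, p) = -log pi(q) + |p|^2 / 2; being symplectic, it nearly conserves H,
    # unlike the explicit Euler scheme
    q, p = q.copy(), p.copy()
    p += 0.5 * eps * grad_logpi(q)      # half step on the momentum
    for _ in range(n_steps - 1):
        q += eps * p                    # full step on the position
        p += eps * grad_logpi(q)        # full step on the momentum
    q += eps * p
    p += 0.5 * eps * grad_logpi(q)      # final half step on the momentum
    return q, p

def hmc_step(q, logpi, grad_logpi, eps, n_steps, rng):
    # one HMC transition: refresh the momentum, integrate, accept or reject
    p0 = rng.standard_normal(q.shape)
    q1, p1 = leapfrog(q, p0, grad_logpi, eps, n_steps)
    # Metropolis correction for the discretisation error of the integrator
    dH = (0.5 * p1 @ p1 - logpi(q1)) - (0.5 * p0 @ p0 - logpi(q))
    return q1 if np.log(rng.uniform()) < -dH else q

# usage sketch on a standard Gaussian target:
#   logpi = lambda q: -0.5 * q @ q
#   grad_logpi = lambda q: -q
#   q, rng = np.zeros(10), np.random.default_rng(0)
#   for _ in range(1000):
#       q = hmc_step(q, logpi, grad_logpi, 0.1, 20, rng)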

estimating constants [impression soleil levant]

Posted in pictures, Running, Statistics, Travel, University life on April 25, 2016 by xi'an

The CRiSM workshop on estimating constants, which took place here in Warwick from April 20 to April 22, was quite enjoyable [says most objectively one of the organisers!], with all speakers present to deliver their talks (!), around sixty participants, and 17 posters. It remains an exciting aspect of the field that so many and such different perspectives are available on the “doubly intractable” problem of estimating a normalising constant. Several talks and posters concentrated on Ising models, which always sound a bit artificial to me but are also perfect testing grounds for approximations to classical algorithms.

On top of [clearly interesting!] talks associated with papers I had already read [and commented on here], I had not previously heard about Pierre Jacob's coupling SMC sequence, whose paper is not yet out [no spoiler then!]. Or about Michael Betancourt's adiabatic Monte Carlo and its connection with the normalising constant. Nicolas Chopin talked about the unnormalised Poisson process I discussed a while ago, with the feature that the normalising constant itself becomes an additional parameter and that integration can be replaced with (likelihood) maximisation. The approach, which is based on a reference distribution (and an artificial logistic regression à la Geyer), reminded me of bridge sampling, and indirectly of path sampling, especially when Merrilee Hurn gave us a very cool introduction to power posteriors in the following talk, also mentioning the controlled thermodynamic integration of Chris Oates and co-authors I discussed a while ago. (Too bad that Chris Oates could not make it to this workshop!) She also pointed out that thermodynamic integration could be a feasible alternative to nested sampling.
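To make the thermodynamic identity behind power posteriors concrete, here is a toy Python sketch (mine, not any speaker's code) for a model where the power posterior is Gaussian in closed form, so that log Z = ∫₀¹ E_β[log L(θ)] dβ can be checked directly against the exact marginal likelihood:

import numpy as np

# toy model: prior theta ~ N(0, 1), one observation y ~ N(theta, 1), so that the
# power posterior p_beta(theta) ∝ prior(theta) * L(theta)^beta is
# N(beta*y/(1+beta), 1/(1+beta))
y = 1.3
betas = np.linspace(0.0, 1.0, 101)
m = betas * y / (1.0 + betas)                 # power-posterior means
v = 1.0 / (1.0 + betas)                       # power-posterior variances
e_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((y - m) ** 2 + v)
# trapezoidal quadrature over the temperature ladder
logZ_ti = np.sum((e_loglik[1:] + e_loglik[:-1]) / 2) * (betas[1] - betas[0])
logZ_exact = -0.5 * np.log(2 * np.pi * 2.0) - y ** 2 / 4.0   # marginally, y ~ N(0, 2)
print(logZ_ti, logZ_exact)                    # equal up to quadrature error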

Another novel aspect was found in Yves Atchadé's talk about sparse high-dimensional matrices with priors made of mutually exclusive measures and quasi-likelihood approximations, a simplified version of the approach consisting in working with a non-identified, unconstrained matrix that is later projected onto one of those measure supports. While I was aware of his noise-contrastive estimation of normalising constants, I had not previously heard Michael Gutmann give a talk on that approach (linking to Geyer's mythical 1994 paper!). And I do remain nonplussed at the possibility of including the normalising constant as an additional parameter [in a computational and statistical sense]…! Both Chris Sherlock and Christophe Andrieu talked about novel aspects of pseudo-marginal techniques, Chris on the lack of variance reduction brought by averaging unbiased estimators of the likelihood and Christophe on the case of large datasets, recovering better performance in latent variable models by estimating the ratio rather than taking a ratio of estimators. (With Christophe pointing out that this was an exceptional case where harmonic mean estimators could be considered!)
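And to see what treating the normalising constant as an additional parameter can mean in practice, here is a toy Python sketch of the logistic-regression trick of Geyer (1994) and of noise-contrastive estimation; the Gaussian reference and the variable names are my own choices, not taken from either talk:

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# toy target: unnormalised q(x) = exp(-x^2/2), whose true log constant is log sqrt(2*pi);
# the constant enters as a free parameter c of a classifier separating target draws
# from draws of a known reference density
rng = np.random.default_rng(1)
n = 50_000
x_target = rng.standard_normal(n)        # stand-in for MCMC draws from the target
x_ref = rng.normal(0.0, 2.0, n)          # draws from the reference N(0, 4)

def neg_nce_loglik(c):
    # log-odds of "came from the target" when log q is known up to the constant c
    g_t = -0.5 * x_target**2 - c - norm.logpdf(x_target, 0.0, 2.0)
    g_r = -0.5 * x_ref**2 - c - norm.logpdf(x_ref, 0.0, 2.0)
    # Bernoulli log-likelihood: log sigmoid(g) on target draws,
    # log(1 - sigmoid(g)) on reference draws
    return -(np.sum(-np.logaddexp(0.0, -g_t)) + np.sum(-np.logaddexp(0.0, g_r)))

c_hat = minimize_scalar(neg_nce_loglik, bounds=(-5.0, 5.0), method="bounded").x
print(c_hat, 0.5 * np.log(2 * np.pi))    # estimated versus true log constant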