robustified Hamiltonian

Posted in Books, Statistics, University life with tags , , , , , , , , , on April 1, 2022 by xi'an

In Gregynog, last week, Lionel Riou-Durant (Warwick) presented his recent work with Jure Vogrinc on Metropolis Adjusted Langevin Trajectories, which I had also heard in the Séminaire Parisien de Statistique two weeks ago. Starting with a nice exposition of Hamiltonian Monte Carlo, highlighting its drawbacks. This includes the potentially damaging impact of poorly tuning the integration time. Their proposal is to act upon the velocity in the Hamiltonian through Langevin (positive) damping, which also preserves the stationarity.  (And connects with randomised HMC.) One theoretical in the paper is that the Langevin diffusion achieves the fastest mixing rate among randomised HMCs. From a practical perspective, there exists a version of the leapfrog integrator that adapts to this setting and can be implemented as a Metropolis adjustment. (Hence the MALT connection.) An interesting feature is that the process as such is ergodic, which avoids renewal steps (and U-turns). (There are still calibration parameters to adjust, obviously.)

general perspective on the Metropolis–Hastings kernel

Posted in Books, Statistics with tags , , , , , , , , , , , , , on January 14, 2021 by xi'an

[My Bristol friends and co-authors] Christophe Andrieu, and Anthony Lee, along with Sam Livingstone arXived a massive paper on 01 January on the Metropolis-Hastings kernel.

“Our aim is to develop a framework making establishing correctness of complex Markov chain Monte Carlo kernels a purely mechanical or algebraic exercise, while making communication of ideas simpler and unambiguous by allowing a stronger focus on essential features (…) This framework can also be used to validate kernels that do not satisfy detailed balance, i.e. which are not reversible, but a modified version thereof.”

A central notion in this highly general framework is, extending Tierney (1998), to see an MCMC kernel as a triplet involving a probability measure μ (on an extended space), an involution transform φ generalising the proposal step (i.e. þ²=id), and an associated acceptance probability ð. Then μ-reversibility occurs for

$\eth(\xi)\mu(\text{d}\xi)= \eth(\phi(\xi))\mu^{\phi}(\text{d}\xi)$

with the rhs involving the push-forward measure induced by μ and φ. And furthermore there is always a choice of an acceptance probability ð ensuring for this equality to happen. Interestingly, the new framework allows for mostly seamless handling of more complex versions of MCMC such as reversible jump and parallel tempering. But also non-reversible kernels, incl. for instance delayed rejection. And HMC, incl. NUTS. And pseudo-marginal, multiple-try, PDMPs, &c., &c. it is remarkable to see such a general theory emerging a this (late?) stage of the evolution of the field (and I will need more time and attention to understand its consequences).

non-reversible gerrymandering

Posted in Books, Statistics, Travel, University life with tags , , , , , , , on September 3, 2020 by xi'an

Gregory Herschlag, Jonathan C. Mattingly [whom I met in Oaxaca and who acknowledges helpful conversations with Manon Michel while at CIRM two years ago], Matthias Sachs, and Evan Wyse just posted an arXiv paper using non-reversible MCMC methods to improve sampling of voting district plans towards fighting (partisan) Gerrymandering. In doing so we extend thecurrent framework for construction of non-reversible Markov chains on discrete samplingspaces by considering a generalization of skew detailed balance. Since this means sampling in a discrete space, the method using lifting. Meaning adding a dichotomous dummy variable, “based on a notion of flowing the center of mass of districts along a defined vector field”. The paper is quite detailed about the validation and the implementation of the method. With this interesting illustration for the mixing properties of the different versions:

non-reversible guided Metropolis–Hastings

Posted in Mountains, pictures, Statistics, Travel with tags , , , , , , , , , , , , on June 4, 2020 by xi'an

Kengo Kamatani and Xiaolin Song, whom I visited in Osaka last summer in what seems like another reality!, just arXived another paper on a non-reversible Metropolis version. That exploits a group action and the associated Haar measure.

Following a proposal of Gustafson (1998), a ∆-guided Metropolis–Hastings kernel is based on a statistic ∆ that is totally ordered and determine the acceptance of a proposed value y~Q(x,.) by adding a direction (-,+) to the state space and moving from x if ∆x≤∆y in the positive direction and if ∆y≤∆x in the negative direction [with the standard Metropolis–Hastings acceptance probability]. The sign of the direction switches in case of a rejection. And the statistic ∆ is such that the proposal kernel Q(x,.) is unbiased, i.e., agnostic to the sign, i.e., it gives the same probability to ∆x≤∆y and ∆y≤∆x. This modification reduces the asymptotic variance compared with the original Metropolis–Hastings kernel.

To construct a random walk proposal that is unbiased, the authors assume that the ∆ transform takes values in a topological group, G, with Q further being invariant under the group actions. This can be constructed from a standard proposal by averaging the transforms of Q under all elements of the group over the associated right Haar measure. (Which I thought implied that the group is compact, except I forgot to account for the data update into a posterior..!) The worked-out example is based on a multivariate autoregressive kernel with ∆x being a rescaled non-central chi-squared variate. In dimension 24. The results show a clear improvement in effective sample size per second evaluation over off-the-shelf random walk and Hamiltonian Monte Carlo versions.

Seeing the Haar measure appearing in the setting of Markov chain Monte Carlo is fun!, as my last brush with it was not algorithmic. I would think the proposal only applies to settings where the components of the simulated vector are somewhat homogeneous in that the determinationthe determination of both the group action and a guiding statistic seem harder in cases where these components take different meaning (or live in a weird topology). I also lazily wonder if selecting the guiding statistic as a gradient of the log-target would have any interest.