## Riemann, Langevin & Hamilton

Posted in Statistics, University life with tags , , , , , on September 27, 2010 by xi'an

In preparation for the Read Paper session next month at the RSS, our research group at CREST has collectively read the Girolami and Calderhead paper on Riemann manifold Langevin and Hamiltonian Monte Carlo methods and I hope we will again produce a joint arXiv preprint out of our comments. (The above picture is reproduced from Radford Neal’s talk at JSM 1999 in Baltimore, talk that I remember attending…) Although this only represents my preliminary/early impression on the paper, I have trouble with the Physics connection. Because it involves continuous time events that are not transcribed directly into the simulation process.

Overall, trying to take advantage of second order properties of the target—just like the Langevin improvement takes advantage of the first order—is a natural idea which, when implementable, can obviously speed up convergence. This is the Langevin part, which may use a fixed metric M or a local metric defining a Riemann manifold, G(θ). So far, so good, assuming the derivation of an observed or expected information G(θ) is feasible up to some approximation level. The Hamiltonian part that confuses me introduces a dynamic on level sets of

$\mathscr{H}(\theta,\mathbf{p})=-\mathcal{L}(\theta)+\frac{1}{2}\log\{(2\pi)^D|\mathbf{G}(\theta)|\}+\frac{1}{2}\mathbf{p}^\text{T}\mathbf{G}(\theta)^{-1}\mathbf{p}$

where p is an auxiliary vector of dimension D. Namely,

$\dot{\mathbf{p}} = \dfrac{\partial \mathscr{H}}{\partial \mathbf{p}}(\theta,\mathbf{p})\,,\qquad\dot{\theta}=\dfrac{\partial \mathscr{H}}{\partial \theta}(\theta,\mathbf{p})\,.$

While I understand the purpose of the auxiliary vector, namely to speed up the exploration of the posterior surface by taking advantage of the additional energy provided by p, I fail to understand why the fact that the discretised (Euler) approximation to Hamilton’s equations is not available in closed form is such an issue…. The fact that the (deterministic?) leapfrog integrator is not exact should not matter since this can be corrected by a Metropolis-Hastings step.

While the logistic example is mostly a toy problem (where importance sampling works extremely well, as shown in our survey with Jean-Michel Marin), the stochastic volatility is more challenging and the fact that the Hamiltonian scheme applies to the missing data (volatility) as well as to the three parameters of the model is quite interesting. I however wonder at the appeal of this involved scheme when considering that the full conditional of the volatility can be simulated exactly

## Read Paper 13/10/10

Posted in Statistics, University life with tags , , , , on August 25, 2010 by xi'an

There will be an RSS Read Paper session on October 13 given by Marc Girolami and B. Calderhead on Riemann manifold Langevin and Hamiltonian Monte Carlo methods that I definitely plan to attend. Here is the abstract:

The paper proposes Metropolis adjusted Langevin and Hamiltonian Monte Carlo sampling methods defined on the Riemann manifold to resolve the shortcomings of existing Monte Carlo algorithms when sampling from target densities that may be high dimensional and exhibit strong correlations. The methods provide fully automated adaptation mechanisms that circumvent the costly pilot runs that are required to tune proposal densities for Metropolis-Hastings or indeed Hamiltonian Monte Carlo and Metropolis adjusted Langevin algorithms. This allows for highly efficient sampling even in very high dimensions where different scalings may be required for the transient and stationary phases of the Markov chain. The methodology proposed exploits the Riemann geometry of the parameter space of statistical models and thus automatically adapts to the local structure when simulating paths across this manifold, providing highly efficient convergence and exploration of the target density. The performance of these Riemann manifold Monte Carlo methods is rigorously assessed by performing inference on logistic regression models, log-Gaussian Cox point processes, stochastic volatility models and Bayesian estimation of dynamic systems described by non-linear differential equations. Substantial improvements in the time-normalized effective sample size are reported when compared with alternative sampling approaches. MATLAB code that is available from the authors allows replication of all the results reported.

and as usual (400 word) comments can be submitted without any restriction.

## València 9 snapshot [3]

Posted in Statistics, University life with tags , , , , on June 7, 2010 by xi'an

Today was somehow a low-key day for me in terms of talks as I was preparing a climb in the Benidorm backcountry (thanks to the advice of Alicia Quiròs) and trying to copy routes from the (low oh so low!) debit wireless at the hotel. The session I attended in the morning was on Bayesian non-parametrics, with David Dunson giving a talk on non-parametric classification, a talk whose contents were so dense in information that it felt like three talks rather than one, especially when there was no paper to back it up! Katja Ickstadt modelled graphical dependence structures using non-parametrics but also mixtures of normals across different graph structures, an innovation I found interesting if difficult to interpret. Tom Loredo concluded the session with a broad and exciting picture of the statistical challenges found in spectral astronomy (even though I often struggle to make sense of the frequency data astronomers favour).

The evening talk by Ioanna Manolopoulou was a superbly rendered study on cell dynamics with incredible 3D animations of those cell systems, representing the Langevin diffusion on the force fields in those systems as evolving vector fields. And then I gave my poster on the Savage-Dickey paradox, hence missing all the other posters in this session… The main difficulty in presenting the result was not about the measure-theoretic difficulty, but rather in explaining the Savage-Dickey representation since this was unknown to most passerbys.