Non-reversible Markov Chains for Monte Carlo sampling
This “week in Warwick” was not chosen at random, as I was aware there was a workshop on non-reversible MCMC going on. (Even though CRiSM sponsored so many workshops in September that almost any week would have worked for the above sentence!) It has always been something of a mystery to me why non-reversibility can make such a massive difference in practice, even though I am quite aware that it does. And I can grasp some of the theoretical arguments for why it does. So it was quite rewarding to sit in this Warwick amphitheatre and learn about overdamped Langevin algorithms and other non-reversible diffusions, to see results where convergence times moved from n to √n, and to grasp some of the appeal of lifting, albeit in finite state spaces. Plus, the cartoon presentation of Hamiltonian Monte Carlo by Michael Betancourt was a great moment, not only because of the satellite bursting into flames on the screen but also because it gave a very welcome intuition about why reversibility was inefficient and HMC appealing. So I am grateful to my two colleagues, Joris Bierkens and Gareth Roberts, for organising this exciting workshop, with a most profitable schedule favouring a few long talks. My next visit to Warwick will also coincide with a workshop on intractable likelihood, next November, this time as part of the new Alan Turing Institute programme.
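To give a flavour of the lifting idea mentioned above: lifting (in the spirit of Diaconis, Holmes and Neal) augments a finite-state chain with an auxiliary direction variable that persists across moves, producing a non-reversible chain that sweeps through the space rather than diffusing back and forth. Here is a minimal sketch on the n-cycle, with hypothetical function and variable names of my own choosing; a rare direction flip keeps the uniform distribution stationary, and the persistent motion is what yields the square-root-type speedup over the reversible random walk.

```python
import numpy as np

def lifted_cycle_walk(n, steps, flip_prob, rng):
    # Non-reversible "lifted" walk on the n-cycle: the state carries a
    # direction d in {+1, -1} that persists, so the chain sweeps around
    # the cycle instead of diffusing back and forth.
    x, d = 0, 1
    visits = np.zeros(n)
    for _ in range(steps):
        if rng.random() < flip_prob:
            d = -d              # rare flip keeps the uniform target invariant
        x = (x + d) % n
        visits[x] += 1
    return visits / steps       # empirical occupation frequencies

# Sketch usage: frequencies should be close to uniform (1/n each)
rng = np.random.default_rng(0)
freq = lifted_cycle_walk(20, 20_000, 0.05, rng)
```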
September 24, 2015 at 1:02 am
Ok. I’ve clearly fallen asleep at the wheel, because I always thought HMC was a Metropolis-corrected Hamiltonian flow (and hence reversible). Is it not?
September 24, 2015 at 10:10 am
Dan, you should have been at the workshop, where I discussed exactly this point in detail! To summarize, Hamiltonian flow and the modified Hamiltonian flow of a symplectic integrator are both non-reversible, hence effective at exploring the target. But if we want to unbias the samples with a Metropolis correction, then we have to augment the flow (say, with a momentum flip before and a momentum resampling after) to make it reversible. The problem is that such an augmentation compromises the performance of the flow — for example, if we apply corrections at high frequency then we devolve into a Langevin diffusion. Only by integrating for long enough can we resolve that tension, and formalizing that idea very naturally motivates algorithms like NUTS.
September 24, 2015 at 11:22 am
Ok. That makes perfect sense.
September 24, 2015 at 10:23 am
I know nothing, but I do not think so.