Thanks for following this up and for giving it such prominence (more than it probably deserves!).

The first part of my comment was more in response to Dan Simpson’s comment on an intuition ‘that the skew-symmetric vorticity operator is inducing some sort of “momentum flow” on the space of orthogonal transformations that can “push” you across an energy barrier’. That was where my suggestion of thinking of how the continuous time dynamic can be decomposed in to Langevin and Hamiltonian updates might be interesting came from.

Due to the non-canonical nature of the Hamiltonian dynamic component, there are no momentum variables as such (though if we were to make a change of variables we could bring the system in to a canonical form by some linear transform), the Hamiltonian energy exchange happening within the original system. However it is introducing a coherent flow to the system, without the Langevin dynamic component, the trajectories under the dynamic would follow paths along the isopotential surfaces – so rather than the vorticity pushing across energy barriers, to me it seems more like pushing around energy barriers. This is obviously a simplification as the vorticity kernel does not only represent the non-reversibility introduced by the extra drift component introduced by S, but also (I think) the inherent non-reversibility of the Euler discretisation of the Langevin dynamic, but I thought it might be a partially useful intuition as to what might be going on.

@Alex:

Again thanks for the comments on my comment, some interesting points!

My experience with using implicit-midpoint for integrating a Hamiltonian dynamic like this tallies with what you said regarding the need to set the step-size quite small to get consistent convergence (also using fixed-point iterations). I’m not sure about any theoretical guarantees on convergence, but I’ve found in practice that it’s possible to get consistent convergence without too much trouble even if the energy gradients have quite a complex non-linear form, and in this case due to the symplecticness of implicit mid-point you can get high acceptance rates (and so limited sign flipping) even when integrating over large number of steps due to the bounded changes in the Hamiltonian.

Obviously there is a high computational cost for using a small step size and iterative implicit updates so such a dynamic is probably unlikely to be of benefit when simpler methods or conventional HMC works well, but in some cases integrating paths along isopotential surfaces like this seems to help when the energy function is non-convex as standard HMC seems to have trouble finding a way through narrow bottlenecks / ‘valleys’ in the energy surface particularly in high-dimensions. RM-HMC can help with this as well, though again requires implicit updates and evaluating the (inverse) metric and derivatives is generally costly.

]]>While playing a few months ago with more or less the same approach, I have found quite difficult in practice to:

– choose a sufficiently small time step \epsilon for the implicit-midpoint update that ensures that the implicit equation can be solved (I was simply iterating the fixed point equation — btw, I suspect that similar problems can be encountered while implementing the Manifold-HMC of Girolami-Calderhead). Indeed, in general (without strong assumptions on the target distribution) there is no guarantee that such an implicit update has one, and only one, solution. Also, I have observed that for a large-ish value of \epsilon, even if the implicit equation can sometimes be solved, this leads to a large number of rejections (because of the sign flip in these cases) and thus to a lot of random-walk behaviour (which is precisely what one is trying to avoid).

– in practice, it is also quite important to add an update step for the skew-symmetric matrix S so that at equilibrium the pair (x,S) has a well defined invariant distribution. For example, one can do a full update of S every “N” steps. Choosing “N” efficiently is quite tricky.

– indeed, one can mix the implicit-midpoint update with any other MCMC update i.e. not necessarily a MALA update. I was using a simple RWM, and that was not working too badly.

– overall, the above strategy was not doing better than a standard HMC on the examples I was playing with (i.e. not really worth the trouble of the implicit update, etc…) but I am sure, though, that one can cook up examples where this may be worth it…

]]>