non-reversible MCMC

While visiting Dauphine, Natesh Pillai and Aaron Smith pointed out this interesting paper of Joris Bierkens (Warwick) that had escaped my arXiv watch/monitoring. The paper is about turning Metropolis-Hastings algorithms into non-reversible versions, towards improving mixing.

In a discrete setting, a way to produce a non-reversible move is to mix the proposal kernel Q with its time-reversed version Q’ and use an acceptance probability of the form

\epsilon\pi(y)Q(y,x)+(1-\epsilon)\pi(x)Q(x,y) \big/ \pi(x)Q(x,y)

where ε is any weight. This construction is generalised in the paper to any vorticity (skew-symmetric with zero sum rows) matrix Γ, with the acceptance probability

\epsilon\Gamma(x,y)+\pi(y)Q(y,x)\big/\pi(x)Q(x,y)

where ε is small enough to ensure all numerator values are non-negative. This is a rather annoying assumption in that, except for the special case derived from the time-reversed kernel, it has to be checked over all pairs (x,y). (I first thought it also implied the normalising constant of π but everything can be set in terms of the unormalised version of π, Γ or ε included.) The paper establishes that the new acceptance probability preserves π as its stationary distribution. An alternative construction is to make the proposal change from Q in H such that H(x,y)=Q(x,y)+εΓ(x,y)/π(x). Which seems more pertinent as not changing the proposal cannot improve that much the mixing behaviour of the chain. Still, the move to the non-reversible versions has the noticeable plus of decreasing the asymptotic variance of the Monte Carlo estimate for any integrable function. Any. (Those results are found in the physics literature of the 2000’s.)

The extension to the continuous case is a wee bit more delicate. One needs to find an anti-symmetric vortex function g with zero integral [equivalent to the row sums being zero] such that g(x,y)+π(y)q(y,x)>0 and with same support as π(x)q(x,y) so that the acceptance probability of g(x,y)+π(y)q(y,x)/π(x)q(x,y) leads to π being the stationary distribution. Once again g(x,y)=ε(π(y)q(y,x)-π(x)q(x,y)) is a natural candidate but it is unclear to me why it should work. As the paper only contains one illustration for the discretised Ornstein-Uhlenbeck model, with the above choice of g for a small enough ε (a point I fail to understand since any ε<1 should provide a positive g(x,y)+π(y)q(y,x)), it is also unclear to me that this modification (i) is widely applicable and (ii) is relevant for genuine MCMC settings.

4 Responses to “non-reversible MCMC”

  1. In terms of interpretation of the diffusion with non-reversible drift component, I think this can be generalised from the Gaussian invariant density case by

    dx = [ – (∂E/∂x) dt + √2 dw ] + S’ (∂E/∂x) dt

    where ∂E/∂x is the usual gradient of the negative log (unnormalised) density / energy and S=-S’ is a skew symmetric matrix. In this form it seems the dynamic can be interpreted as the composition of an energy and volume conserving (non-canonical) Hamiltonian dynamic

    dx/dt = S’ ∂E/∂x

    and a (non-preconditioned) Langevin diffusion

    dx = – (∂E/∂x) dt + √2 dw

    As an alternative to discretising the combined dynamic, it might be interesting to compare to sequential alternation between ‘Hamiltonian’ steps either using a simple Euler discretisation

    x’ = x + h S’ ∂E/∂x

    or a symplectic method like implicit midpoint to maintain reversibility / volume preservation and then a standard MALA step

    x’ = x – h (∂E/∂x) + √2 h w, w ~ N(0, I)

    plus MH accept. If only one final MH accept step is done this overall dynamic will be reversible, however if a an intermediary MH accept was done after each Hamiltonian step (flipping the sign / transposing S on a rejection to maintain reversibility), the composed dynamic would in general be non-longer reversible and it would be interesting to compare performance in this case to that using a non-reversible MH acceptance on the combined dynamic (this alternative sidestepping the issues with finding an appropriate scale ε to maintain the non-negativity condition on the sum of the vorticity density and joint density on a proposed and current state).

    With regards to your point on the positivity of g(x,y)+π(y)q(y,x), I’m not sure if I have fully understood your notation correctly or not, but I think you may have misread the definition of g(x,y) for the discretised Ornstein-Uhlenbeck case (apologies if instead the misinterpretation is mine!). The vorticity density is defined as the skew symmetric component of the density f of F(dx, dy) = µ(dx) Q(x, dy) with respect to the Lebesgue measure, where µ(dx) is the true invariant distribution of the Euler-Maruyama discretised diffusion based proposal density Q(x, dy) rather than g(x, y) being defined in terms of the skew-symmetric component of π(dx) Q(x, dy) which in general would lead to a vorticity density which does not meet the zero integral requirement as the target density π is not invariant in general with respect to Q.

    • Dear all, thank you for your interest and remarks on this paper. In Remark 4.4 in the paper I discuss the general case briefly. I agree that it is probably hard work to satisfy the condition (bullet two of the algorithm in Figure 1) of finding a Gaussian density N(0,V) that is ‘dominated’ by the target density. Also, this only seems to be useful if V has different eigenvalues, because non-reversibility then helps in averaging out of the eigenvalues and thus reduce autocorrelation — at least in the Gaussian case.

      Matt is right that one can use a non-reversible Langevin diffusion as a sampler for general target densities, and correct after one or more steps by (NR)MH. Be careful in assuming this non-reversible chain improves convergence properties: It is true that you get a non-reversible chain if you have two reversible transition matrices P_1 and P_2 and multiply those, but as far as I know it is not immediately clear that you get any performance improvement. The way to get guaranteed reduction of asymptotic variance is by constructing a non-reversible chain from a reversible chain by adding some skew-symmetric term.

      More work and discussion is needed I’d say. Here is a link to the workshop website, 21-23 Sept 2015:
      http://www.warwick.ac.uk/nonrevmcmc

  2. Dan Simpson Says:

    Joris is running a workshop on non-reversable MCMC later this year (I think). He also gave a great talk the other day for the CRiSM council.

    My intuition (which may be very very wrong) is that the skew-symmetric vorticity operator is inducing some sort of “momentum flow” on the space of orthogonal transformations that can “push” you across an energy barrier and help explore the space. I wonder if it can be related to Mike B’s thermodynamic stuff and, in particular, if it has the same limitations (in the sense of there being metastabitises and unreachable states)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: