## parallelising MCMC algorithms

This paper, A general construction for parallelizing Metropolis-Hastings algorithms, written by Ben Calderhead, was first presented at MCMSki last January and has now appeared in PNAS. It is somewhat related to the recycling idea of Tjelmeland (2004, unpublished) and hence to our 1996 Rao-Blackwellisation paper with George. Although there is no recycling herein.

At each iteration of Ben’s algorithm, N proposed values are generated conditional on the “current” value of the Markov chain, which actually consists of (N+1) components and from which one component is drawn at random to serve as a seed for the next proposal distribution and the simulation of N other values. In short, this is a data-augmentation scheme with the index I on the one side and the N modified components on the other side. The neat trick in the proposal [and the reason for the jump in efficiency] is that the stationary distribution of the auxiliary variable can be determined and hence used (N+1) times in updating the vector of (N+1) components. (Note that picking the index at random means computing *all* (N+1) possible transitions from one component to the N others. Or even all (N+1)! if the proposals differ. Hence a potential *increase* in the computing cost, even though what costs the most is usually the likelihood computation, dispatched on the parallel processors.) While there are (N+1) terms involved at each step, the genuine Markov chain is truly over a *single* chain and the N other proposed values are not recycled. Even though they could be [for Monte Carlo integration purposes], as shown e.g. in our paper with Pierre Jacob and Murray Smith. Something that took a few iterations for me to understand is why Ben rephrases the original Metropolis-Hastings algorithm as a finite state space Markov chain on the set of indices {1,…,N+1} (Proposition 1). Conditionally on the values of the (N+1) vector, the stationary of that sub-chain is no longer uniform. Hence, picking (N+1) indices from the stationary helps in selecting the most appropriate images, which explains why the rejection rate decreases.

The paper indeed evaluates the impact of increasing the number of proposals in terms of effective sample size (ESS), acceptance rate, and mean squared jump distance, based two examples. As often in parallel implementations, the paper suggests an “N-fold increase in computational speed” even though this is simply the effect of running the *same* algorithm on a single processor and on N parallel processors. If the comparison is between a single proposal Metropolis-Hastings algorithm on a single processor and an N-fold proposal on N processors, I would say the latter is *slower* because of the selection of the index I that forces all pairs of reverse move. Nonetheless, since this is an almost free bonus resulting from using N processors, when compared with more complex coupled chains, it sounds worth investigating and comparing with those more complex parallel schemes.

March 28, 2016 at 6:30 pm

Related to your last comment, I’ve implemented Ben’s algorithm, and even with fairly naive inter-process communication am getting about 0.8*N the speed of a single core, on multi-core machines with anything between 2-16 cores. I’m optimizing this now so I can scale up to multi-machine clusters.

December 23, 2014 at 11:14 am

typos in your links!

December 23, 2014 at 3:52 pm

Merci!!!