Asymptotically Exact, Embarrassingly Parallel MCMC

Departamento de Matemática, Universidad Complutense de Madrid, 11/11/11Willie Neiswanger, Chong Wang, and Eric Xing (from CMU) recently arXived a paper entitled as above. The “embarrassing” in the title refers to the “embarrassingly simple” solution proposed therein, namely to solve the difficulty in handling very large datasets by running completely independent parallel MCMC samplers on parallel threads or computers and using the outcomes of those samplers as density estimates, pulled together as a product towards an approximation of the true posterior density. In other words, the idea is to break the posterior as

p(\theta|x) = \prod_{i=1}^m p_i(\theta|x)

and to use the estimate

\hat p(\theta|x) = \prod_{i=1}^m \hat p_i(\theta|x)

where the individual estimates are obtained by, say, non-parametric estimates. The method is then “asymptotically exact” in the weak (and unsurprising) sense of being converging in the number of MCMC iterations. Still, there is a theoretical justification that is not found in previous parallel methods that mixed all resulting samples without accounting for the subsampling. And I also appreciate the point that, in many cases, running MCMC samplers with subsamples produces faster convergence.

In the paper, the division of p into its components is done by partitioning the iid data into m subsets. And taking a power 1/m of the prior in each case. (Which may induce improperness issues.) However, the subdivision is arbitrary and can thus be implemented in other cases than the fairly restrictive iid setting. Because each (subsample)  non-parametric estimate involves T terms, the resulting overall estimate contains Tm terms and the authors suggest using an independent Metropolis-within-Gibbs sampler to handle this complexity. Which is necessary [took me a while to realise this!] for producing a final sample from the (approximate) true posterior distribution. As an aside, I wonder why the bandwidths are all equal across all subsamples, as they should depend on those. And as it would not make much of a difference. It would also be interesting to build a typology of cases where subsampling leads to subposteriors that are close to orthogonal, preventing the implementation of the method.

As it happened, I read this paper on the very day Nial Friel (University College Dublin) gave a seminar at the Big’MC seminar on the convergence of approximations to ergodic transition kernels, based on the recent results of Mitrophanov on the stability of Markov chains, where he introduced the topic by the case of datasets large enough to prevent the computation of the likelihood function.

5 Responses to “Asymptotically Exact, Embarrassingly Parallel MCMC”

  1. […] blog.) And in the afternoon session, Sylvia Frühwirth-Schnatter exposed her approach to the (embarrassingly) parallel problem, in the spirit of Steve’s , David Dunson’s and Scott’s (a paper posted on […]

  2. […] and images. (Check the video at 45:00.) Then Steve Scott gave us a Google motivated entry to embarrassingly parallel algorithms, along the lines of papers recently discussed on the ‘Og. (Too bad we forgot to start the […]

  3. Dan Simpson Says:

    I wonder if this could be combined with ABC-like ideas. Splitting the data is all well and good when it’s informative (which is, in some sense, the less interesting big data Bayes case), but when it’s not, it may be better to do a more careful decomposition step. (The question, I guess, is what sort of models would let you do this legally…)

    • True, there is nothing that prevents you from applying this idea to ABC, assuming the product decomposition leads to a proper “subposterior” distribution for each term. But you could also get closer to ABC by (a) create a reference table of prior simulations for all threads; (b) create a reference table of pseudo-sub-samples on each thread; (c) compute the respective part of the distance between sample and pseudo-sample on each thread (and throw away the reference table overboard); (d) get back together all thread distance components and select the ABC subsample.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: