## Posterior model probabilities computed from model-specific Gibbs output [arXiv:1012.0073]

“Expressing RJMCMC as simple Gibbs sampling provides the key innovation of our formulation: it allows us to fit models one at a time using ordinary MCMC and then compute model weights or Bayes factors by post-processing the Monte Carlo output.”

Richard Barker (from the University of Otago, Dunedin, New Zealand) and William Link posted this new paper on arXiv. A point in their abstract attracted my attention, namely that they produce a “representation [that] allows [them] to fit models one at a time using ordinary MCMC and then compute model weights or Bayes factors by post-processing the Monte Carlo output”. This is quite interesting, in that most attempts at building Bayes factor approximations from separate chains, each running on a separate model, have led to erroneous solutions. It appears, however, that the paper builds upon a technique fully developed in the authors’ book.

The crux of the representation is a saturation idea also found in earlier RJMCMC papers, like the discussion paper by Steve Brooks, Paolo Giudici and Gareth Roberts published in JRSS Series B (2003), a paper I discussed in Banff that same year: one can construct a saturated parameter space such that all models are parameterised in terms of this saturated parameter (with an irrelevant part varying across models). This also generalises the 1995 JRSS Series B paper of Brad Carlin and Sid Chib, who likewise pile up the parameters of all models into one huge vector. As in that earlier paper, Barker and Link introduce pseudo-priors on the irrelevant parts of the parameter (conditional upon a model), which allows them to express the posterior probability of a model given the current value of the parameter as

$\pi(\mathfrak{M}_k|x,\theta) = \dfrac{\pi(\mathfrak{M}_k)\pi(\theta|\mathfrak{M}_k)f(x|\theta,\mathfrak{M}_k)}{\sum_m\pi(\mathfrak{M}_m)\pi(\theta|\mathfrak{M}_m)f(x|\theta,\mathfrak{M}_m)}$
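As a concrete illustration of this formula, here is a minimal sketch of my own in Python (the two models, the data and the choice of pseudo-prior are all invented for the purpose): model $\mathfrak{M}_1$ takes the $x_i$ as N(0,1), so $\theta$ is irrelevant and gets a pseudo-prior, while model $\mathfrak{M}_2$ takes them as N($\theta$,1) with a N(0,1) prior on $\theta$.

```python
import numpy as np
from scipy.stats import norm

# Two invented models for data x (all choices purely illustrative):
#   M1: x_i ~ N(0, 1)      theta irrelevant, pseudo-prior N(0, 1)
#   M2: x_i ~ N(theta, 1)  prior theta ~ N(0, 1)
x = np.array([0.5, -0.2, 1.1, 0.3])
theta = 0.4                     # current value of the saturated parameter
prior_M = np.array([0.5, 0.5])  # pi(M_k): equal prior model weights

# log pi(M_k) + log pi(theta | M_k) + log f(x | theta, M_k), for k = 1, 2
lw = np.log(prior_M) + np.array([
    norm.logpdf(theta) + norm.logpdf(x, 0.0, 1.0).sum(),    # M1
    norm.logpdf(theta) + norm.logpdf(x, theta, 1.0).sum(),  # M2
])
post = np.exp(lw - lw.max())
post /= post.sum()              # pi(M_k | x, theta)
print(post)                     # approximately [0.41, 0.59]
```

Since the pseudo-prior here coincides with the $\mathfrak{M}_2$ prior, the prior terms cancel and only the likelihoods drive the weights; a different pseudo-prior would shift them, which is precisely the sensitivity discussed below.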

This representation holds for any parameter $\theta$, meaning that this parameter can be generated from any model (with completion by the irrelevant part simulated from the pseudo-prior). If I interpret the presentation in the paper correctly (I have not seen the book), the idea is to run a regular MCMC chain $(\theta_k^{(t)})$ for each model and then to switch from chain to chain according to the above probability. The advantage over the original reversible jump MCMC algorithm is that running the Gibbs sampler (or another MCMC sampler) within each model produces values that are likely for that model, and thus avoids the high rejection rate caused by inappropriate proposals at each jump. The drawback is that the method depends heavily on the pseudo-priors (as in Carlin and Chib), so handling many models, or models with large differences in parameter dimension, may prove problematic.
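The switching scheme just described can be sketched in a few lines of Python; everything below (the two models, the data, the pseudo-prior and the conjugate shortcut) is my own toy construction, not the authors’ example. Exact conjugate draws stand in for the model-specific Gibbs output, and the frequency of visits to each model estimates its posterior probability.

```python
import numpy as np
from scipy.stats import norm

# Toy two-model problem (all choices are my own inventions):
#   M1: x_i ~ N(0, 1)                      theta irrelevant under M1
#   M2: x_i ~ N(theta, 1), theta ~ N(0, 1)
rng = np.random.default_rng(42)
x = rng.normal(0.3, 1.0, size=30)
n, xbar, T = len(x), x.mean(), 20000

# Under M2 the posterior of theta is conjugate, N(n*xbar/(n+1), 1/(n+1)),
# so exact draws stand in for the model-specific MCMC output; under M1,
# theta is completed from a pseudo-prior roughly matched to that posterior
# (deliberately a bit too wide, so the switching probabilities vary).
mu_n, sd_n = n * xbar / (n + 1), (n + 1) ** -0.5
ps_mu, ps_sd = mu_n, 1.5 * sd_n
chains = {1: rng.normal(ps_mu, ps_sd, size=T),   # pseudo-prior completion
          2: rng.normal(mu_n, sd_n, size=T)}     # within-model posterior

loglik1 = norm.logpdf(x, 0.0, 1.0).sum()  # M1 likelihood is theta-free

def model_probs(theta):
    # pi(M_k | x, theta), with equal prior model weights
    lw = np.array([
        norm.logpdf(theta, ps_mu, ps_sd) + loglik1,             # M1
        norm.logpdf(theta) + norm.logpdf(x, theta, 1.0).sum(),  # M2
    ])
    w = np.exp(lw - lw.max())
    return w / w.sum()

k, visits = 1, np.zeros(2)
for t in range(T):
    theta = chains[k][t]            # theta from the current model's chain
    k = 1 + rng.choice(2, p=model_probs(theta))  # switch models given theta
    visits[k - 1] += 1
print(visits / T)  # estimates of pi(M_1 | x) and pi(M_2 | x)
```

In this conjugate toy case the answer is available in closed form (the Bayes factor is $\exp\{(n\bar x)^2/2(n+1)\}/\sqrt{n+1}$), so the estimate can be checked; replacing the pseudo-prior by a poorly matched one, say the N(0,1) prior itself, makes the switching probabilities collapse and the chain stick in one model, which illustrates the pseudo-prior sensitivity noted above.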
