A vanilla Rao-Blackwellisation

When we wrote Rao-Blackwellisation of sampling schemes with George Casella in 1996 (still one of my favourite papers!), we used a latent variable representation of the standard Metropolis-Hastings average as a function of the proposed moves

\frac{1}{N} \sum_{t=1}^N \sum_{j=1}^t h(y_j) \mathbb{I}(x_t=y_j)

in which the accepted x_t’s could be integrated out conditional on the proposed y_j’s. Taking this conditional expectation is always possible, which made the result so exciting!, but it is also costly, since all possible acceptance histories must be considered, unfortunately leading to a tree-like complexity of order \text{O}(N^2). Hence a lack of followers, despite a high number of citations (248 recorded by Google Scholar).
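To fix ideas, here is a minimal Python sketch of the standard Metropolis-Hastings average that the double sum above rewrites; all the names (mh_average, log_pi, prop_sample, prop_logpdf) are mine rather than notation from either paper, and the random walk example at the end is only an illustration. The point is that x_t always equals one of the proposed y_j’s, so each proposal ends up weighted by the number of iterations the chain spends sitting on it.

    import numpy as np

    def mh_average(h, log_pi, prop_sample, prop_logpdf, x0, N, rng):
        """Standard Metropolis-Hastings ergodic average (1/N) sum_t h(x_t).

        Each proposed y_j ends up weighted by the number of iterations the
        chain sits on it, which is the double-sum representation above.
        """
        x, total = x0, 0.0
        for _ in range(N):
            y = prop_sample(x, rng)
            # log of the MH acceptance probability alpha(x, y)
            log_alpha = min(0.0, log_pi(y) - log_pi(x)
                            + prop_logpdf(y, x) - prop_logpdf(x, y))
            if np.log(rng.uniform()) < log_alpha:
                x = y
            total += h(x)
        return total / N

    # illustration: estimate E[X^2] = 1 for a N(0,1) target with a
    # symmetric random walk proposal (the constant prop_logpdf cancels out)
    rng = np.random.default_rng(0)
    est = mh_average(lambda x: x ** 2, lambda x: -0.5 * x ** 2,
                     lambda x, rng: x + rng.normal(),
                     lambda x, y: 0.0, 0.0, 10_000, rng)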

In this new version, written with Randal Douc and now posted on arXiv, we found (during this inspiring MCMC meeting at the University of Warwick) a different representation of Metropolis-Hastings estimators that allows for conditioning only on the accepted moves

\frac{1}{N} \sum_{j=1}^M h(z_j) \sum_{t=0}^\infty \prod_{m=1}^{t}\mathbb{I}\{u_{jm}\ge \alpha(z_j,y_{jm})\}

where the z_j’s are the accepted y_t’s, M is the number of accepted values, the y_{jm}’s are simulated from the proposal q(z_j,y), and \alpha(x,y) is the Metropolis-Hastings acceptance probability. While both representations give identical estimators, integrating out the u_{jm}’s in the above leads to a different and much simpler Rao-Blackwellised version,

\frac{1}{N} \sum_{j=1}^M h(z_j) \sum_{t=0}^\infty \prod_{m=1}^{t}\{1- \alpha(z_j,y_{jm})\}

with a computing cost that remains of the same order (and which, further, can be controlled as explained in the paper). While the derivation of this Rao-Blackwellised estimator is straightforward, establishing the (asymptotic) domination of the Rao-Blackwellised version is harder than in the 1996 Biometrika paper, since the expectations are taken conditional on all the z_j’s: the Rao-Blackwellised estimator is thus a sum of conditional expectations whose terms are correlated, which calls for much more involved convergence theorems… Note also that, since we condition at an earlier stage, the practical improvement brought by this “vanilla” version is unsurprisingly more limited than in the Biometrika paper. But the nice thing about this paper is that it is basically hassle-free: it applies in every situation where the Metropolis-Hastings algorithm is used and only requires keeping track of the proposed moves (plus some more)…
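For concreteness, here is a minimal Python sketch of this vanilla Rao-Blackwellised average; the names (vanilla_rb_average, extra) and the crude truncation rule are my own assumptions, whereas the paper works with the proposals actually generated along the run and controls the truncation error precisely.

    import numpy as np

    def vanilla_rb_average(h, log_pi, prop_sample, prop_logpdf,
                           x0, N, rng, extra=100):
        """Sketch of the vanilla Rao-Blackwellised MH average.

        Runs N Metropolis-Hastings iterations, stores the accepted states
        z_j, and replaces each random holding time by a truncated version
        of sum_{t>=0} prod_{m=1}^{t} (1 - alpha(z_j, y_{jm})) computed
        from fresh proposals y_{jm} ~ q(z_j, .).
        """
        def log_alpha(x, y):
            return min(0.0, log_pi(y) - log_pi(x)
                       + prop_logpdf(y, x) - prop_logpdf(x, y))

        x, accepted = x0, [x0]
        for _ in range(N):
            y = prop_sample(x, rng)
            if np.log(rng.uniform()) < log_alpha(x, y):
                x = y
                accepted.append(x)

        total = 0.0
        for z in accepted:
            # 'surv' is the running product of the (1 - alpha) factors and
            # 'weight' the partial sum; the t = 0 term is the empty product 1
            weight = surv = 1.0
            for _ in range(extra):
                surv *= 1.0 - np.exp(log_alpha(z, prop_sample(z, rng)))
                weight += surv
                if surv < 1e-12:  # sketch-level truncation of the tail
                    break
            total += weight * h(z)
        return total / N

The expected value of each weight is the mean holding time of the chain at z_j, so both averages target the same quantity; the extra parameter trades computing time against truncation bias, which is the cost control mentioned above.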

4 Responses to “A vanilla Rao-Blackwellisation”

  1. […] Rao-Blackwellisation updated The vanilla Rao-Blackwellisation paper with Randal Douc has been updated, arXived, and resubmitted to the Annals, to include a more […]

  2. I somehow (I’ve forgotten now) found my way to this post from Andrew Gelman’s blog last month and read this paper, which I really enjoyed. One thing that struck me about the paper was that the figures, and the conclusions you draw from them in the text, somehow underplay the benefits of the method. The figures focus attention on the fact that, for a given fixed number of iterations, the Rao-Blackwellization only tightens up the dispersion of the estimate by a small factor. (That is, draw a vertical line on the graph and see how far apart the intersections of the two curves are.) Fair enough. But another way to look at it is: “how many iterations of the non-Rao-Blackwellized chain are needed to give the same dispersion as the Rao-Blackwellized chain?” That is, draw a horizontal line on the graph and see how far apart the intersections of the two curves are. Due to MC error it’s hard to tell exactly, but this could be fixed by averaging over multiple runs. From this viewpoint, as the curves flatten out when the number of iterations approaches 1000, the advantage of the Rao-Blackwellized chain increases.

    • True! Our graphical representation was meant to say that, for a given number of iterations, using the vanilla RB would decrease the variability/variance/dispersion by “that much”, but looking at the number of iterations needed to reach a given precision would indeed also work for the comparison. In the end, it should be the same, in that the number of iterations needed to reach a given precision is proportional to the variance of the process. Since it is the variance and not the standard deviation that matters, the figures will indeed look better for the RB version: halving the variance, for instance, halves the number of iterations required. Thanks! This will be useful for the revision (soon to come!).

