A Vanilla Rao-Blackwellisation (comments)

One of the authors of “On convergence of importance sampling and other properly weighted samples to the target distribution” by S. Malefaki and G. Iliopoulos sent me their paper (now published in JSPI, 2008, pp. 1210-1225) to point out the connection with our Vanilla Rao-Blackwellisation paper. There is indeed a link, in that those authors also exploit the sequence of accepted values in an MCMC run to build geometric weights based on the distribution of those accepted random variables. The paper also relates more strongly to the series of papers published by Jun Liu and coauthors in JASA in the early 2000s about random importance weights, and even more to the birth-and-death jump processes introduced by Brian Ripley in his 1987 simulation book, and studied in Geyer and Møller (1994), Grenander and Miller (1994) and Phillips and Smith (1996), which led to the birth-and-death MCMC approach of Matthew Stephens in his thesis and his 2000 Annals of Statistics paper.

As later analysed in our 2003 Series B paper, this jump-process approach is theoretically valid but may lead to difficulties at the implementation stage. The first is that every proposed value is accepted, if only briefly, so that, with proposals exhibiting null recurrent or transient behaviour, the chain may take “forever” to go to infinity and back. The second is that the perspective offered by this representation (which, in the case of the standard Metropolis algorithm, involves no modification) casts Metropolis algorithms as a rough version of importance sampling. While this is somehow also the case for our Vanilla paper, the whole point of using a Metropolis or a Gibbs algorithm is precisely to avoid picking an importance sampling distribution in complex settings, where such distributions are almost necessarily inefficient, and to exploit instead some features of the target to build the proposals. (This is obviously a matter of perspective on the presentation of the analysis in the above paper, nothing being wrong with its mathematics.)
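
As a side note, here is a minimal sketch of this jump-process representation of Metropolis, under illustrative assumptions of mine (a standard normal target and a Gaussian random-walk proposal, not an example from either paper): the chain is summarised by its accepted values, each weighted by the geometric number of iterations spent there, and the weighted average reproduces the usual ergodic average.

    import numpy as np

    rng = np.random.default_rng(0)

    def log_target(x):
        # illustrative target: standard normal (my assumption, not the paper's example)
        return -0.5 * x**2

    x, hold = 0.0, 1          # current accepted value and its waiting time so far
    states, weights = [], []  # accepted values and their geometric holding times

    for _ in range(10_000):
        y = x + rng.normal(scale=2.0)  # Gaussian random-walk proposal
        if np.log(rng.uniform()) < log_target(y) - log_target(x):
            states.append(x)
            weights.append(hold)       # iterations spent sitting at x
            x, hold = y, 1
        else:
            hold += 1                  # each rejection lengthens the wait

    states.append(x)
    weights.append(hold)
    states, weights = np.array(states), np.array(weights)

    # weighted average over accepted values = standard MH ergodic average
    print(np.sum(weights * states**2) / np.sum(weights))  # close to E[x^2] = 1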

3 Responses to “A Vanilla Rao-Blackwellisation (comments)”

  1. […] and in the book Introducing Monte Carlo Methods with R with George Casella. We also included the reference to the paper by Malefaki and Iliopoulos about Lemma 1, discussed in a previous post. The assessment […]

  2. Arnaud Doucet also pointed out to me that the jump-process construction was implicit in the n-fold way algorithm of Bortz, Kalos and Lebowitz, Journal of Computational Physics, 1975, though earlier remarks on this property may exist.

  3. I read Malefaki & Iliopoulos’ paper when I was just getting into Monte Carlo methods and found it quite interesting, although I was not inspired to really do anything with it. However, what I find particularly interesting in this interpretation of MH is that one constructs a Markov chain with invariant distribution \gamma and then weights (through waiting times) each point \xi with a random importance weight whose expectation 1/p(\xi) is proportional to \pi(\xi)/\gamma(\xi).

    In a discrete setting it is possible to compute 1/p(\xi) exactly, giving rise to a lower-variance version of the n-fold way in the physics literature (where Bortz et al. pick points exactly but then add noise to the weights); see the first sketch after this comment. In a continuous setting your paper provides a better unbiased estimate of 1/p(\xi) than the standard geometric waiting time. However, one of the benefits of the n-fold way and its exact-weight modification in a discrete state space is that it allows us to spend a fixed amount of time at each point, by simply evaluating the acceptance ratio for every point z with q(z|x) > 0, irrespective of the value of 1/p(\xi), whereas in the continuous setting there seems to be no way to do this. Indeed, the waiting time in some systems for standard proposal densities is a major obstacle to using MCMC effectively.

    Also, given that we can view MCMC as constructing a Markov chain on \gamma rather than \pi and simply correcting the samples with importance weights, it suggests that we should perhaps pick a different ‘target’ density \psi that is very close to the invariant distribution of the ‘natural’ Markov chain \gamma and then use importance weights to correct for the discrepancy between \pi and \psi (a sketch of this correction also appears below). My guess is that such an approach, if possible, would be quite useful, since we often are constrained to use reasonably fixed proposal densities.
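
To make the discrete-state remark in the comment above concrete, here is a minimal sketch of the exact waiting-time weight, under toy assumptions of mine (a four-state unnormalised target and a uniform proposal): the acceptance ratio is evaluated at every z with q(z|x) > 0, so 1/p(\xi) is computed exactly instead of being simulated as a geometric variable.

    import numpy as np

    # toy ingredients (my assumptions): unnormalised target pi on four states,
    # uniform proposal q(z|x) = 1/K over all K states; q[x, z] stands for q(z|x)
    pi = np.array([1.0, 3.0, 2.0, 4.0])
    K = len(pi)
    q = np.full((K, K), 1.0 / K)

    def leave_prob(x):
        # p(x): probability of leaving x in one step, summing the move
        # probability over every z with q(z|x) > 0
        p = 0.0
        for z in range(K):
            if z != x and q[x, z] > 0:
                alpha = min(1.0, pi[z] * q[z, x] / (pi[x] * q[x, z]))
                p += q[x, z] * alpha
        return p

    # exact expected holding time 1/p(x), replacing the noisy geometric draw
    exact_weights = np.array([1.0 / leave_prob(x) for x in range(K)])
    print(exact_weights)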
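The final suggestion in the comment can be sketched the same way, again under illustrative Gaussian assumptions of mine: run a plain Metropolis chain with invariant distribution \psi and correct the discrepancy with \pi by self-normalised importance weights.

    import numpy as np

    rng = np.random.default_rng(1)

    def log_pi(x):   # the distribution of interest (assumed standard normal)
        return -0.5 * x**2

    def log_psi(x):  # surrogate target the chain samples (assumed N(0, 1.5^2))
        return -0.5 * (x / 1.5)**2

    x, chain = 0.0, []
    for _ in range(20_000):          # plain Metropolis chain targeting psi
        y = x + rng.normal()
        if np.log(rng.uniform()) < log_psi(y) - log_psi(x):
            x = y
        chain.append(x)
    chain = np.array(chain)

    # self-normalised importance weights correct the pi/psi discrepancy
    w = np.exp(log_pi(chain) - log_psi(chain))
    print(np.sum(w * chain**2) / np.sum(w))  # close to E_pi[x^2] = 1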
