vanilla is Russian!

Mark Girolami kindly sent me his Russian Roulette paper on handling intractable normalising constants toward exact-approximation MCMC schemes: “Playing Russian Roulette with Intractable Likelihoods”… I had heard Mark’s talk in Warwick last month but reading the paper (written jointly with Anne-Marie Lyne, Heiko Strathmann, Dan Simpson and Yves Atchadé) made me understand the method(s) much better. The idea is to use a series representation (that they call MacLaurin and that I would call Taylor, but this may be related to MacLaurin being Scottish, a good enough reason!) to get from an approximation of the normalising constant to an unbiased estimation of the inverse of this constant, via either a geometric or an exponential version. The infinite series is then truncated by the Russian roulette, a physics trick that relies on a stopping rule to ensure both unbiasedness and an almost surely finite number of terms.

Now, some wee bit of whining and an explanation for the (apparently insane) post title: in our vanilla Rao-Blackwellisation paper, Randal Douc and I used fairly similar techniques to find an unbiased estimate of an inverse probability to Rao-Blackwellise Metropolis-Hastings outcomes. Namely, a geometric series representation of this inverse probability identical to (7) in the current paper and a truncation of the infinite summation of products using uniforms and indicators as in Russian Roulette… So, although we were unaware of this at the time, we were essenially using Russian Roulette! I am thus a tad disappointed to see our work not referenced in this paper and a wee more disappointed that it got under the radar in an important segment of the community! (Besides this omission, the many refernces to my (other) papers are much appreciated!)

Back to the current paper, a first item of interest is that the geometrically tilted estimator can rely on a biased estimator of the intractable constant, still resulting in an unbiased estimator in the end. (See below about this!) Although this solution requires the availability of upper bounds on the ratio between the estimator and the missing constant, which obviously is a drag, this is a good incentive for the method. (This and the fact that it is always possible to turn it into a positive estimate.) The boundedness also allows for geometric Russian Roulette as in our vanilla Rao-Blackwellisation paper. As an aside, I find the presentation in Section 3.1 a mite too convoluted: using the identity

\dfrac{f(y,\theta)}{\mathcal{Z}(\theta)}=\dfrac{f(y,\theta)}{\tilde{\mathcal{Z}}(\theta)}\times c(\theta)\times\dfrac{\tilde{\mathcal{Z}}(\theta)}{c(\theta)\mathcal{Z}(\theta)}

seems to me to be sufficient to justify the estimation via the geometric series without a call to tilted exponential families. The alternative based on an exponential auxiliary variable seems slightly less appealing to me. First because it requires an unbiased estimator of the intractable constant. Second, I do not see how this auxiliary variable can be generated, given that the rate parameter in the exponential is the intractable constant. Maybe I am missing the points by thousands of miles…. In an email exchange, Mark pointed out that both methods use unbiased estimates of the constant. However, the paper mentions twice that its estimators can start from some approximation of Z(θ) that is not necessarily unbiased. Thus, I wonder whether or not having an infinite sequence of independent estimators with the same distribution as the initial approximation could be enough.

The solution proposed in the paper (to deal with negative density estimates) is quite clever, turning the negatives into positives thanks to the sign function. I however wonder about the overall efficiency of the trick for two (minor?) reasons. The first reason for worry is that, when the density estimate is very negative, the exact value is likely to be close to zero, rather than very large. Turning the estimate into its  positive counterpart puts undue weight on this value. The second reason for worry is that the final estimate, being a ratio of estimates, will be biased. Obviously, this is not damning as this happens at the estimation stage rather than during the MCMC stage…

A mathematical worry about the series convergence proof in the Appendix: it seems to me that the telescopic argument in the proof only applies to convergent series as, otherwise, we could be considering a difference of two infinities… Another worry: I find the explanation at the top of page 5 somehow confusing: unless the notation is generic, the fact that p can be evaluated in

\hat{\pi}(\theta,\xi|y)=\hat{p}(\theta,\xi|y)/Z(y)

does not make sense if the lhs is the estimate found in the previous page.

One Response to “vanilla is Russian!”

  1. […] by Mark Girolami, Anne-Marie Lyne, Heiko Strathmann, Daniel Simpson, Yves Atchade, see also Xian’s comments on it. The motivation of the paper is slightly different than the one described above but the techniques […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.