## MCMC with errors

I received this email last week from Ian Langmore, a postdoc at Columbia:

> I’m looking for literature on a subject and can’t find it: I have a Metropolis sampler where the acceptance probability is evaluated with some error. This error is not simply error in evaluation of the target density. It occurs due to the method by which we approximate the acceptance probability.

This is a sensible question, albeit a wee bit vague… The closest work I can think of is the recent paper by Christophe Andrieu and Gareth Roberts in the Annals of Statistics (2009), following an original proposal by Marc Beaumont. I think there is an early 1990’s paper by Gareth and Jeff Rosenthal where they consider the impact of some approximation effect, like real-number representation, on convergence, but I cannot find it. Of course, the recent particle MCMC JRSS B discussion paper by Christophe, Arnaud Doucet and Roman Holenstein is a way to bypass the problem. (In a sense, ABC is a rudimentary answer as well.) And there must be many other papers on this topic I am not aware of….

March 28, 2011 at 5:02 pm

Thanks everyone for your interest. The paper by Rosenthal was interesting but addressed a different issue. I should be more specific:

In my case, evaluating my likelihood involves solving a PDE by a completely different Markov chain. Therefore, my acceptance probability comes with a very special error: instead of getting the true α, I get α + N, where N is Gaussian. This gives me a confidence interval around my estimate of α. Note that I can make Variance(N) arbitrarily small (with computational effort). So, rather than being content with an initial estimate of α, I reduce Variance(N) until this confidence interval is small enough.

So what I’m looking for (doubtful that it’s out there) is work where acceptance is done with some confidence…

The case I deal with is rather special, but there is a great deal of work done (in the “inverse problems” community) where evaluation of the likelihood involves solving a PDE, and is therefore done with some error.
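Ian’s scheme — average independent noisy estimates of the log acceptance ratio until a confidence interval around it is tight enough, then accept or reject — can be sketched as follows. This is a toy illustration only, with a standard normal target and a made-up Gaussian-noise estimator (`noisy_log_ratio` stands in for the PDE solve); the function names and the tolerance are my own, not Ian’s.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_log_ratio(theta_new, theta_old, sigma=0.5):
    # Stand-in for the PDE-based evaluation: the true log acceptance
    # ratio for a toy N(0,1) target, plus Gaussian noise of known scale.
    true = -0.5 * (theta_new ** 2 - theta_old ** 2)
    return true + rng.normal(0.0, sigma)

def mh_step(theta, tol=0.1, sigma=0.5):
    # Average enough independent estimates that the 95% confidence
    # half-width of the averaged log ratio falls below `tol`, then
    # accept or reject against a pre-drawn uniform.
    prop = theta + rng.normal(0.0, 1.0)
    n = int(np.ceil((1.96 * sigma / tol) ** 2))
    est = np.mean([noisy_log_ratio(prop, theta, sigma) for _ in range(n)])
    return prop if np.log(rng.uniform()) < est else theta

def run_chain(n_iter=4000, theta0=0.0):
    chain = [theta0]
    for _ in range(n_iter):
        chain.append(mh_step(chain[-1]))
    return np.array(chain)
```

Note that stopping as soon as the interval is tight still leaves a small, uncontrolled bias: the chain is no longer exactly reversible with respect to the target, which is precisely the gap in the literature Ian is asking about.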

March 28, 2011 at 9:45 pm

This seems to relate to Kennedy and Kuti (1985) in that you have essentially an unbiased estimator of the acceptance probability, Gaussianity being irrelevant there….

March 28, 2011 at 3:50 am

I don’t believe there’s any approximation in the acceptance probability of an ABC-MCMC sampler if you define the sampler on the joint distribution of parameter and summary statistic (see e.g. my book chapter at http://arxiv.org/pdf/1001.2058).

March 28, 2011 at 7:30 am

The approximation in ABC (Approximate Bayesian Computation) is in the target, not in the acceptance probability or the importance weight. As stressed by Wilkinson (2008) and Fearnhead and Prangle (2010), ABC is an exact simulation method with respect to an approximate target distribution. It remains that it brings an approximation effect into the Bayesian inference that is hard to evaluate.

March 25, 2011 at 1:57 pm

There is a series of mostly old papers in the physics literature on this, such as the following:

Kennedy, A. D. and Kuti, J. (1985) “Noise without noise: a new Monte Carlo method”, Physical Review Letters, vol. 54, pp. 2473-2476.

Bhanot, G. and Kennedy, A. D. (1985) “Bosonic lattice gauge theory with noise”, Physics Letters B, vol. 157, pp. 70-76.

March 25, 2011 at 2:35 pm

Thank you, Radford. I will take a look at those. Before even looking at them, let me ask how come they did not have more of an impact in the field?

March 25, 2011 at 7:59 pm

Well, it’s hard…

March 25, 2011 at 8:23 pm

Indeed, it is… Provided you care about the quality of the approximation.

I took a look at Kennedy & Kuti (1985), as I could not get hold of the second paper, and it appears to be a primitive version of Andrieu & Roberts (2009): the essential input of the paper is the idea of replacing the acceptance probability with an unbiased estimator of it (their formula (6)). This is hard, as you wrote, because it mixes unbiasedness with boundary constraints: those probability estimates must remain between 0 and 1, which sounds impossible if the density ratio is not bounded away from zero and infinity…

March 25, 2011 at 1:44 pm

Actually INLA doesn’t solve it either. The bottleneck is the linear algebra – INLA needs determinants and diagonals of inverses, which are very difficult to compute if the problem is too large to do a Cholesky factorisation (actually, typical numerical linear algebra courses teach that you will *never* need to compute these things). It’s actually the same problem MCMC meets!

March 25, 2011 at 1:57 pm

OK, you certainly know INLA much better than I do! My restatement is then that INLA would not solve it much better. I would think a “sparsification” of the matrix, leading to independent blocks, could provide a feasible approximation for running INLA… Evaluating the impact of this sparsification is obviously complicated, but if this is the only solution….

March 25, 2011 at 4:28 pm

What you described is really close to ‘domain decomposition methods’, where you split the spatial domain into chunks. If you do it correctly, the interaction terms only appear in one block, and the art is then to approximate this block correctly. It’s usually used as a parallel preconditioner for CG-like iterative methods on large linear systems (although it’s also pretty much how you distribute a Cholesky factorisation).

March 25, 2011 at 1:41 pm

Note that in the Andrieu and Roberts paper the errors must be unbiased – otherwise the invariant distribution will not have the correct marginal density. I guess whether this will be a problem or not depends upon exactly how the acceptance probability is being approximated.
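For what it is worth, here is a minimal sketch of the pseudo-marginal construction behind the Andrieu and Roberts paper, on a toy problem of my own devising: the target density is replaced by an unbiased, non-negative Monte Carlo estimate, and the estimate attached to the current state is recycled rather than refreshed, which is what makes the marginal chain exact.

```python
import numpy as np

rng = np.random.default_rng(1)

def pi_hat(theta, n_mc=20):
    # Unbiased Monte Carlo estimate of the unnormalised marginal
    # p(theta) = E_z[exp(-(theta - z)^2 / 2)] with z ~ N(0,1),
    # which equals a N(0, 2) density up to a constant.
    z = rng.normal(size=n_mc)
    return np.mean(np.exp(-0.5 * (theta - z) ** 2))

def pseudo_marginal_chain(n_iter=20000, theta0=0.0):
    # Key point: the estimate `w` for the current state is stored and
    # reused, never recomputed; this keeps the theta-marginal of the
    # invariant distribution exactly N(0, 2) despite the noise.
    theta, w = theta0, pi_hat(theta0)
    out = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + rng.normal(0.0, 2.0)
        w_prop = pi_hat(prop)
        if rng.uniform() < w_prop / w:
            theta, w = prop, w_prop
        out[i] = theta
    return out
```

A biased estimator breaks this argument immediately: the chain then targets the marginal of whatever the estimator averages to, not the intended density, which is exactly the point above.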

March 25, 2011 at 1:54 pm

This is a good point, thanks. However, there must be constraints on the type of approximation one uses for such results to hold. The very nice thing about this paper is that the constraint is a fairly natural one. (Of course, one can easily think of cases where unbiasedness does not hold, as in self-normalised importance sampling.) In contrast, when one uses ABC, there is no definitive result on the quality (or lack thereof) of the approximation for a given number of particles…

March 25, 2011 at 11:15 am

It surprises me that there isn’t more done on this. Beyond the obvious rounding error/floating point problems, it doesn’t take a lot of effort to find a problem where you just can’t evaluate anything exactly (proposals, acceptance probabilities, the likelihood…).

Take a vanilla geostatistics model with 1e8 points in the spatial field (not an unreasonable number if you want fine-scale behaviour in three dimensions over a small–medium sized area [I'm thinking of a coastal aquifer]). It is impossible [if you need to do it >10000 times, otherwise very very difficult at best] to calculate the determinants of the precision/covariance matrix exactly, but you can get decent approximations of their ratios (a few decimal places) fairly easily. [Obviously this can be done exactly, modulo integration error, with a Laplace approximation, but that's not the point.]
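On the determinant point: an acceptance ratio only ever needs a *difference* of log-determinants, which at small scale comes straight off a Cholesky factor. A toy sketch, with a hypothetical second-difference precision matrix standing in for the GMRF — at 1e8 points one would replace the dense factorisation below with the sparse or approximate machinery alluded to above:

```python
import numpy as np

def logdet(Q):
    # log det Q = 2 * sum(log diag(L)), where Q = L L^T is the
    # Cholesky factorisation of a symmetric positive-definite Q.
    L = np.linalg.cholesky(Q)
    return 2.0 * np.sum(np.log(np.diag(L)))

# Toy 1-D second-difference precision (a stand-in for a GMRF prior),
# with a precision parameter kappa added on the diagonal.
n = 200
R = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def log_det_ratio(kappa_new, kappa_old):
    # A Metropolis-Hastings acceptance ratio in kappa only needs this
    # difference, never the two (potentially huge) determinants alone.
    return logdet(kappa_new * np.eye(n) + R) - logdet(kappa_old * np.eye(n) + R)
```

Increasing kappa inflates every eigenvalue of the precision, so the ratio is positive for kappa_new > kappa_old; it is this quantity, not the raw determinants, that the approximation schemes need to get right to a few decimal places.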

March 25, 2011 at 1:17 pm

Thanks for those comments, Dan. Indeed, I think INLA is the right solution to this difficulty as, otherwise, there is no formalised way to connect the exact MCMC solution with one that relies on numerical approximations (i.e., to calibrate the latter in terms of the former)…