## unbiased MCMC

**T**wo weeks ago, Pierre Jacob, John O’Leary, and Yves F. Atchadé arXived a paper on unbiased MCMC with coupling. Associating MCMC with unbiasedness is rather challenging, since MCMC algorithms rarely produce simulations from the exact target, unless specific tools like renewal can be exploited in an efficient manner. (I supported the use of such renewal techniques as early as 1995, but later experiments led me to think renewal control was too rare an occurrence to consider it as a generic convergence assessment method.)

This new paper makes me think I had given up too easily! Here the central idea is the coupling of two (MCMC) chains, associated with the debiasing formula of Glynn and Rhee (2014), already discussed here. Having the coupled chains meet at some time with probability one implies that the debiasing formula does not call for a (random) stopping time: the coupling time is sufficient. Furthermore, several estimators can be derived from the same coupled Markov chain simulations, obtained by starting the averaging at a later time than the first iteration. The average of these (unbiased) averages results in a weighted estimate that puts more weight on the later differences. Although coupling is also at the basis of perfect simulation methods, the analogy between this debiasing technique and perfect sampling is hard to fathom, since the coupling of two chains is not a perfect sampling instant. (Something obvious only in retrospect for me is that the variance of the resulting unbiased estimator is at best the variance of the original MCMC estimator.)
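For concreteness, the debiasing construction can be sketched as follows (in my own notation, which may differ slightly from the paper's): with coupled chains (X_t) and (Y_t), the second lagging one step behind the first, meeting at time τ, and a test function h, the quantity

$$H_l = h(X_l) + \sum_{t=l+1}^{\tau-1}\left\{ h(X_t) - h(Y_{t-1}) \right\}$$

is unbiased for the target expectation of h, for any fixed starting index l, and the averaged averages mentioned above correspond to

$$H_{k:m} = \frac{1}{m-k+1} \sum_{l=k}^{m} H_l.$$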

When discussing the implementation of coupling in Metropolis and Gibbs settings, the authors give a simple optimal coupling algorithm I was not aware of, which is, I believe, a form of accept-reject also found in perfect sampling. (Renewal based on small sets makes an appearance on page 11.) I did not fully understand the way two random walk Metropolis steps are coupled, in that the normal proposals seem at odds with the boundedness constraints. But coupling is clearly working in this setting, while renewal does not. In toy examples like the (Efron and Morris!) baseball data and the (Gelfand and Smith!) pump failure data, the parameters k and m of the algorithm can be optimised against the variance of the averaged averages. And this approach proves highly useful in the case of the cut distribution, a problem I became aware of during MCMskiii and on which we are currently working with Pierre and others.

August 28, 2017 at 7:16 am

This is really very very neat. I first thought that such an approach might not scale with the dimension but the empirical results of Section 5.3 and Figure 4 are pretty impressive.

August 25, 2017 at 6:46 pm

Hi,

Thanks for the advertisement. In the next version, we will make sure to be clearer on the coupling of random walk MH. It is actually very simple!

Consider two MH chains at states X_t and Y_t (forget the time shift of the paper for now), each with a Normal random walk proposal. So, normally you would draw X* from N(X_t, Sigma) and Y* from N(Y_t, Sigma), with Sigma being the proposal variance, and then decide whether to accept these as the next states of the chains.

What we propose to do, is to sample (X*,Y*) from the maximal coupling of the two distributions N(X_t, Sigma), and N(Y_t, Sigma). The procedure of Algorithm 4 does that exactly, and this was presumably known forever in the coupling literature (it is definitely in Thorisson’s book).
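A minimal sketch of such a rejection-based maximal coupling of two Normal proposals, in Python (my own illustration, not the authors' code, and written for the scalar equal-variance case):

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - 0.5 * np.log(2 * np.pi * sigma ** 2)

def maximal_coupling(mu1, mu2, sigma, rng):
    """Sample (X, Y) with X ~ N(mu1, sigma^2) and Y ~ N(mu2, sigma^2),
    maximising P(X == Y), via the standard rejection coupling."""
    X = rng.normal(mu1, sigma)
    # accept X as the common value with probability min(1, q(X)/p(X))
    if np.log(rng.uniform()) + normal_logpdf(X, mu1, sigma) <= normal_logpdf(X, mu2, sigma):
        return X, X
    # otherwise draw Y from the residual part of q
    while True:
        Y = rng.normal(mu2, sigma)
        if np.log(rng.uniform()) + normal_logpdf(Y, mu2, sigma) > normal_logpdf(Y, mu1, sigma):
            return X, Y
```

By construction the two marginals are exact, and the meeting probability equals the overlap of the two Normal densities, here 2Φ(−|mu1 − mu2|/(2 sigma)).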

Then there is a chance that X* is exactly equal to Y*. In that case, it is possible that both proposals are accepted as X_t+1 and Y_t+1, so that the chains have met. If not, there’s always the next step.

There is no boundedness condition for this algorithm; we discuss boundedness conditions only to theoretically justify that the chains will meet geometrically fast. Intuitively, we only use the fact that an event occurring with positive probability at every step will happen within a geometrically distributed number of steps.
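Putting the pieces together, a single coupled random-walk MH step might look like this (again only an illustrative Python sketch under my own naming; a common uniform is used for both acceptance decisions, one standard choice ensuring the chains stay together once they have met):

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - 0.5 * np.log(2 * np.pi * sigma ** 2)

def maximal_coupling(mu1, mu2, sigma, rng):
    """Rejection-based maximal coupling of N(mu1, sigma^2) and N(mu2, sigma^2)."""
    X = rng.normal(mu1, sigma)
    if np.log(rng.uniform()) + normal_logpdf(X, mu1, sigma) <= normal_logpdf(X, mu2, sigma):
        return X, X
    while True:
        Y = rng.normal(mu2, sigma)
        if np.log(rng.uniform()) + normal_logpdf(Y, mu2, sigma) > normal_logpdf(Y, mu1, sigma):
            return X, Y

def coupled_rwmh_step(x, y, log_target, sigma, rng):
    """One coupled random-walk Metropolis-Hastings step for two chains."""
    xs, ys = maximal_coupling(x, y, sigma, rng)
    logu = np.log(rng.uniform())   # common uniform for both acceptance decisions
    x_new = xs if logu < log_target(xs) - log_target(x) else x
    y_new = ys if logu < log_target(ys) - log_target(y) else y
    return x_new, y_new
```

Once x == y, the maximal coupling returns identical proposals with probability one and the common uniform yields the same accept/reject decision, so the chains remain faithful to each other forever after.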