**L**ast Friday, Guanyang Wang arXived a paper on the use of multi-armed bandits (hence the reference to the three bandits) to handle intractable normalising constants. The bandit compares or mixes Møller et al. (2006) auxiliary variable solution with Murray et al. (2006) exchange algorithm. Which are both special cases of pseudo-marginal MCMC algorithms. In both cases, the auxiliary variables produce an unbiased estimator of the ratio of the constants. Rather than the ratio of two unbiased estimators as in the more standard pseudo-marginal MCMC. The current paper tries to compare the two approaches based on the variance of the ratio estimate, but cannot derive a general ordering. The multi-armed bandit algorithm exploits both estimators of the acceptance ratio to pick the one that is almost the largest, almost because there is a correction for validating the step by detailed balance. The bandit acceptance probability is the maximum [over the methods] of the minimum [over the time directions] of the original acceptance ratio. While this appears to be valid, note that the resulting algorithm implies four times as many auxiliary variates as the original ones, which makes me wonder at the gain when compared with a parallel implementation of these methods, coupled at random times. (The fundamental difficulty of simulating from likelihoods with an unknown normalising constant remains, see p.4.)

## Archive for coupling from the past

## bandits for doubly intractable posteriors

Posted in Statistics with tags coupling, coupling from the past, detailed balance, doubly intractable posterior, Les Trois Brigands, multi-armed bandits, normalising constant, pseudo-marginal MCMC, ratio of integrals on April 17, 2019 by xi'an## spacings on a torus

Posted in Books, Kids, R, Statistics, University life with tags coupling from the past, cross validated, Dirichlet distribution, perfect sampling, ranking and selection, spacings, torus, University of Warwick, Wilfrid Kendall on March 22, 2018 by xi'an**W**hile in Brussels last week I noticed an interesting question on X validated that I considered in the train back home and then more over the weekend. This is a question about spacings, namely how long on average does it take to cover an interval of length L when drawing unit intervals at random (with a torus handling of the endpoints)? Which immediately reminded me of Wilfrid Kendall (Warwick) famous gif animation of coupling from the past via leaves covering a square region, from the top (forward) and from the bottom (backward)…

The problem is rather easily expressed in terms of uniform spacings, more specifically on the maximum spacing being less than 1 (or 1/L depending on the parameterisation). Except for the additional constraint at the boundary, which is not independent of the other spacings. Replacing this extra event with an independent spacing, there exists a direct formula for the expected stopping time, which can be checked rather easily by simulation. But the exact case appears to be add a few more steps to the draws, 3/2 apparently. The following graph displays the regression of the Monte Carlo number of steps over 10⁴ replicas against the exact values:

## unbiased MCMC

Posted in Books, pictures, Statistics, Travel, University life with tags convergence assessment, coupling, coupling from the past, cut distribution, MCMC, MCMSki IV, perfect sampling, renewal process on August 25, 2017 by xi'an**T**wo weeks ago, Pierre Jacob, John O’Leary, and Yves F. Atchadé arXived a paper on unbiased MCMC with coupling. Associating MCMC with unbiasedness is rather challenging since MCMC are rarely producing simulations from the exact target, unless specific tools like renewal can be produced in an efficient manner. (I supported the use of such renewal techniques as early as 1995, but later experiments led me to think renewal control was too rare an occurrence to consider it as a generic convergence assessment method.)

This new paper makes me think I had given up too easily! Here the central idea is coupling of two (MCMC) chains, associated with the debiasing formula used by Glynn and Rhee (2014) and already discussed here. Having the coupled chains meet at some time with probability one implies that the debiasing formula does not need a (random) stopping time. The coupling time is sufficient. Furthermore, several estimators can be derived from the same coupled Markov chain simulations, obtained by starting the averaging at a later time than the first iteration. The average of these (unbiased) averages results into a weighted estimate that weights more the later differences. Although coupling is also at the basis of perfect simulation methods, the analogy between this debiasing technique and perfect sampling is hard to fathom, since the coupling of two chains is not a perfect sampling instant. (Something obvious only in retrospect for me is that the variance of the resulting unbiased estimator is at best the variance of the original MCMC estimator.)

When discussing the implementation of coupling in Metropolis and Gibbs settings, the authors give a simple optimal coupling algorithm I was not aware of. Which is a form of accept-reject also found in perfect sampling I believe. (Renewal based on small sets makes an appearance on page 11.) I did not fully understood the way two random walk Metropolis steps are coupled, in that the normal proposals seem at odds with the boundedness constraints. But coupling is clearly working in this setting, while renewal does not. In toy examples like the (Efron and Morris!) baseball data and the (Gelfand and Smith!) pump failure data, the parameters k and m of the algorithm can be optimised against the variance of the averaged averages. And this approach comes highly useful in the case of the cut distribution, a problem which I became aware of during MCMskiii and on which we are currently working with Pierre and others.

## Le Monde puzzle [#947]

Posted in Books, Kids, pictures, R, Statistics with tags aperiodicity, chequerboard, coupling from the past, dead leaves, Le Monde, mathematical puzzle, R, Wilfrid Kendall on February 2, 2016 by xi'an**A**nother boardgame in Le Monde mathematical puzzle :

Given an 8×8 chequerboard, consider placing 2×2 tiles over thischequerboard until (a) the entire surface is covered and (b) removing a single 2×2 tile exposes some of the original chequerboard. What is the maximal number of 2×2 tiles one can set according to this scheme? And for a 10×10 chequerboard?

**T**his puzzle reminded me of Wilfrid Kendall’s introduction to perfect sampling with leaves seen through a ceiling window falling from above, until sky was no longer visible. The representation was used by Wilfrid to explain that coupling from the past did not need to go all the way back to infinity:

Defining a random coverage of the chequerboard by those 2×2 tiles amounts to selecting a random permutation þ of 1:(n-1)² and finding the subvector of þ producing a full coverage

grid=matrix(0,n,n) path=sample(1:(n-1)^2) #random permutation path=path+((path-1)%/%(n-1)) #account for border shift i=1 neigh=c(0,1,n,n+1) while (min(grid)==0){ #all entries covered grid[path[i]+neigh]=grid[path[i]+neigh]+1 i=i+1 } i=i-1

Then removing superfluous tiles:

for (k in sample(1:i)){ krid=grid krid[path[k]+neigh]=krid[path[k]+neigh]-1 if (min(krid)>0){ #off used below off[k]=FALSE; grid=krid} #useless tile }

And checking the remaining ones are truly essential:

mingrid=0 for (k in (1:i)[off]){ krid=grid krid[path[k]+neigh]=krid[path[k]+neigh]-1 mingrid=max(mingrid,min(krid)) } sky=(mingrid>0) #rejection of the grid

leads to the maximum number of tiles to be *[at least]* M=16,25,37,54 for n=6,8,10,12…

## perfect sampling, just perfect!

Posted in Books, Statistics, University life with tags Bernoulli factory, book review, coupling from the past, forward-backward formula, java, Mark Huber, MCMC algorithms, Monte Carlo Statistical Methods, perfect sampling on January 19, 2016 by xi'an**G**reat news! Mark Huber (whom I’ve know for many years, so this review may be not completely objective!) has just written a book on perfect simulation! I remember (and still share) the excitement of the MCMC community when the first perfect simulation papers of Propp and Wilson (1995) came up on the (now deceased) MCMC preprint server, as it seemed then the ideal (perfect!) answer to critics of the MCMC methodology, plugging MCMC algorithms into a generic algorithm that eliminating burnin, warmup, and convergence issues… It seemed both magical, with the simplest argument: “start at T=-∞ to reach stationarity at T=0”, and esoteric (“why forward fails while backward works?!”), requiring simple random walk examples (and a java app by Jeff Rosenthal) to understand the difference (between backward and forward), as well as Wilfrid Kendall’s kids’ coloured wood cubes and his layer of leaves falling on the ground and seen from below… These were exciting years, with MCMC still in its infancy, and no goal seemed too far away! Now that years have gone, and that the excitement has clearly died away, perfect sampling can be considered in a more sedate manner, with pros and cons well-understood. This is why Mark Huber’s book is coming at a perfect time if any! It covers the evolution of the perfect sampling techniques, from the early coupling from the past to the monotonous versions, to the coalescence principles, with applications to spatial processes, to the variations on nested sampling and their use in doubly intractable distributions, with forays into the (fabulous) Bernoulli factory problem (a surprise for me, as Bernoulli factories are connected with unbiasedness, not stationarity! Even though my only fieldwork [with Randal Douc] in such factories was addressing a way to turn MCMC into importance sampling. The key is in the notion of approximate densities, introduced in Section 2.6.). The book is quite thorough with the probabilistic foundations of the different principles, with even “a [tiny weeny] little bit of measure theory.

Any imperfection?! Rather, only a (short too short!) reflection on the limitations of perfect sampling, namely that it cannot cover the simulation of posterior distributions in the Bayesian processing of most statistical models. Which makes the quote

“Distributions where the label of a node only depends on immediate neighbors, and where there is a chance of being able to ignore the neighbors are the most easily handled by perfect simulation protocols (…) Statistical models in particular tend to fall into this category, as they often do not wish to restrict the outcome too severely, instead giving the data a chance to show where the model is incomplete or incorrect.” (p.223)

just surprising, given the very small percentage of statistical models which can be handled by perfect sampling. And the downsizing of perfect sampling related papers in the early 2000’s. Which also makes the final and short section on the future of perfect sampling somewhat restricted in its scope.

So, great indeed!, a close to perfect entry to a decade of work on perfect sampling. If you have not heard of the concept before, consider yourself lucky to be offered such a gentle guidance into it. If you have dabbled with perfect sampling before, reading the book will be like meeting old friends and hearing about their latest deeds. More formally, Mark Huber’s book should bring you a new perspective on the topic. (As for me, I had never thought of connecting perfect sampling with accept reject algorithms.)