**L**ast Friday, Guanyang Wang arXived a paper on the use of multi-armed bandits (hence the reference to the three bandits) to handle intractable normalising constants. The bandit compares or mixes Møller et al. (2006) auxiliary variable solution with Murray et al. (2006) exchange algorithm. Which are both special cases of pseudo-marginal MCMC algorithms. In both cases, the auxiliary variables produce an unbiased estimator of the ratio of the constants. Rather than the ratio of two unbiased estimators as in the more standard pseudo-marginal MCMC. The current paper tries to compare the two approaches based on the variance of the ratio estimate, but cannot derive a general ordering. The multi-armed bandit algorithm exploits both estimators of the acceptance ratio to pick the one that is almost the largest, almost because there is a correction for validating the step by detailed balance. The bandit acceptance probability is the maximum [over the methods] of the minimum [over the time directions] of the original acceptance ratio. While this appears to be valid, note that the resulting algorithm implies four times as many auxiliary variates as the original ones, which makes me wonder at the gain when compared with a parallel implementation of these methods, coupled at random times. (The fundamental difficulty of simulating from likelihoods with an unknown normalising constant remains, see p.4.)

## Archive for coupling

## bandits for doubly intractable posteriors

Posted in Statistics with tags coupling, coupling from the past, detailed balance, doubly intractable posterior, Les Trois Brigands, multi-armed bandits, normalising constant, pseudo-marginal MCMC, ratio of integrals on April 17, 2019 by xi'an## Markov Chains [not a book review]

Posted in Books, pictures, Statistics, University life with tags book review, concentration inequalities, coupling, Eric Moulines, irreducibility, Markov chain and stochastic stability, Markov chain Monte Carlo, Markov chains, MCMC convergence, probability theory, Randal Douc, Richard Tweedie, Sean Meyn, Wasserstein distance on January 14, 2019 by xi'an**A**s Randal Douc and Éric Moulines are both very close friends and two authors of this book on Markov chains, I cannot engage into a regular book review! Judging from the table of contents, the coverage is not too dissimilar to the now classic Markov chain Stochastic Stability book by Sean Meyn and the late Richard Tweedie (1994), called the Bible of Markov chains by Peter Glynn, with more emphasis on convergence matters and a more mathematical perspective. The 757 pages book also includes a massive appendix on maths and probability background. As indicated in the preface, “the reason [the authors] thought it would be useful to write a new book is to survey some of the developments made during the 25 years that have elapsed since the publication of Meyn and Tweedie (1993b).” Connecting with the theoretical developments brought by MCMC methods. Like subgeometric rates of convergence to stationarity, sample paths, limit theorems, and concentration inequalities. The book also reflects on the numerous contributions of the authors to the field. Hence a perfect candidate for teaching Markov chains to mathematically well-prepared. graduate audiences. Congrats to the authors!

## two Parisian talks by Pierre Jacob in January

Posted in pictures, Statistics, University life with tags coupling, CREST, cut models, ENSAE, Gibbs sampling, MCMC, Paris-Saclay campus, Pierre Jacob, prior construction, Université Paris Dauphine on December 21, 2017 by xi'an**W**hile back in Paris from Harvard in early January, Pierre Jacob will give two talks on works of his:

January 09, 10:30, séminaire d’Analyse-Probabilités, Université Paris-Dauphine: Unbiased MCMC

*Markov chain Monte Carlo (MCMC) methods provide consistent approximations of integrals as the number of iterations goes to infinity. However, MCMC estimators are generally biased after any fixed number of iterations, which complicates both parallel computation and the construction of confidence intervals. We propose to remove this bias by using couplings of Markov chains and a telescopic sum argument, inspired by Glynn & Rhee (2014). The resulting unbiased estimators can be computed independently in parallel, and confidence intervals can be directly constructed from the Central Limit Theorem for i.i.d. variables. We provide practical couplings for important algorithms such as the Metropolis-Hastings and Gibbs samplers. We establish the theoretical validity of the proposed estimators, and study their variances and computational costs. In numerical experiments, including inference in hierarchical models, bimodal or high-dimensional target distributions, logistic regressions with the Pólya-Gamma Gibbs sampler and the Bayesian Lasso, we demonstrate the wide applicability of the proposed methodology as well as its limitations. Finally, we illustrate how the proposed estimators can approximate the “cut” distribution that arises in Bayesian inference for misspecified models. *

January 11, 10:30, CREST-ENSAE, Paris-Saclay: Better together? Statistical learning in models made of modules *[Warning: Paris-Saclay is not in Paris!]*

*In modern applications, statisticians are faced with integrating heterogeneous data modalities relevant for an inference or decision problem. It is convenient to use a graphical model to represent the statistical dependencies, via a set of connected “modules”, each relating to a specific data modality, and drawing on specific domain expertise in their development. In principle, given data, the conventional statistical update then allows for coherent uncertainty quantification and information propagation through and across the modules. However, misspecification of any module can contaminate the update of others. In various settings, particularly when certain modules are trusted more than others, practitioners have preferred to avoid learning with the full model in favor of “cut distributions”. In this talk, I will discuss why these modular approaches might be preferable to the full model in misspecified settings, and propose principled criteria to choose between modular and full-model approaches. The question is intertwined with computational difficulties associated with the cut distribution, and new approaches based on recently proposed unbiased MCMC methods will be described*.

Long enough after the New Year festivities (if any) to be fully operational for them!

## unbiased HMC

Posted in Books, pictures, Statistics with tags Boston, coupling, Hamiltonian Monte Carlo, Harvard, HMC, Massachusetts, sunrise, unbiasedness on September 25, 2017 by xi'an**J**eremy Heng and Pierre Jacob arXived last week a paper on unbiased Hamiltonian Monte Carlo by coupling, following the earlier paper of Pierre and co-authors on debiasing by coupling a few weeks ago. The coupling within the HMC amounts to running two HMC chains with common random numbers, plus subtleties!

“As with any other MCMC method, HMC estimators are justified in the limit of the number of iterations. Algorithms which rely on such asymptotics face the risk of becoming obsolete if computational power keeps increasing through the number of available processors and not through clock speed.”

The main difficulty here is to have both chains meet (exactly) with large probability, since coupled HMC can only bring these chain close to one another. The trick stands in using both coupled HMC *and* coupled Hastings-Metropolis kernels, since the coupled MH kernel allows for exact meetings when the chains are already close, after which they remain happily and forever together! The algorithm is implemented by choosing between the kernels at random at each iteration. (Unbiasedness follows by the Glynn-Rhee trick, which is eminently well-suited for coupling!) As pointed out from the start of the paper, the appeal of this unbiased version is that the algorithm can be (embarrassingly) parallelised since all processors in use return estimators that are iid copies of one another, hence easily merged into a better estimator.

## unbiased MCMC

Posted in Books, pictures, Statistics, Travel, University life with tags convergence assessment, coupling, coupling from the past, cut distribution, MCMC, MCMSki IV, perfect sampling, renewal process on August 25, 2017 by xi'an**T**wo weeks ago, Pierre Jacob, John O’Leary, and Yves F. Atchadé arXived a paper on unbiased MCMC with coupling. Associating MCMC with unbiasedness is rather challenging since MCMC are rarely producing simulations from the exact target, unless specific tools like renewal can be produced in an efficient manner. (I supported the use of such renewal techniques as early as 1995, but later experiments led me to think renewal control was too rare an occurrence to consider it as a generic convergence assessment method.)

This new paper makes me think I had given up too easily! Here the central idea is coupling of two (MCMC) chains, associated with the debiasing formula used by Glynn and Rhee (2014) and already discussed here. Having the coupled chains meet at some time with probability one implies that the debiasing formula does not need a (random) stopping time. The coupling time is sufficient. Furthermore, several estimators can be derived from the same coupled Markov chain simulations, obtained by starting the averaging at a later time than the first iteration. The average of these (unbiased) averages results into a weighted estimate that weights more the later differences. Although coupling is also at the basis of perfect simulation methods, the analogy between this debiasing technique and perfect sampling is hard to fathom, since the coupling of two chains is not a perfect sampling instant. (Something obvious only in retrospect for me is that the variance of the resulting unbiased estimator is at best the variance of the original MCMC estimator.)

When discussing the implementation of coupling in Metropolis and Gibbs settings, the authors give a simple optimal coupling algorithm I was not aware of. Which is a form of accept-reject also found in perfect sampling I believe. (Renewal based on small sets makes an appearance on page 11.) I did not fully understood the way two random walk Metropolis steps are coupled, in that the normal proposals seem at odds with the boundedness constraints. But coupling is clearly working in this setting, while renewal does not. In toy examples like the (Efron and Morris!) baseball data and the (Gelfand and Smith!) pump failure data, the parameters k and m of the algorithm can be optimised against the variance of the averaged averages. And this approach comes highly useful in the case of the cut distribution, a problem which I became aware of during MCMskiii and on which we are currently working with Pierre and others.

## accelerating MCMC

Posted in Statistics with tags acceleration of MCMC algorithms, coupling, Hamiltonian Monte Carlo, India, MCMC, Monte Carlo Statistical Methods, motorbike, NUTS, Rajasthan, review, survey, tempering, WIREs on May 29, 2017 by xi'an**I** have recently [well, not so recently!] been asked to write a review paper on ways of accelerating MCMC algorithms for the [review] journal WIREs Computational Statistics and would welcome all suggestions towards the goal of accelerating MCMC algorithms. Besides [and including more on]

- coupling strategies using different kernels and switching between them;
- tempering strategies using flatter or lower dimensional targets as intermediary steps, e.g., à la Neal;
- sequential Monte Carlo with particle systems targeting again flatter or lower dimensional targets and adapting proposals to this effect;
- Hamiltonian MCMC, again with connections to Radford (and more generally ways of avoiding rejections);
- adaptive MCMC, obviously;
- Rao-Blackwellisation, just as obviously (in the sense that increasing the precision in the resulting estimates means less simulations).

## Russian roulette still rolling

Posted in Statistics with tags AISTATS 2017, Biometrika, coupling, debiasing, doubly intractable problems, harmonic mean estimator, MCMC, MCMC algorithm, normalising constant, Peter Glynn, pseudo-marginal MCMC, Rao-Blackwellisation, Russian roulette on March 22, 2017 by xi'an**C**olin Wei and Iain Murray arXived a new version of their paper on doubly-intractable distributions, which is to be presented at AISTATS. It builds upon the Russian roulette estimator of Lyne et al. (2015), which itself exploits the debiasing technique of McLeish et al. (2011) [found earlier in the physics literature as in Carter and Cashwell, 1975, according to the current paper]. Such an unbiased estimator of the inverse of the normalising constant can be used for pseudo-marginal MCMC, except that the estimator is sometimes negative and has to be so as proved by Pierre Jacob and co-authors. As I discussed in my post on the Russian roulette estimator, replacing the negative estimate with its absolute value does not seem right because a negative value indicates that the quantity is close to zero, hence replacing it with zero would sound more appropriate. Wei and Murray start from the property that, while the expectation of the importance weight is equal to the normalising constant, the expectation of the inverse of the importance weight converges to the inverse of the weight for an MCMC chain. This however sounds like an harmonic mean estimate because the property would also stand for any substitute to the importance density, as it only requires the density to integrate to one… As noted in the paper, the variance of the resulting Roulette estimator “will be high” or even infinite. Following Glynn et al. (2014), the authors build a coupled version of that solution, which key feature is to cut the higher order terms in the debiasing estimator. This does not guarantee finite variance or positivity of the estimate, though. In order to decrease the variance (assuming it is finite), backward coupling is introduced, with a Rao-Blackwellisation step using our 1996 Biometrika derivation. Which happens to be of lower cost than the standard Rao-Blackwellisation in that special case, O(N) versus O(N²), N being the stopping rule used in the debiasing estimator. Under the assumption that the *inverse* importance weight has finite expectation [wrt the importance density], the resulting backward-coupling Russian roulette estimator can be proven to be unbiased, as it enjoys a finite expectation. (As in the generalised harmonic mean case, the constraint imposes thinner tails on the importance function, which then hampers the convergence of the MCMC chain.) No mention is made of achieving finite variance for those estimators, which again is a serious concern due to the similarity with harmonic means…