Archive for coupling

maximal couplings of the Metropolis-Hastings algorithm

Posted in Statistics, University life on November 17, 2020 by xi'an

As a sequel to their JRSS B paper, John O’Leary, Guanyang Wang, and [my friend, co-author and former student!] Pierre E. Jacob have recently posted a follow-up paper on maximal coupling for Metropolis-Hastings algorithms, where maximal is to be understood in terms of the largest possible probability for the coupled chains to be equal, according to the bound set by the coupling inequality. It made me realise that there is a heap of very recent works in this area.
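To recall the coupling inequality in question: for any coupling of the two chains, ‖P(Xₜ ∈ ·) − P(Yₜ ∈ ·)‖_TV ≤ P(Xₜ ≠ Yₜ), so the largest possible probability for Xₜ = Yₜ is reached when this bound is an equality, which is the sense in which the couplings of the paper are maximal.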

A question that came up when reading the paper with our PhD students is whether or not the coupled chains stay identical after meeting once. When facing two different targets, un-meeting seems inevitable, and indeed Lemma 2 seems to show that the chains do not stay together. A strong lemma that does not [need to] state what happens outside the diagonal Δ.

One of the essential tricks is to optimise several kinds of maximal coupling, including one for the Bernoulli-like choice of whether or not to move, as given on p.3.
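As a mere illustration of that Bernoulli step (a minimal sketch of mine, not code from the paper), coupling two accept indicators through a common uniform makes them agree with probability 1 − |p₁ − p₂|, the maximum allowed by the coupling inequality:

```python
import numpy as np

def coupled_bernoulli(p1, p2, rng):
    """Maximal coupling of Bernoulli(p1) and Bernoulli(p2): the two
    indicators differ only when the common uniform falls between p1 and
    p2, an event of probability |p1 - p2|, i.e. the total variation
    distance between the two laws, hence the coupling is maximal."""
    u = rng.uniform()
    return u < p1, u < p2

rng = np.random.default_rng(0)
b1, b2 = coupled_bernoulli(0.7, 0.6, rng)   # agree with probability 0.9
```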

Algorithm 1 came as a novelty to me as it first seemed (to me!) that the two chains might never meet, but this was before I read the small print about the transition (proposal) kernel being maximally coupled with itself. Algorithm 2, on the other hand, may be the earliest example of Metropolis-Hastings coupling I have seen, namely in 1999 in Crete, in connection with a talk by Laird Breyer and Gareth Roberts at a workshop of our ESSS network. As explained by the authors, this solution is not always a maximal coupling for the reason that

min(q¹,q²) min(α¹,α²) ≤ min(q¹α¹, q²α²)

(with q standing for the transition kernel and α for the acceptance probability). Lemma 1 is interesting in that it describes the probability to un-meet (!) as the area between one of the move densities and the minimum of the two.

The first solution is to couple by plain Accept-Reject, with the first chain providing the proposed value and, if rejected [i.e. not in C], a generation from the remainder or residual of the second target, in a form of completion of acceptance-rejection (accepting when above rather than below, i.e. in A or A’). This can be shown to be a maximal coupling. Another coupling using reflection residuals works better but requires some spherical structure in the kernel. A further coupling on the acceptance of the Metropolis-Hastings move seems to bring an extra degree of improvement.
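For the record, here is a generic sketch of this rejection-based maximal coupling (my own rendering of the standard construction, not the authors' code), with p and q standing for the two densities to be coupled:

```python
import numpy as np
from scipy.stats import norm

def maximal_coupling(rng, p_sample, p_logpdf, q_sample, q_logpdf):
    """Rejection-based maximal coupling of p and q: keep the p-draw for
    both components when it falls below min(p, q), otherwise draw the
    second component from the residual of q (the part of q lying above
    min(p, q)), so that P(X = Y) equals the overlap of the densities."""
    x = p_sample(rng)
    if np.log(rng.uniform()) < q_logpdf(x) - p_logpdf(x):
        return x, x                      # common draw from min(p, q)
    while True:                          # residual of q
        y = q_sample(rng)
        if np.log(rng.uniform()) >= p_logpdf(y) - q_logpdf(y):
            return x, y

# example: coupling N(0, 1) with N(1, 1)
rng = np.random.default_rng(1)
x, y = maximal_coupling(
    rng,
    p_sample=lambda r: r.normal(0.0, 1.0),
    p_logpdf=lambda v: norm.logpdf(v, 0.0, 1.0),
    q_sample=lambda r: r.normal(1.0, 1.0),
    q_logpdf=lambda v: norm.logpdf(v, 1.0, 1.0),
)
```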

In the introduction, the alternatives for the acceptance probability α(·,·), e.g. Metropolis-Hastings versus Barker, are mentioned, but would it make a difference to the preferred maximal coupling when using one or the other?
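For reference, writing r(x,y) = π(y)q(y,x)/π(x)q(x,y) for the usual ratio, the Metropolis-Hastings choice is α(x,y) = min{1, r(x,y)}, while the Barker choice is α(x,y) = r(x,y)/(1+r(x,y)).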

A further comment is that, in larger dimensions (I mean larger than one!), a Gibbsic form of coupling, run component by component, could be considered. It would certainly decrease the coupling probability but may still speed up the overall convergence by coupling more often. See “maximality is sometimes less important than other properties of a coupling, such as the contraction behavior when a meeting does not occur.” (p.8)
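To make this Gibbsic suggestion concrete (a purely hypothetical sketch of mine, not an algorithm from the paper), one could maximally couple each coordinate of the random-walk proposals in turn, so that the full vectors meet only when every coordinate does, while partial meetings still bring the chains closer:

```python
import numpy as np
from scipy.stats import norm

def coupled_normal(rng, mx, my, sd):
    """Rejection-based maximal coupling of N(mx, sd²) and N(my, sd²)."""
    x = rng.normal(mx, sd)
    if np.log(rng.uniform()) < norm.logpdf(x, my, sd) - norm.logpdf(x, mx, sd):
        return x, x
    while True:
        y = rng.normal(my, sd)
        if np.log(rng.uniform()) >= norm.logpdf(y, mx, sd) - norm.logpdf(y, my, sd):
            return x, y

def gibbsic_proposal(rng, x, y, sd=0.5):
    """Couple the random-walk proposals coordinate by coordinate: each
    pair of components is maximally coupled on its own, so the coupling
    of the full vectors is no longer maximal but couples more often
    componentwise."""
    pairs = [coupled_normal(rng, xi, yi, sd) for xi, yi in zip(x, y)]
    return np.array([p[0] for p in pairs]), np.array([p[1] for p in pairs])
```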

As a final pun, I noted that Vaserstein is not a typo, as Leonid Vaseršteĭn is a Russian-American mathematician, currently at Penn State.

Couplings and Monte Carlo [advanced graduate course at Dauphine by Pierre Jacob]

Posted in Kids, pictures, Statistics, Travel on January 20, 2020 by xi'an

As a visiting professor at Paris-Dauphine next month, Pierre Jacob will give a series of lectures on coupling and Monte Carlo. They will take place on Feb. 13, 14, 25, and 27 at Université Paris-Dauphine, the first two starting at 8:30 (room E) and the last two starting at 13:45 (rooms F and D201, respectively). Attendance is open to all and material will be made available on the lecture webpage.

unbiased MCMC discussed at the RSS tomorrow night

Posted in Books, Kids, pictures, Statistics, Travel, University life on December 10, 2019 by xi'an

The paper ‘Unbiased Markov chain Monte Carlo methods with couplings’ by Pierre Jacob et al. will be discussed (or Read) at the Royal Statistical Society, 12 Errol Street, London, tomorrow night, Wed 11 December, at 5pm London time, with a pre-discussion session at 3pm, involving Chris Sherlock and Pierre Jacob, and chaired by Ioanna Manolopoulou. While I will alas miss this opportunity, due to my trip to Vancouver over the weekend, it is great that the young tradition of pre-discussion sessions has been rekindled, as it helps put the paper into perspective for a wider audience and thus makes the more formal Read Paper session more profitable. As we discussed the paper in Paris Dauphine with our graduate students a few weeks ago, we will for certain send one or several written discussions to Series B!

bandits for doubly intractable posteriors

Posted in Statistics on April 17, 2019 by xi'an

Last Friday, Guanyang Wang arXived a paper on the use of multi-armed bandits (hence the reference to the three bandits) to handle intractable normalising constants. The bandit compares or mixes the Møller et al. (2006) auxiliary variable solution with the Murray et al. (2006) exchange algorithm, both of which are special cases of pseudo-marginal MCMC algorithms. In both cases, the auxiliary variables produce an unbiased estimator of the ratio of the constants, rather than the ratio of two unbiased estimators as in the more standard pseudo-marginal MCMC. The current paper tries to compare the two approaches based on the variance of the ratio estimate, but cannot derive a general ordering. The multi-armed bandit algorithm exploits both estimators of the acceptance ratio to pick the one that is almost the largest, almost because there is a correction for validating the step by detailed balance: the bandit acceptance probability is the maximum [over the methods] of the minimum [over the time directions] of the original acceptance ratio. While this appears to be valid, note that the resulting algorithm implies four times as many auxiliary variates as the original ones, which makes me wonder at the gain when compared with a parallel implementation of these methods, coupled at random times. (The fundamental difficulty of simulating from likelihoods with an unknown normalising constant remains, see p.4.)
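Since both solutions are special cases of pseudo-marginal MCMC, a generic sketch of such a step may help (my own illustration, assuming a symmetric proposal; propose, loglik_hat, and logprior are placeholder functions, not objects from the paper):

```python
import numpy as np

def pseudo_marginal_step(x, loglik_hat_x, rng, propose, loglik_hat, logprior):
    """One pseudo-marginal Metropolis-Hastings step: the intractable
    log-likelihood is replaced by the log of a non-negative unbiased
    estimator, and the estimate attached to the current state is
    recycled across iterations (symmetric proposal assumed, so the
    proposal ratio cancels)."""
    y = propose(x, rng)
    loglik_hat_y = loglik_hat(y, rng)   # fresh unbiased estimate at the proposal
    log_alpha = (loglik_hat_y + logprior(y)) - (loglik_hat_x + logprior(x))
    if np.log(rng.uniform()) < log_alpha:
        return y, loglik_hat_y          # accept: keep the new estimate
    return x, loglik_hat_x              # reject: recycle the old estimate
```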

Markov Chains [not a book review]

Posted in Books, pictures, Statistics, University life on January 14, 2019 by xi'an

As Randal Douc and Éric Moulines are both very close friends and two authors of this book on Markov chains, I cannot engage in a regular book review! Judging from the table of contents, the coverage is not too dissimilar to the now classic Markov Chains and Stochastic Stability book by Sean Meyn and the late Richard Tweedie (1993), called the Bible of Markov chains by Peter Glynn, with more emphasis on convergence matters and a more mathematical perspective. The 757-page book also includes a massive appendix on maths and probability background. As indicated in the preface, “the reason [the authors] thought it would be useful to write a new book is to survey some of the developments made during the 25 years that have elapsed since the publication of Meyn and Tweedie (1993b).” It connects with the theoretical developments brought by MCMC methods, like subgeometric rates of convergence to stationarity, sample paths, limit theorems, and concentration inequalities. The book also reflects the numerous contributions of the authors to the field. Hence a perfect candidate for teaching Markov chains to mathematically well-prepared graduate audiences. Congrats to the authors!