**A**rt Owen has arXived a new version of his thinning MCMC paper, where he studies how thinning or subsampling can improve computing time in MCMC chains. I remember quite well the message set by Mark Berliner and Steve MacEachern in an early 1990’s paper that subsampling was *always* increasing the variance of the resulting estimators. We actually have this result in our Monte Carlo Statistical Methods book. Now, there are other perspectives on this, as for instance cases when thinning can be hard-wired by simulating directly a k-step move, delaying rejection or acceptance, prefetching, or simulating directly the accepted values as in our vanilla Rao-Blackwellisation approach. Here, Art considers the case when there is a cost θ of computing a transform of the simulation [when the transition cost a unit] and when those transforms are positively correlated with correlation ρ. Somewhat unsurprisingly, when θ is large enough, thinning becomes worth implementing. But requires extra computations in evaluating the correlation ρ and the cost θ, which is rarely comparable with the cost of computing the likelihood itself, a requirement for the Metropolis-Hastings or Hamiltonian Monte Carlo step(s). Subsampling while keeping the right target (which is a hard constraint!) should thus have a much more effective impact on computing budgets.

## Archive for vanilla Rao-Blackwellisation

## thinning a Markov chain, statistically

Posted in Books, pictures, R, Statistics with tags autocorrelation, computing time, MCMC, MCMC convergence, Monte Carlo Statistical Methods, thinning, vanilla Rao-Blackwellisation on June 13, 2017 by xi'an## MCMskv #4 [house with a vision]

Posted in Statistics with tags ABC, Bernoulli factory, delayed acceptance, diffusions, Lenzerheide, MCMskv, Metropolis-Hastings algorithm, qMC, quasi-Monte Carlo methods, quasi-random sequences, snow, Switzerland, vanilla Rao-Blackwellisation on January 9, 2016 by xi'an**L**ast day at MCMskv! Not yet exhausted by this exciting conference, but this was the toughest day with one more session and a tutorial by Art Own on quasi Monte-Carlo. (Not even mentioning the night activities that I skipped. Or the ski break that I did not even consider.) Krys Latunszynski started with a plenary on exact methods for discretised diffusions, with a foray in Bernoulli factory problems. Then a neat session on adaptive MCMC methods that contained a talk by Chris Sherlock on delayed acceptance, where the approximation to the target was built by knn trees. (The adaptation was through the construction of the tree by including additional evaluations of the target density. Another paper sitting in my to-read list for too a long while: the exploitation of the observed values of π towards improving an MCMC sampler has always be “obvious” to me even though I could not see any practical way of doing so. )

It was wonderful that Art Owen accepted to deliver a tutorial at MCMskv on quasi-random Monte Carlo. Great tutorial, with a neat coverage of the issues most related to Monte Carlo integration. Since quasi-random sequences have trouble with accept/reject methods, a not-even-half-baked idea that came to me during Art’s tutorial was that the increased computing power granted by qMC could lead to a generic integration of the Metropolis-Hastings step in a Rao-Blackwellised manner. Art mentioned he was hoping that in a near future one could switch between pseudo- and quasi-random in an almost automated manner when running standard platforms like R. This would indeed be great, especially since quasi-random sequences seem to be available at the same cost as their pseudo-random counterpart. During the following qMC session, Art discussed the construction of optimal sequences on sets other than hypercubes (with the surprising feature that projecting optimal sequences from the hypercube does not work). Mathieu Gerber presented the quasi-random simulated annealing algorithm he developed with Luke Bornn that I briefly discussed a while ago. Or thought I did as I cannot trace a post on that paper! While the fact that annealing also works with quasi-random sequences is not astounding, the gain over random sequences shown on two examples is clear. The session also had a talk by Lester Mckey who relies Stein’s discrepancy to measure the value of an approximation to the true target. This was quite novel, with a surprising connection to Chris Oates’ talk and the use of score-based control variates, if used in a dual approach.

Another great session was the noisy MCMC one organised by Paul Jenkins (Warwick), with again a coherent presentation of views on the quality or lack thereof of noisy (or inexact) versions, with an update from Richard Everitt on inexact MCMC, Felipe Medina Aguayo (Warwick) on sufficient conditions for noisy versions to converge (and counterexamples), Jere Koskela (Warwick) on a pseudo-likelihood approach to the highly complex Kingman’s coalescent model in population genetics (of ABC fame!), and Rémi Bardenet on the tall data approximations techniques discussed in a recent post. Having seen or read most of those results previously did not diminish the appeal of the session.

## locally weighted MCMC

Posted in Books, Statistics, University life with tags Australia, effective sample size, Harvard University, Melbourne, parallel MCMC, Rao-Blackwellisation, recycling, St Kilda, vanilla Rao-Blackwellisation on July 16, 2015 by xi'an**L**ast week, on arXiv, Espen Bernton, Shihao Yang, Yang Chen, Neil Shephard, and Jun Liu (all from Harvard) proposed a weighting scheme to associated MCMC simulations, in connection with the parallel MCMC of Ben Calderhead discussed earlier on the ‘Og. The weight attached to each proposal is either the acceptance probability itself (with the rejection probability being attached to the current value of the MCMC chain) or a renormalised version of the joint target x proposal, either forward or backward. Both solutions are unbiased in that they have the same expectation as the original MCMC average, being some sort of conditional expectation. The proof of domination in the paper builds upon Calderhead’s formalism.

This work reminded me of several reweighting proposals we made over the years, from the global Rao-Blackwellisation strategy with George Casella, to the vanilla Rao-Blackwellisation solution we wrote with Randal Douc a few years ago, both of whom also are demonstrably improving upon the standard MCMC average. By similarly recycling proposed but rejected values. Or by diminishing the variability due to the uniform draw. The slightly parallel nature of the approach also connects with our parallel MCM version with Pierre Jacob (now Harvard as well!) and Murray Smith (who now leaves in Melbourne, hence the otherwise unrelated picture).

## parallelising MCMC algorithms

Posted in Books, Statistics, University life with tags parallel MCMC, PNAS, proceedings, vanilla Rao-Blackwellisation on December 23, 2014 by xi'anThis paper, A general construction for parallelizing Metropolis-Hastings algorithms, written by Ben Calderhead, was first presented at MCMSki last January and has now appeared in PNAS. It is somewhat related to the recycling idea of Tjelmeland (2004, unpublished) and hence to our 1996 Rao-Blackwellisation paper with George. Although there is no recycling herein.

At each iteration of Ben’s algorithm, N proposed values are generated conditional on the “current” value of the Markov chain, which actually consists of (N+1) components and from which one component is drawn at random to serve as a seed for the next proposal distribution and the simulation of N other values. In short, this is a data-augmentation scheme with the index I on the one side and the N modified components on the other side. The neat trick in the proposal [and the reason for the jump in efficiency] is that the stationary distribution of the auxiliary variable can be determined and hence used (N+1) times in updating the vector of (N+1) components. (Note that picking the index at random means computing *all* (N+1) possible transitions from one component to the N others. Or even all (N+1)! if the proposals differ. Hence a potential *increase* in the computing cost, even though what costs the most is usually the likelihood computation, dispatched on the parallel processors.) While there are (N+1) terms involved at each step, the genuine Markov chain is truly over a *single* chain and the N other proposed values are not recycled. Even though they could be [for Monte Carlo integration purposes], as shown e.g. in our paper with Pierre Jacob and Murray Smith. Something that took a few iterations for me to understand is why Ben rephrases the original Metropolis-Hastings algorithm as a finite state space Markov chain on the set of indices {1,…,N+1} (Proposition 1). Conditionally on the values of the (N+1) vector, the stationary of that sub-chain is no longer uniform. Hence, picking (N+1) indices from the stationary helps in selecting the most appropriate images, which explains why the rejection rate decreases.

The paper indeed evaluates the impact of increasing the number of proposals in terms of effective sample size (ESS), acceptance rate, and mean squared jump distance, based two examples. As often in parallel implementations, the paper suggests an “N-fold increase in computational speed” even though this is simply the effect of running the *same* algorithm on a single processor and on N parallel processors. If the comparison is between a single proposal Metropolis-Hastings algorithm on a single processor and an N-fold proposal on N processors, I would say the latter is *slower* because of the selection of the index I that forces all pairs of reverse move. Nonetheless, since this is an almost free bonus resulting from using N processors, when compared with more complex coupled chains, it sounds worth investigating and comparing with those more complex parallel schemes.

## non-negative unbiased estimators

Posted in Books, Kids, Statistics, University life with tags Bernoulli factory, infinite variance estimators, monotonic function, Russian roulette, unbiasedness, vanilla Rao-Blackwellisation on October 3, 2013 by xi'an**P**ierre Jacob and Alexandre Thiéry just arXived a highly pertinent paper on the most debated issue of non-negative unbiased estimators (of positive quantities). If you remember that earlier post of mine, I mentioned the issue in connection with the Russian roulette estimator(s) of Mark Girolami et al. And, as Pierre and Alexandre point out in the paper, there is also a clear and direct connection with the Bernoulli factory problem. And with our Vanilla Rao-Blackwellisation technique (sadly overlooked, once more!).

**T**he first thing I learned from the paper is how to turn a converging sequence into an unbiased estimator. If *(E _{n})* is this converging sequence, with limit μ, then

is unbiased..! Amazing. Even though the choice of the distribution of N matters towards getting a finite variance estimator, this transform is simply amazing. (Of course, once one looks at it, one realises it is the “old” trick of turning a series into a sequence and vice-versa. Still…!) And then you can reuse it into getting an unbiased estimator for almost any transform of μ.

**T**he second novel thing in the paper is the characterisation of impossible cases for non-negative unbiased estimators. For instance, if the original sequence has an unbounded support, there cannot be such an estimator. If the support is an half-line, the transform must be ~~monotonous~~ monotonic. If the support is a bounded interval (a,b), then the transform must be bounded from below by a polynomial bound

(where the extra-parameters obviously relate to the transform). (In this later case, the authors also show how to derive a Bernoulli estimator from the original unbiased estimator.)