**P**reviously, I posted a comment on a paper by Alex Shestopaloff and Radford Neal, after my visit to Toronto two years ago, using a particular version of ensemble Monte Carlo. A new paper by the same authors was recently arXived, as an refinement of the embedded HMM paper of Neal (2003), in that the authors propose a new and more efficient way to generate from the (artificial) embedded hidden Markov sampler that is central to their technique of propagating a set of pool states. The method exploits both forward and backward representations of HMMs in an alternating manner. And propagates the pool states from one observation time to the next. The paper also exploits latent Gaussian structures to make autoregressive proposals, as well as flip proposals from x to -x [which seem to only make sense when 0 is a central value for the target, i.e. when the observables y only depend on |x|]. All those modifications bring the proposal quite close to (backward) particle Gibbs, the difference being in using Metropolis rather than importance steps. And in an improvement brought by the embedded HMM approach, even though it is always delicate to generalise those comparisons when some amount of calibration is required by both algorithms under comparison. (Especially delicate when it is rather remote from my area of expertise!) Anyway, I am still intrigued [in a positive way] by the embedded HMM idea as it remains mysterious that a finite length HMM simulation can improve the convergence performances that much. And wonder at a potential connection with an earlier paper of Anthony Lee and Krys Latuszynski using a random number of auxiliary variables. Presumably a wrong impression from a superficial memory…

## Archive for geometric ergodicity

## Sampling latent states for high-dimensional non-linear state space models with the embedded HMM method

Posted in Books, pictures, Statistics, University life with tags ABC, arXiv, ensemble Monte Carlo, forward-backward formula, geometric ergodicity, hidden Markov models, HMM, Radford Neal, University of Toronto on March 17, 2016 by xi'an## stability of noisy Metropolis-Hastings

Posted in Books, Statistics, Travel, University life with tags geometric ergodicity, Monte Carlo within Metropolis, noisy Metropolis-Hastings algorithm, pseudo-marginal MCMC on April 3, 2015 by xi'an**F**elipe Medina-Aguayo, Anthony Lee and Gareths Roberts, all from Warwick, arXived last Thursday a paper on the stability properties of noisy Metropolis-Hastings algorithms. The validation of unbiased estimators of the target à la Andrieu and Roberts (2009, AoS)—often discussed here—is in fact obvious when following the auxiliary variable representation of Andrieu and Vihola (2015, AoAP). Assuming the unbiased estimator of the target is generated conditional on the proposed value in the original Markov chain. The noisy version of the above means refreshing the unbiased estimator at each iteration. It also goes under the name of Monte Carlo within Metropolis. The difficulty with this noisy version is that it is not exact, i.e., does not enjoy the true target as its marginal stationary distribution. The paper by Medina-Aguayo, Lee and Roberts focusses on its validation or invalidation (with examples of transient noisy versions). Under geometric ergodicity of the marginal chain, plus some stability in the weights, the noisy version is also geometrically ergodic. A drift condition on the proposal kernel is also sufficient. Under (much?) harder conditions, the limiting distribution of the noisy chain is asymptotically in the number of unbiased estimators the true target. The result is thus quite interesting in that it provides sufficient convergence conditions, albeit not always easy to check in realistic settings.

## ABC à Montréal

Posted in Kids, pictures, Running, Statistics, Travel, University life with tags ABC, ABC in Montréal, ABC model choice, ABC-MCMC, ABC-SMC, arXiv, Canada, conference, geometric ergodicity, machine learning, Montréal, NIPS 2014, poster, pseudo-data, Québec, snow, treadmill on December 13, 2014 by xi'an**S**o today was the NIPS 2014 workshop, “ABC in Montréal“, which started with a fantastic talk by Juliane Liepe on some exciting applications of ABC to the migration of immune cells, with the analysis of movies involving those cells acting to heal a damaged fly wing and a cut fish tail. Quite amazing videos, really. (With the great entry line of ‘We have all cut a finger at some point in our lives’!) The statistical model behind those movies was a random walk on a grid, with different drift and bias features that served as model characteristics. Frank Wood managed to deliver his talk despite a severe case of food poisoning, with a great illustration of probabilistic programming that made me understand (at last!) the very idea of probabilistic programming. And Vikash Mansinghka presented some applications in image analysis. Those two talks led me to realise why probabilistic programming was so close to ABC, with a programming touch! Hence why I was invited to talk today! Then Dennis Prangle exposed his latest version of lazy ABC, that I have already commented on the ‘Og, somewhat connected with our delayed acceptance algorithm, to the point that maybe something common can stem out of the two notions. Michael Blum ended the day with provocative answers to the provocative question of Ted Meeds as to whether or not machine learning needed ABC (*Ans.* No!) and whether or not machine learning could help ABC (*Ans.* ???). With an happily mix-up between mechanistic and phenomenological models that helped generating discussion from the floor.

The posters were also of much interest, with calibration as a distance measure by Michael Guttman, in continuation of the poster he gave at MCMski, Aaron Smith presenting his work with Luke Bornn, Natesh Pillai and Dawn Woodard, on why a single pseudo-sample is enough for ABC efficiency. This gave me the opportunity to discuss with him the apparent contradiction with the result of Kryz Łatunsziński and Anthony Lee about the geometric convergence of ABC-MCMC only attained with a random number of pseudo-samples… And to wonder if there is a geometric *versus* binomial dilemma in this setting, Namely, whether or not simulating pseudo-samples until one is accepted would be more efficient than just running one and discarding it in case it is too far. So, although the audience was not that large (when compared with the other “ABC in…” *and* when considering the 2500+ attendees at NIPS over the week!), it was a great day where I learned a lot, did not have a doze during talks (!), *[and even had an epiphany of sorts at the treadmill when I realised I just had to take longer steps to reach 16km/h without hyperventilating!]* So thanks to my fellow organisers, Neil D Lawrence, Ted Meeds, Max Welling, and Richard Wilkinson for setting the program of that day! And, by the way, where’s the next “ABC in…”?! (Finland, maybe?)

## making a random walk geometrically ergodic

Posted in R, Statistics with tags Annals of Statistics, CRAN, geometric ergodicity, métro, MCMC, Metropolis-Hastings, R, R package, random walk, uniform ergodicity on March 2, 2013 by xi'an**W**hile a random walk Metropolis-Hastings algorithm cannot be uniformly ergodic in a general setting (Mengersen and Tweedie, *AoS*, 1996), because it needs more energy to leave far away starting points, it can be geometrically ergodic depending on the target (and the proposal). In a recent *Annals of Statistics* paper, Leif Johnson and Charlie Geyer designed a trick to turn a random walk Metropolis-Hastings algorithm into a geometrically ergodic random walk Metropolis-Hastings algorithm by virtue of an isotropic transform (under the provision that the original target density has a moment generating function). This theoretical result is complemented by an R package called mcmc. (I have not tested it so far, having read the paper in the métro.) The examples included in the paper are however fairly academic and I wonder how the method performs in practice, on truly complex models, in particular because the change of variables relies on (a) an origin and (b) changing the curvature of space uniformly in all dimensions. Nonetheless, the idea is attractive and reminds me of a project of ours with Randal Douc, started thanks to the ‘Og and still under completion.

## dirty MCMC streams

Posted in Statistics, Travel, University life with tags geometric ergodicity, Markov chain Monte Carlo, MCMC, pseudo-random generator, randomness on May 7, 2012 by xi'an

**I**ain Murray and Lloyd T. Elliott had posted this paper on arXiv just before I left for my U,K, 2012 tour and I did not have time to read it in detail, nor obviously to report on it. Fortunately, during the ICMS meeting, Iain presented an handmade poster on this paper that allowed me a quick tour, enough to report on the contents! The main point of the paper is that it is possible to modify many standard MCMC codes so that they can be driven by a dependent random sequence. The authors show that various if specific dependent sequences of uniform variates do not modify the right target and the ergodicity of the MCMC scheme. As mentioned in the conclusion of the paper, this may have interesting consequences in parallel implementations where randomness becomes questionable, or in physical random generators, whose independence may also be questionable…

## How quickly does randomness appear?

Posted in Statistics, University life with tags convergence, geometric ergodicity, La Recherche, Metropolis-Hastings algorithms, Nice, randomness, spectral gap, total variation, uniform ergodicity on November 10, 2011 by xi'an**T**his was the [slightly off-key] title of the math column in the November issue of *La Recherche*, in any case intriguing enough for me to buy this general public science magazine on the metro platform and to read it immediately while waiting for an uncertain train, thanks to the nth strike of the year on my train line… But this was the occasion for an exposition of the Metropolis algorithm in a general public journal! The column actually originated from a recently published paper by Persi Diaconis, Gilles Lebeaux, and Laurent Michel, *Geometric analysis for the Metropolis algorithm on Lipschitz domain*, in ** Inventiones Mathematicae** [one of the top pure math journals]. The column in

*La Recherche*described the Metropolis algorithm (labelled there a

*random walk on Markov chains*!), alluded to the use of MCMC methods in statistics, told the genesis of the paper [namely the long-term invitation of Persi Diaconis in Nice a few years ago] and briefly explained the convergence result, namely the convergence of the Metropolis algorithm to the stationary measure at a geometric rate, with an application to the non-overlapping balls problem.

**I**f you take a look at the paper, you will see it is a beautiful piece of mathematics, establishing a spectral gap on the Markov operator associated with the Metropolis algorithm and deducing a uniformly geometric convergence [in total variation] for most regular-and-bounded-support distributions. A far from trivial and fairly general result. *La Recherche* however fails to mention the whole corpus of MCMC convergence results obtained in the 1990’s and 2000’s, by many authors, incl. Richard Tweedie, Gareth Roberts, Jeff Rosenthal, Eric Moulines, Gersende Fort, Randal Douc, Kerrie Mengersen, and others…

## Random dive MH

Posted in R, Statistics with tags geometric ergodicity, Metropolis-Hastings, R, random walk, randomness on September 2, 2010 by xi'an**A** new Metropolis-Hastings algorithm that I would call “universal” was posted by Somak Dutta yesterday on arXiv. *Multiplicative random walk Metropolis-Hastings on the real line* contains a different Metropolis-Hastings algorithm called the *random dive*. The proposed new value *x’* given the current value *x* is defined by

when is a random variable on . Thus, at each iteration, the current value is either shrunk or expanded by a random amount. When I read the paper in the Paris metro, while appreciating that this algorithm could be geometrically ergodic as opposed to the classical random walk algorithm, I was not convinced by the practical impact of the method, as I thought that the special role of zero in the proposal was not always reflected in the target. Especially when considering that the proposal is parameter-free. However, after running the following R program on a target centred away from zero, I found the proposal quite efficient.

f=function(x){.5*dnorm(x,mean=14)+.5*dnorm(x,mean=35)}

Nsim=10^5

x=rep(5,Nsim)

for (t in 2:Nsim){

coef=runif(1,min=-1)^sample(c(-1,1),1)

prop=x[t-1]*coef

prob=abs(coef)*f(prop)/f(x[t-1])

x[t]=x[t-1]

if (runif(1)<prob) x[t]=prop

}

hist(x,pro=T,nclass=113,col=”wheat2″)

curve(f,add=T,n=1001,col=”sienna”,lwd=2)

Obviously, it is difficult to believe that this extension will keep working similarly well when the dimension increases but this is an interesting way of creating a heavy tail proposal.