Archive for geometric ergodicity

non-reversible Langevin samplers

Posted in Books, pictures, Statistics, Travel, University life on February 6, 2017 by xi'an

In the train to Oxford yesterday night, I read through Duncan et al.'s recently arXived Nonreversible Langevin Samplers: Splitting Schemes, Analysis and Implementation. Standing up the whole trip, in the great tradition of British trains.

The paper is fairly theoretical and full of Foster-Lyapunov assumptions, but aims at defending an approach based on a non-reversible diffusion. One idea is that the diffusion with drift {∇ log π(x) + γ(x)} is associated with the target π provided

∇·{π(x) γ(x)} = 0

which holds for the Langevin diffusion when γ(x)=0, but produces a non-reversible process in the alternative. The Langevin choice γ(x)=0 happens to be the worst possible when considering the asymptotic variance. In practice however the diffusion needs to be discretised, which induces an approximation that may be catastrophic for convergence if left uncorrected, and a relapse into reversibility if corrected by a Metropolis step. The proposal in the paper is to use a Lie-Trotter splitting [which I had never heard of before] to separate the reversible [∇ log π(x)] and non-reversible [γ(x)] parts of the process. The deterministic part is chosen as γ(x)=∇ log π(x) [but then what is the point, since this is Langevin?] or as the gradient of a power of π(x). Although I was mostly lost by that stage, the paper then considers the error induced by a numerical integrator related with this deterministic part, towards deriving the asymptotic mean and variance of the splitting scheme. On the unit hypercube. Although the paper includes a numerical example for a warped normal target, I find it hard to visualise the implementation of this scheme. (Having obviously not heeded Nicolas' and James' advice, the authors also analyse the Pima Indian dataset by a logistic regression!)
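To fix intuitions, here is a minimal R sketch [mine, not the paper's splitting scheme] of the plain uncorrected Euler discretisation of the non-reversible diffusion, using the standard choice γ(x) = S ∇ log π(x) with S skew-symmetric, for which the divergence condition above holds automatically:

## Euler discretisation of dX = {∇log π(X) + γ(X)} dt + √2 dW for a
## bivariate normal target, with γ(x) = S ∇log π(x), S skew-symmetric
Sig <- matrix(c(1, .8, .8, 1), 2, 2)      # target covariance
P   <- solve(Sig)                         # precision matrix
grd <- function(x) -c(P %*% x)            # ∇ log π for N(0, Sig)
S   <- matrix(c(0, 1, -1, 0), 2, 2)       # skew-symmetric perturbation
h   <- 1e-2                               # step size, no Metropolis correction!
niter <- 1e4
X <- matrix(0, niter, 2)
for (t in 2:niter) {
  x <- X[t - 1, ]
  g <- grd(x)
  X[t, ] <- x + h * (g + c(S %*% g)) + sqrt(2 * h) * rnorm(2)
}

The bias of this crude scheme shrinks with h but never vanishes, which is precisely what the splitting in the paper is designed to handle.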

Sampling latent states for high-dimensional non-linear state space models with the embedded HMM method

Posted in Books, pictures, Statistics, University life on March 17, 2016 by xi'an

Previously, I posted a comment on a paper by Alex Shestopaloff and Radford Neal, after my visit to Toronto two years ago, using a particular version of ensemble Monte Carlo. A new paper by the same authors was recently arXived, as a refinement of the embedded HMM paper of Neal (2003), in that the authors propose a new and more efficient way to generate from the (artificial) embedded hidden Markov sampler that is central to their technique of propagating a set of pool states. The method exploits both forward and backward representations of HMMs in an alternating manner, and propagates the pool states from one observation time to the next. The paper also exploits latent Gaussian structures to make autoregressive proposals, as well as flip proposals from x to -x [which seem to only make sense when 0 is a central value for the target, i.e., when the observables y only depend on |x|]. All those modifications bring the proposal quite close to (backward) particle Gibbs, the difference being in using Metropolis rather than importance steps, and in an improvement brought by the embedded HMM approach, even though it is always delicate to generalise such comparisons when some amount of calibration is required by both algorithms. (Especially delicate when the topic is rather remote from my area of expertise!) Anyway, I am still intrigued [in a positive way] by the embedded HMM idea, as it remains mysterious that a finite-length HMM simulation can improve the convergence performance that much. And I wonder at a potential connection with an earlier paper of Anthony Lee and Krys Łatuszyński using a random number of auxiliary variables. Presumably a wrong impression from a superficial memory…
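As an aside, the flip move is easy to write as a stand-alone Metropolis step, since the proposal is a deterministic involution [a toy R illustration of mine, not the authors' sampler]:

## flip proposal x -> -x: deterministic involution, hence a Metropolis
## acceptance ratio of π(-x)/π(x)
flip_step <- function(x, logtarget) {
  if (log(runif(1)) < logtarget(-x) - logtarget(x)) -x else x
}
## toy usage on a two-mode target at ±2, where 0 is indeed a central value
logtarget <- function(x) log(dnorm(x - 2) + dnorm(x + 2))
x <- 2
x <- flip_step(x, logtarget)   # jumps between the two modes for free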

stability of noisy Metropolis-Hastings

Posted in Books, Statistics, Travel, University life on April 3, 2015 by xi'an

Felipe Medina-Aguayo, Anthony Lee and Gareth Roberts, all from Warwick, arXived last Thursday a paper on the stability properties of noisy Metropolis-Hastings algorithms. The validation of unbiased estimators of the target à la Andrieu and Roberts (2009, AoS) [often discussed on the 'Og] is in fact obvious when following the auxiliary variable representation of Andrieu and Vihola (2015, AoAP), assuming the unbiased estimator of the target is generated conditional on the proposed value in the original Markov chain. The noisy version of the above means refreshing the unbiased estimator at each iteration; it also goes under the name of Monte Carlo within Metropolis. The difficulty with this noisy version is that it is not exact, i.e., it does not enjoy the true target as its marginal stationary distribution. The paper by Medina-Aguayo, Lee and Roberts focusses on its validation or invalidation (with examples of transient noisy versions). Under geometric ergodicity of the marginal chain, plus some stability in the weights, the noisy version is also geometrically ergodic. A drift condition on the proposal kernel is also sufficient. Under (much?) harder conditions, the stationary distribution of the noisy chain converges to the true target as the number of unbiased estimators grows to infinity. The result is thus quite interesting in that it provides sufficient convergence conditions, albeit not always easy to check in realistic settings.
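For the record, here is a skeletal R version of the noisy scheme [a generic sketch of Monte Carlo within Metropolis, not the authors' code], where pihat stands for a user-supplied unbiased and almost surely positive estimator of the target density, based on N auxiliary simulations:

noisy_mh <- function(pihat, theta0, niter = 1e3, sd_prop = 1, N = 10) {
  theta <- numeric(niter)
  theta[1] <- theta0
  for (t in 2:niter) {
    prop <- theta[t - 1] + sd_prop * rnorm(1)
    ## both estimates are drawn afresh at every iteration: this refreshing is
    ## what makes the chain inexact, as opposed to the pseudo-marginal
    ## version that recycles the estimate at the current value
    if (runif(1) < pihat(prop, N) / pihat(theta[t - 1], N)) {
      theta[t] <- prop
    } else {
      theta[t] <- theta[t - 1]
    }
  }
  theta
}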

ABC à Montréal

Posted in Kids, pictures, Running, Statistics, Travel, University life on December 13, 2014 by xi'an

So today was the NIPS 2014 workshop, “ABC in Montréal“, which started with a fantastic talk by Juliane Liepe on some exciting applications of ABC to the migration of immune cells, with the analysis of movies involving those cells acting to heal a damaged fly wing and a cut fish tail. Quite amazing videos, really. (With the great entry line of ‘We have all cut a finger at some point in our lives’!) The statistical model behind those movies was a random walk on a grid, with different drift and bias features that served as model characteristics. Frank Wood managed to deliver his talk despite a severe case of food poisoning, with a great illustration of probabilistic programming that made me understand (at last!) the very idea of probabilistic programming. And Vikash Mansinghka presented some applications in image analysis. Those two talks led me to realise why probabilistic programming was so close to ABC, with a programming touch! Hence why I was invited to talk today! Then Dennis Prangle exposed his latest version of lazy ABC, which I have already commented upon on the ‘Og, somewhat connected with our delayed acceptance algorithm, to the point that maybe something common can stem out of the two notions. Michael Blum ended the day with provocative answers to the provocative questions of Ted Meeds as to whether or not machine learning needed ABC (Ans. No!) and whether or not machine learning could help ABC (Ans. ???). With a happy mix-up between mechanistic and phenomenological models that helped generate discussion from the floor.

The posters were also of much interest, with calibration as a distance measure by Michael Gutmann, in continuation of the poster he gave at MCMSki, and Aaron Smith presenting his work with Luke Bornn, Natesh Pillai and Dawn Woodard on why a single pseudo-sample is enough for ABC efficiency. This gave me the opportunity to discuss with him the apparent contradiction with the result of Krys Łatuszyński and Anthony Lee about the geometric convergence of ABC-MCMC only being attained with a random number of pseudo-samples… And to wonder if there is a geometric versus binomial dilemma in this setting, namely whether simulating pseudo-samples until one is accepted would be more efficient than just running one and discarding it in case it is too far [see the sketch at the end of this post]. So, although the audience was not that large (when compared with the other “ABC in…” meetings, and considering the 2500+ attendees at NIPS over the week!), it was a great day where I learned a lot, did not doze off during talks (!), [and even had an epiphany of sorts on the treadmill when I realised I just had to take longer steps to reach 16 km/h without hyperventilating!] So thanks to my fellow organisers, Neil D. Lawrence, Ted Meeds, Max Welling, and Richard Wilkinson for setting the program of that day! And, by the way, where’s the next “ABC in…”?! (Finland, maybe?)
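To make the single pseudo-sample scheme concrete, a bare-bones R sketch of ABC-MCMC [with hypothetical problem-specific functions simul and prior_dens, and a symmetric random walk proposal], where exactly one pseudo-sample is drawn per proposed value and discarded when too far:

abc_mcmc <- function(yobs, simul, prior_dens, theta0, eps,
                     niter = 1e4, sd_prop = .5) {
  theta <- numeric(niter)
  theta[1] <- theta0
  for (t in 2:niter) {
    prop <- theta[t - 1] + sd_prop * rnorm(1)
    z <- simul(prop)                # one and only one pseudo-sample
    ## accept iff the pseudo-sample falls within eps of the observation AND
    ## the usual MH test (a prior ratio, the proposal being symmetric) passes
    if (abs(z - yobs) < eps &&
        runif(1) < prior_dens(prop) / prior_dens(theta[t - 1])) {
      theta[t] <- prop
    } else {
      theta[t] <- theta[t - 1]
    }
  }
  theta
}

The alternative in the dilemma would replace the single call to simul with a loop re-simulating until a pseudo-sample lands within eps, at a geometric cost in simulations.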

making a random walk geometrically ergodic

Posted in R, Statistics on March 2, 2013 by xi'an

While a random walk Metropolis-Hastings algorithm cannot be uniformly ergodic in a general setting (Mengersen and Tweedie, AoS, 1996), because it needs more energy to leave far-away starting points, it can be geometrically ergodic, depending on the target (and the proposal). In a recent Annals of Statistics paper, Leif Johnson and Charlie Geyer designed a trick to turn a random walk Metropolis-Hastings algorithm into a geometrically ergodic one by virtue of an isotropic transform (under the provision that the original target density has a moment generating function). This theoretical result is complemented by an R package called mcmc. (I have not tested it so far, having read the paper in the métro.) The examples included in the paper are however fairly academic and I wonder how the method performs in practice, on truly complex models, in particular because the change of variables relies on (a) an origin and (b) changing the curvature of space uniformly in all dimensions. Nonetheless, the idea is attractive and reminds me of a project of ours with Randal Douc, started thanks to the ‘Og and still in progress.
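For illustration, a hand-made one-dimensional version of the trick [my own sketch, with an arbitrary tail-stretching transform of my choosing, not the mcmc package's implementation]: run a plain random walk Metropolis chain on the transformed variable and map it back to the original scale:

b      <- .3
f      <- function(y) y * exp(b * abs(y))             # tail-stretching map x = f(y)
logjac <- function(y) b * abs(y) + log1p(b * abs(y))  # log |f'(y)|
rwm_morph <- function(logpi, y0, niter = 1e4, sd_prop = 1) {
  lud <- function(y) logpi(f(y)) + logjac(y)          # induced log-target in y
  y <- numeric(niter)
  y[1] <- y0
  for (t in 2:niter) {
    prop <- y[t - 1] + sd_prop * rnorm(1)
    if (log(runif(1)) < lud(prop) - lud(y[t - 1])) {
      y[t] <- prop
    } else {
      y[t] <- y[t - 1]
    }
  }
  f(y)                                                # back to the x scale
}
xs <- rwm_morph(function(x) dnorm(x, log = TRUE), 0)  # sanity check on N(0,1)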

dirty MCMC streams

Posted in Statistics, Travel, University life on May 7, 2012 by xi'an

Iain Murray and Lloyd T. Elliott had posted this paper on arXiv just before I left for my U.K. 2012 tour and I did not have time to read it in detail, nor obviously to report on it. Fortunately, during the ICMS meeting, Iain presented a handmade poster on this paper that allowed me a quick tour, enough to report on the contents! The main point of the paper is that it is possible to modify many standard MCMC codes so that they can be driven by a dependent random sequence. The authors show that various, if specific, dependent sequences of uniform variates preserve both the correct target and the ergodicity of the MCMC scheme. As mentioned in the conclusion of the paper, this may have interesting consequences in parallel implementations, where the quality of the randomness becomes questionable, or with physical random generators, whose independence may also be in doubt…
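The refactoring at stake is easily represented: a random walk Metropolis code that consumes a user-supplied stream of uniforms instead of calling the generator itself [a minimal R sketch of mine; whether a given dependent stream preserves the target is exactly what Murray and Elliott check, not something the sketch guarantees]:

rwm_driven <- function(logpi, x0, u, sd_prop = 1) {
  ## u is an niter x 2 matrix of "uniforms": column 1 drives the proposal,
  ## column 2 the accept/reject decision
  niter <- nrow(u)
  x <- numeric(niter)
  x[1] <- x0
  for (t in 2:niter) {
    prop <- x[t - 1] + sd_prop * qnorm(u[t, 1])
    if (log(u[t, 2]) < logpi(prop) - logpi(x[t - 1])) {
      x[t] <- prop
    } else {
      x[t] <- x[t - 1]
    }
  }
  x
}
## with an i.i.d. driver, this is exactly standard random walk Metropolis:
out <- rwm_driven(function(x) dnorm(x, log = TRUE), 0,
                  matrix(runif(2e4), 1e4, 2))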

How quickly does randomness appear?

Posted in Statistics, University life on November 10, 2011 by xi'an

This was the [slightly off-key] title of the math column in the November issue of La Recherche, in any case intriguing enough for me to buy this general-public science magazine on the metro platform and to read it immediately while waiting for an uncertain train, thanks to the nth strike of the year on my train line… But this was the occasion for an exposition of the Metropolis algorithm in a general-public journal! The column actually originated from a recently published paper by Persi Diaconis, Gilles Lebeau, and Laurent Michel, Geometric analysis for the Metropolis algorithm on Lipschitz domains, in Inventiones Mathematicae [one of the top pure math journals]. The column in La Recherche described the Metropolis algorithm (labelled there as a random walk on Markov chains!), alluded to the use of MCMC methods in statistics, told the genesis of the paper [namely the long-term invitation of Persi Diaconis to Nice a few years ago], and briefly explained the convergence result, namely the convergence of the Metropolis algorithm to the stationary measure at a geometric rate, with an application to the non-overlapping balls problem.

If you take a look at the paper, you will see it is a beautiful piece of mathematics, establishing a spectral gap for the Markov operator associated with the Metropolis algorithm and deducing a uniformly geometric convergence [in total variation] for most regular-and-bounded-support distributions. A far from trivial and fairly general result. La Recherche however fails to mention the whole corpus of MCMC convergence results obtained in the 1990s and 2000s by many authors, incl. Richard Tweedie, Gareth Roberts, Jeff Rosenthal, Eric Moulines, Gersende Fort, Randal Douc, Kerrie Mengersen, and others…
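For readers curious about the algorithm itself, here is a toy R version of the Metropolis ball walk analysed in the paper [on the unit disk standing in for a generic Lipschitz domain, rather than on a hard-sphere configuration space]: with a uniform target, moves leaving the domain are rejected and the chain stays put:

in_domain <- function(x) sum(x^2) < 1     # unit disk standing in for Omega
ball_walk <- function(x0, h = .1, niter = 1e4) {
  X <- matrix(0, niter, 2)
  X[1, ] <- x0
  for (t in 2:niter) {
    repeat {                              # uniform draw in the ball of radius h
      e <- runif(2, -h, h)
      if (sum(e^2) < h^2) break
    }
    prop <- X[t - 1, ] + e
    ## uniform target: the Metropolis ratio is 1 inside the domain, 0 outside
    X[t, ] <- if (in_domain(prop)) prop else X[t - 1, ]
  }
  X
}

The spectral gap result then bounds how fast the law of X[t, ] approaches the uniform distribution on the domain, uniformly in the starting point.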