## Archive for Hastings-Metropolis sampler

## delayed but published!

Posted in Statistics with tags acceleration of MCMC algorithms, AIMS, delayed acceptance, Foundations of Data Science, Hastings-Metropolis sampler, Monte Carlo Statistical Methods, publication on June 20, 2019 by xi'an## convergences of MCMC and unbiasedness

Posted in pictures, Statistics, University life with tags asynchronous algorithms, Hastings-Metropolis sampler, impatient user, maximal coupling, MCMC convergence, optimal transport, parallelisation, Paris Dauphine, perfect sampling, unbiased MCMC on January 16, 2018 by xi'an**D**uring his talk on unbiased MCMC in Dauphine today, Pierre Jacob provided a nice illustration of the convergence modes of MCMC algorithms. With the stationary target achieved after 100 Metropolis iterations, while the mean of the target taking much more iterations to be approximated by the empirical average. Plus a nice connection between coupling time and convergence. Convergence to the target.During Pierre’s talk, some simple questions came to mind, from developing an “impatient user version”, as in perfect sampling, in order to stop chains that run “forever”, to optimising parallelisation in order to avoid problems of asynchronicity. While the complexity of coupling increases with dimension and the coupling probability goes down, the average coupling time varies but an unexpected figure is that the expected cost per iteration is of 2 simulations, irrespective of the chosen kernels. Pierre also made a connection with optimal transport coupling and stressed that the maximal coupling was for the proposal and not for the target.

## Langevin on a wrong bend

Posted in Books, Statistics with tags CREST, Hastings-Metropolis sampler, Langevin diffusion, MALA, pseudo-marginal MCMC, scalable MCMC, stochastic gradient descent, Wasserstein distance on October 19, 2017 by xi'an**A**rnak Dalayan and Avetik Karagulyan (CREST) arXived a paper the other week on a focussed study of the Langevin algorithm [not MALA] when the gradient of the target is incorrect. With the following improvements *[quoting non-verbatim from the paper]*:

- a varying-step Langevin that reduces the number of iterations for a given Wasserstein precision, compared with recent results by e.g. Alan Durmus and Éric Moulines;
- an extension of convergence results for error-prone evaluations of the gradient of the target (i.e., the gradient is replaced with a noisy version, under some moment assumptions that do not include unbiasedness);
- a new second-order sampling algorithm termed LMCO’, with improved convergence properties.

What is particularly interesting to me in this setting is the use in all these papers of a discretised Langevin diffusion (a.k.a., random walk with a drift induced by the gradient of the log-target) without the original Metropolis correction. The results rely on an assumption of [strong?] log-concavity of the target, with “user-friendly” bounds on the Wasserstein distance depending on the constants appearing in this log-concavity constraint. And so does the adaptive step. (In the case of the noisy version, the bias and variance of the noise also matter. As pointed out by the authors, there is still applicability to scaling MCMC for large samples. Beyond pseudo-marginal situations.)

“…this, at first sight very disappointing behavior of the LMC algorithm is, in fact, continuously connected to the exponential convergence of the gradient descent.”

The paper concludes with an interesting mise en parallèle of Langevin algorithms and of gradient descent algorithms, since the convergence rates are the same.

## the penalty method

Posted in Statistics, University life with tags bias, Euro 2016, exchange algorithm, football, Hastings-Metropolis sampler, Monte Carlo Statistical Methods, path sampling, PhD students, pseudo-marginal MCMC, unbiased estimation, Université Paris Dauphine on July 7, 2016 by xi'an

“In this paper we will make conceptually simple generalization of Metropolis algorithm, by adjusting the acceptance ratio formula so that the transition probabilities are unaffected by the fluctuations in the estimate of [the acceptance ratio]…”

**L**ast Friday, in Paris-Dauphine, my PhD student Changye Wu showed me a paper of Ceperley and Dewing entitled the penalty method for random walks with uncertain energies. Of which I was unaware of (and which alas pre-dated a recent advance made by Changye). Despite its physics connections, the paper is actually about estimating a Metropolis-Hastings acceptance ratio and correcting the Metropolis-Hastings move for this estimation. While there is no generic solution to this problem, assuming that the logarithm of the acceptance ratio estimate is Gaussian around the true log acceptance ratio (and hence *unbiased*) leads to a log-normal correction for the acceptance probability.

“Unfortunately there is a serious complication: the variance needed in the noise penalty is also unknown.”

Even when the Gaussian assumption is acceptable, there is a further issue with this approach, namely that it also depends on an unknown variance term. And replacing it with an estimate induces further bias. So it may be that this method has not met with many followers because of those two penalising factors. Despite precluding the pseudo-marginal approach of Mark Beaumont (2003) by a few years, with the later estimating separately numerator and denominator in the Metropolis-Hastings acceptance ratio. And hence being applicable in a much wider collection of cases. Although I wonder if some generic approaches like path sampling or the exchange algorithm could be applied on a generic basis… *[I just realised the title could be confusing in relation with the current football competition!]*

## an extension of nested sampling

Posted in Books, Statistics, University life with tags arXiv, extreme value theory, Hastings-Metropolis sampler, Metropolis-Hastings algorithms, nested sampling, Poisson process, rare events, unbiasedness on December 16, 2014 by xi'an**I** was reading [in the Paris métro] *Hastings-Metropolis algorithm on Markov chains for small-probability estimation*, arXived a few weeks ago by François Bachoc, Lionel Lenôtre, and Achref Bachouch, when I came upon their first algorithm that reminded me much of nested sampling: the following was proposed by Guyader et al. in 2011,

To approximate a tail probability

P(H(X)>h),

- start from an iid sample of size N from the reference distribution;
- at each iteration m, select the point x with the smallest H(x)=ξ and replace it with a new point y simulated under the constraint H(y)≥ξ;
- stop when all points in the sample are such that H(X)>h;
- take
as the unbiased estimator of

P(H(X)>h).

Hence, except for the stopping rule, this is the same implementation as nested sampling. Furthermore, Guyader et al. (2011) also take advantage of the bested sampling fact that, if direct simulation under the constraint H(y)≥ξ is infeasible, simulating via one single step of a Metropolis-Hastings algorithm is as valid as direct simulation. (I could not access the paper, but the reference list of Guyader et al. (2011) includes both original papers by John Skilling, so the connection must be made in the paper.) What I find most interesting in this algorithm is that it even achieves unbiasedness (even in the MCMC case!).

## MCMC on zero measure sets

Posted in R, Statistics with tags conditional density, Hastings-Metropolis sampler, Jacobian, MCMC, measure theory, measure zero set, projected measure, random walk on March 24, 2014 by xi'an**S**imulating a bivariate normal under the constraint (or conditional to the fact) that x²-y²=1 (a non-linear zero measure curve in the 2-dimensional Euclidean space) is not that easy: if running a random walk along that curve (by running a random walk on y and deducing x as x²=y²+1 and accepting with a Metropolis-Hastings ratio based on the bivariate normal density), the outcome differs from the target predicted by a change of variable and the proper derivation of the conditional. The *above* graph resulting from the R code *below* illustrates the discrepancy!

targ=function(y){ exp(-y^2)/(1.52*sqrt(1+y^2))} T=10^5 Eps=3 ys=xs=rep(runif(1),T) xs[1]=sqrt(1+ys[1]^2) for (t in 2:T){ propy=runif(1,-Eps,Eps)+ys[t-1] propx=sqrt(1+propy^2) ace=(runif(1)<(dnorm(propy)*dnorm(propx))/ (dnorm(ys[t-1])*dnorm(xs[t-1]))) if (ace){ ys[t]=propy;xs[t]=propx }else{ ys[t]=ys[t-1];xs[t]=xs[t-1]}}

If instead we add the proper Jacobian as in

ace=(runif(1)<(dnorm(propy)*dnorm(propx)/propx)/ (dnorm(ys[t-1])*dnorm(xs[t-1])/xs[t-1]))

the fit is there. My open question is how to make this derivation generic, i.e. without requiring the (dreaded) computation of the (dreadful) Jacobian.

## new MCMC algorithm for Bayesian variable selection

Posted in pictures, Statistics, Travel, University life with tags Bayesian model choice, Bayesian variable selection, Hastings-Metropolis sampler, Langevin diffusion, Langevin MCMC algorithm, Markov chain Monte Carlo, Monte Carlo Statistical Methods, shrinkage estimation, simulation, variable dimension models on February 25, 2014 by xi'an**U**nfortunately, I will miss the incoming Bayes in Paris seminar next Thursday (27th February), as I will be flying to Montréal and then Québec at the time (despite having omitted to book a flight till now!). Indeed Amandine Shreck will give a talk at 2pm in room 18 of ENSAE, Malakoff, on *A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection*, a work written jointly with Gersende Fort, Sylvain Le Corff, and Eric Moulines, and arXived at the end of 2013 (which may explain why I missed it!). Here is the abstract:

This paper introduces a new Markov Chain Monte Carlo method to perform Bayesian variable selection in high dimensional settings. The algorithm is a Hastings-Metropolis sampler with a proposal mechanism which combines (i) a Metropolis adjusted Langevin step to propose local moves associated with the differentiable part of the target density with (ii) a shrinkage-thresholding step based on the non-differentiable part of the target density which provides sparse solutions such that small components are shrunk toward zero. This allows to sample from distributions on spaces with different dimensions by actually setting some components to zero. The performances of this new procedure are illustrated with both simulated and real data sets. The geometric ergodicity of this new transdimensional Markov Chain Monte Carlo sampler is also established.

(I will definitely get a look at the paper over the coming days!)