Archive for independent Metropolis-Hastings algorithm

Metropolis gets off the ground

Posted in Books, Kids, Statistics on April 1, 2019 by xi'an

An X validated discussion that went to and fro about a misunderstanding of the Metropolis-Hastings algorithm. Which started by blaming George Casella's and Roger Berger's Statistical Inference (p.254), when the real issue was the inquisitor having difficulties with the notation V ~ f(v), that is, with the notion of generating a random variable, mistaking identically distributed for identical. Even my crawling from one iteration to the next did not help at the beginning. Another illustration of the strong tendency on this forum to jettison fundamental prerequisites…
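For the record, here is a minimal sketch of an independent Metropolis-Hastings run (mine, not the book's) with a N(0,1) target and a Cauchy proposal, making explicit that a brand-new variate V ~ f(v) is simulated at each iteration:

#sketch of independent Metropolis-Hastings: target N(0,1), proposal f=Cauchy
T=1e4
x=rep(0,T)
for (t in 2:T){
  V=rcauchy(1)  #fresh draw every iteration: identically distributed, not identical
  logalpha=dnorm(V,log=TRUE)-dcauchy(V,log=TRUE)-
    dnorm(x[t-1],log=TRUE)+dcauchy(x[t-1],log=TRUE)
  x[t]=if (log(runif(1))<logalpha) V else x[t-1]
  }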

adaptive independent Metropolis-Hastings

Posted in Statistics on May 8, 2018 by xi'an

When rereading this paper by Holden et al. (2009), I was reminded of the earlier and somewhat under-appreciated Gåsemyr (2003). But I find the convergence results therein rather counter-intuitive in that they seem to justify adaptive independent proposals with no strong requirement besides the massive Doeblin condition:

“The Doeblin condition essentially requires that all the proposal distribution [sic] has uniformly heavier tails than the target distribution.”

Even when the adaptation is based on a history vector made of rejected values and non-replicated accepted values. Actually, convergence of this sequence of adaptive proposal kernels is established under a concentration of the Doeblin constants a¹, a², … towards one, in the sense that

E[(1-a¹)(1-a²)⋯] = 0.

The reason may be that, with chains satisfying a Doeblin condition, there is a probability of reaching stationarity at each step, equal to a¹, a², …, and hence a licence to ignore adaptivity, since each kernel keeps the target π invariant. So in the end this is not so astounding. (The paper also reminded me of Wolfgang [or Vincent] Doeblin's short and tragic life.)
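A crude numerical companion (mine, not from the paper): for a fixed independent proposal q and target π, the Doeblin constant is a = inf q(x)/π(x), and, e.g., a Cauchy proposal uniformly dominates a N(0,1) target:

#Doeblin constant a=inf q(x)/pi(x) for q=Cauchy and pi=N(0,1); by symmetry
#the search can be restricted to x>=0, with the infimum attained near x=1
ratio=function(x) dcauchy(x)/dnorm(x)
a=optimize(ratio,interval=c(0,10))$objective  #about 0.66
#uniform ergodicity then holds at rate (1-a)^n, roughly 0.34^n here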

a programming bug with weird consequences

Posted in Kids, pictures, R, Statistics, University life on November 25, 2015 by xi'an

One student of mine mistakenly coded an independent Metropolis-Hastings algorithm with too small a variance in the proposal when compared with the target variance. Here is the R code of this implementation:

#target is N(0,1), proposal is N(0,sd=.01): the far-too-small proposal
#variance is the very point of the post, so it is kept as in the student's code
T=1e5
prop=x=rnorm(T,sd=.01)  #all T proposals drawn at once
ratop=dnorm(prop,log=TRUE)-dnorm(prop,sd=.01,log=TRUE)  #log f/q at proposals
ratav=ratop[1]  #log f/q at the current state of the chain
logu=ratop-log(runif(T))
for (t in 2:T){
  if (logu[t]>ratav){  #accept iff u < [f(y)/q(y)] / [f(x)/q(x)]
    x[t]=prop[t];ratav=ratop[t]}else{x[t]=x[t-1]}
  }

It produces outputs of the following shape

[trace plot of the simulated chain, frozen over very long stretches]

which is quite amazing because of the small variance. The reason for the lengthy freezes of the chain is the occurrence with positive probability of realisations from the proposal with very small proposal density values: such values carry huge importance ratios f/q, hence make the subsequent Metropolis-Hastings acceptance probabilities very small and are almost “impossible” to leave. This is due to the lack of control of the target, which is flat over the domain of the proposal for all practical purposes. Obviously, in such a setting, the outcome is unrelated to the N(0,1) target!
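To see the scale of the issue, a two-line computation (my addition, not in the original post) compares the log importance ratio log f(y)-log q(y) at a typical proposal value and at a value four proposal standard deviations away, both deep inside the flat centre of the target:

#log f(y)-log q(y) for f=N(0,1) and q=N(0,sd=.01)
dnorm(.01,log=TRUE)-dnorm(.01,sd=.01,log=TRUE)  #about -4.1
dnorm(.04,log=TRUE)-dnorm(.04,sd=.01,log=TRUE)  #about +3.4
#once a value near .04 is accepted, a typical new proposal gets accepted
#with log probability about -4.1-3.4=-7.5, i.e., less than once in 1,000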

It is also unrelated to the normal proposal, in that switching to a t distribution with 3 degrees of freedom produces a similar outcome.

It is only when using a Cauchy proposal that the pattern vanishes.
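For the record, here is a sketch of the Cauchy version (my reconstruction, not the student's actual code): the ratio f/q is now bounded above, so the running weight ratav cannot explode and the chain moves freely.

#same scheme with a Cauchy proposal: f/q=dnorm/dcauchy is bounded above
#(by about 1.25, attained at zero), hence no more freezing on a huge weight
T=1e5
prop=x=rcauchy(T)
ratop=dnorm(prop,log=TRUE)-dcauchy(prop,log=TRUE)
ratav=ratop[1]
logu=ratop-log(runif(T))
for (t in 2:T){
  if (logu[t]>ratav){
    x[t]=prop[t];ratav=ratop[t]}else{x[t]=x[t-1]}
  }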

independent Metropolis-Hastings

Posted in Books, Statistics on November 24, 2015 by xi'an

“In this paper we have demonstrated the potential benefits, both theoretical and practical, of the independence sampler over the random walk Metropolis algorithm.”

Peter Neal and Tsun Man Clement Lee arXived a paper on optimising the independent Metropolis-Hastings algorithm. I was a bit surprised at this “return” of the independent sampler, which I hardly mention in my lectures, so I had a look at the paper. The goal is to produce an equivalent to what Gelman, Roberts and Gilks (1996) obtained for random walk samplers. In the formal setting where the target is a product of n identical densities f, the optimal number k of components to update in one Metropolis-Hastings (within Gibbs) round is approximately 2.835/I, where I is the symmetrised Kullback-Leibler divergence between the (univariate) target f and the independent proposal q, provided I is finite. The most surprising part is that the optimal acceptance rate is again 0.234, as in the random walk case. This is surprising in that I usually associate the independent Metropolis-Hastings algorithm with high acceptance rates. But this is of course a matter of calibrating the proposal q, not the block size k of the Gibbs part. Hence, while this calibration of the independent Metropolis-within-Gibbs sampler is worth the study and almost automatically applicable, it remains that it only applies to the category of problems where blocking can take place. As in the disease models illustrating the paper. And it requires an adequate choice of proposal distribution, for otherwise the above quote becomes inappropriate.
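As a back-of-the-envelope illustration (mine, with arbitrary numbers, not taken from the paper), take a N(0,1) component target f and a N(0,σ²) independent proposal q, for which the symmetrised Kullback-Leibler divergence is available in closed form:

#symmetrised KL between f=N(0,1) and q=N(0,sigma^2):
#I = KL(f||q)+KL(q||f) = (sigma^2+1/sigma^2)/2-1
sklGauss=function(sigma) (sigma^2+1/sigma^2)/2-1
I=sklGauss(1.5)  #0.347...
2.835/I          #about 8 components to update per round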

parallel Metropolis Hastings [published]

Posted in Statistics, University life on October 27, 2011 by xi'an

As I was looking at the discussion paper by Yaming Yu and Xiao-Li Meng on improved efficiency for MCMC algorithms, which is available (for free) on-line, I realised the paper on the parallel Metropolis-Hastings algorithm we wrote with Pierre Jacob and Murray Smith is now published (on-line) in the Journal of Computational and Graphical Statistics. This is a special issue for the 20th anniversary of the journal and our paper is within the “If Monte Carlo Be a Food of Computing, Simulate on” section! (My friends Olivier Cappé and Radu V. Craiu also have a paper in this issue.) Here is the complete reference:

P. Jacob, C. P. Robert, & M. H. Smith. Using Parallel Computation to Improve Independent Metropolis–Hastings Based Estimation. Journal of Computational and Graphical Statistics. September 1, 2011, 20(3): 616-635. doi:10.1198/jcgs.2011.10167

The [20th Anniversary Featured Discussion] paper by Yaming Yu and Xiao-Li Meng has already been mentioned on Andrew's blog; it is full of interesting ideas and remarks about improving Gibbs efficiency, in the spirit of the very fine work Jim Hobert and his collaborators have been developing in the past decade, of fun titles (“To center or not to center – that is not the question”, “coupling is more promising than compromising”, “be all our insomnia remembered”, and “needing inception”, in connection with the talk Xiao-Li gave in Paris two months ago…), and above all of the fascinating puzzle of linking statistical concepts and Monte Carlo concepts. How come sufficiency and ancillarity are to play a role in simulation?! Where is the simulation equivalent of Basu's theorem? These questions obviously relate to the idea of turning simulation into a measure estimation issue, discussed in a post of mine after the Columbia workshop. This interweaving paper also brings back memories of the fantastic 1994 Biometrika interleaving paper by Liu, Wong, and Kong, with its elegant proof of positive and decreasing correlations and of the improvement brought by Rao-Blackwellisation [another statistics theorem!] for data augmentation.

Parallel computation [revised]

Posted in R, Statistics, University life on March 15, 2011 by xi'an

We have now completed our revision of the parallel computation paper and hope to send it to JCGS within a few days. As seen on the arXiv version, and given the very positive reviews we received, the changes are minor, mostly focusing on the explanation of the principle and on the argument that it comes essentially for free. Pierre also redrew the graphs in a more compact and nicer way, thanks to the abilities of the ggplot2 package. In addition, Pierre put the R and Python programs used in the paper in a public repository.

Parallel computation [permutations]

Posted in R, Statistics, University life on February 20, 2011 by xi'an

François Perron is visiting me for two months from Montréal and, following a discussion about the parallel implementation of MCMC algorithms (to which he also contributed with Yves Atchadé in 2005), he remarked that a deterministic choice of permutations with maximal contrast should do better than random or even half-random permutations. Assuming p processors or threads, with p+1 a prime number, his solution is to take element (i,j) of the permutation table as (ij) mod (p+1): here are two examples


> ((1:10)%*%t(1:10))%%11
 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    2    3    4    5    6    7    8    9    10
 [2,]    2    4    6    8   10    1    3    5    7     9
 [3,]    3    6    9    1    4    7   10    2    5     8
 [4,]    4    8    1    5    9    2    6   10    3     7
 [5,]    5   10    4    9    3    8    2    7    1     6
 [6,]    6    1    7    2    8    3    9    4   10     5
 [7,]    7    3   10    6    2    9    5    1    8     4
 [8,]    8    5    2   10    7    4    1    9    6     3
 [9,]    9    7    5    3    1   10    8    6    4     2
[10,]   10    9    8    7    6    5    4    3    2     1

> ((1:16)%*%t(1:16))%%17
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
 [1,]    1    2    3    4    5    6    7    8    9    10    11
 [2,]    2    4    6    8   10   12   14   16    1     3     5
 [3,]    3    6    9   12   15    1    4    7   10    13    16
 [4,]    4    8   12   16    3    7   11   15    2     6    10
 [5,]    5   10   15    3    8   13    1    6   11    16     4
 [6,]    6   12    1    7   13    2    8   14    3     9    15
 [7,]    7   14    4   11    1    8   15    5   12     2     9
 [8,]    8   16    7   15    6   14    5   13    4    12     3
 [9,]    9    1   10    2   11    3   12    4   13     5    14
[10,]   10    3   13    6   16    9    2   12    5    15     8
[11,]   11    5   16   10    4   15    9    3   14     8     2
[12,]   12    7    2   14    9    4   16   11    6     1    13
[13,]   13    9    5    1   14   10    6    2   15    11     7
[14,]   14   11    8    5    2   16   13   10    7     4     1
[15,]   15   13   11    9    7    5    3    1   16    14    12
[16,]   16   15   14   13   12   11   10    9    8     7     6

which show that the scheme provides an interestingly diverse spread of the indices. We certainly have to try this in the revision.
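A quick sanity check (my addition) that each row is indeed a full permutation of 1,…,p whenever p+1 is prime:

#every row of ((1:p)%*%t(1:p))%%(p+1) should be a permutation of 1:p
#when p+1 is prime, since row i lists i*j mod (p+1) over j=1,...,p
rowsArePermutations=function(p){
  tab=((1:p)%*%t(1:p))%%(p+1)
  all(apply(tab,1,function(r) setequal(r,1:p)))
}
rowsArePermutations(10)  #TRUE (11 is prime)
rowsArePermutations(16)  #TRUE (17 is prime)
rowsArePermutations(9)   #FALSE (10 is not prime)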