## single variable transformation approach to MCMC

Posted in Books, Statistics, Travel with tags , , , , on September 9, 2014 by xi'an

I read the newly arXived paper “On Single Variable Transformation Approach to Markov Chain Monte Carlo” by Dey and Bhattacharya on the pleasant train ride between Bristol and Coventry last weekend. The paper actually follows several earlier papers by the authors that I have not read in detail. The notion of single variable transform is to add plus or minus the same random noise to all components of the current value of the Markov chain, instead of the standard d-dimensional random walk proposal of the reference Metropolis-Hastings algorithm, namely all proposals are of the form

$x_i'=x_i\pm \epsilon\ i=1,\cdots,d$

meaning the chain proceeds [after acceptance] along one and only one of the d diagonals. The authors’ arguments are that (a) the proposal is cheaper and (b) the acceptance rate is higher. What I find questionable in this argument is that this does not directly matter in the evaluation of the performances of the algorithm. For instance, higher acceptance in a Metropolis-Hasting algorithm does not imply faster convergence and smaller asymptotic variance. (This goes without mentioning the fact that the comparative Figure 1 is so variable with the dimension as to be of limited worth. Figure 1 and 2 are also found in an earlier arXived paper of the authors.) For instance, restricting the moves along the diagonals of the Euclidean space implies that there is a positive probability to make two successive proposals along the same diagonal, which is a waste of time. When considering the two-dimensional case, joining two arbitrary points using an everywhere positive density g upon ε means generating two successive values from g, which is equivalent cost-wise to generating a single noise from a two-dimensional proposal. Without the intermediate step of checking the one-dimensional move along one diagonal. So much for a gain. In fine, the proposal found in this paper sums up as being a one-at-a-time version of a standard random walk Metropolis-Hastings algorithm.

## austerity in MCMC land (#2)

Posted in R, Statistics with tags , , , on April 29, 2013 by xi'an

After reading the arXiv paper by Korattikara, Chen and Welling, I wondered about the expression of the acceptance step of the Metropolis-Hastings algorithm as a mean of log-likelihoods over the sample. More specifically the long sleepless nights at the hospital led me to ponder the rather silly question of the impact of replacing mean by median. I thus tried running a Metropolis-Hastings algorithm with the substitute and it (of course!) let to a nonsensical answer, as shown by the above graph. The true posterior is the one for a normal model and the histogram indicates a lack of convergence of the Markov chain to this posterior even though it does converge to some posterior. Here is the R code for this tiny experiment:

#data generation
N=100
x=rnorm(N)

#HM steps
T=10^5
theta=rep(0,T)
curlike=dnorm(x,log=TRUE)
for (t in 2:T){

prop=theta[t-1]+.1*rnorm(1)
proplike=dnorm(x,mean=prop,log=TRUE)
u=runif(1)
bound=log(u)-dnorm(prop,sd=10,log=TRUE)+
dnorm(theta[t-1],sd=10,log=TRUE)
if (median(proplike)-median(curlike)>bound/N){
theta[t]=prop;curlike=proplike
} else { theta[t]=theta[t-1]}
}


## making a random walk geometrically ergodic

Posted in R, Statistics with tags , , , , , , , , , on March 2, 2013 by xi'an

While a random walk Metropolis-Hastings algorithm cannot be uniformly ergodic in a general setting (Mengersen and Tweedie, AoS, 1996), because it needs more energy to leave far away starting points, it can be geometrically ergodic depending on the target (and the proposal). In a recent Annals of Statistics paper, Leif Johnson and Charlie Geyer designed a trick to turn a random walk Metropolis-Hastings algorithm into a geometrically ergodic random walk Metropolis-Hastings algorithm by virtue of an isotropic transform (under the provision that the original target density has a moment generating function). This theoretical result is complemented by an R package called mcmc. (I have not tested it so far, having read the paper in the métro.) The examples included in the paper are however fairly academic and I wonder how the method performs in practice, on truly complex models, in particular because the change of variables relies on (a) an origin and (b) changing the curvature of space uniformly in all dimensions. Nonetheless, the idea is attractive and reminds me of a project of ours with Randal Douc,  started thanks to the ‘Og and still under completion.

Posted in Books, Statistics, University life with tags , , , , , , , , , on December 14, 2012 by xi'an

This week, my student Dona Skanji gave a presentation of the paper of Hastings “Monte Carlo sampling methods using Markov chains and their applications“, which set the rules for running MCMC algorithms, much more so than the original paper by Metropolis et al. which presented an optimisation device. even though the latter clearly stated the Markovian principle of those algorithms and their use for integration. (This is definitely a classic, selected in the book Biometrika: One hundred years, by Mike Titterington and David Cox.) Here are her slides (the best Beamer slides so far!):

Given that I had already taught my lectures on Markov chains and on MCMC algorithms, the preliminary part of Dona’s talk was easier to compose and understanding the principles of the method was certainly more straightforward than for the other papers in the series. I think she nonetheless did a rather good job in summing up the paper, running this extra simulation for the Poisson distribution—with the interesting “mistake” of including the burnin time in the representation of the output and concluding about a poor convergence—and mentioning the Gibbs extension.I led the discussion of the seminar towards irreducibility conditions and Peskun’s ordering of Markov chains, which maybe could have been mentioned by Dona since she was aware Peskun was Hastings‘ student.

## A survey of [the 60’s] Monte Carlo methods

Posted in Books, R, Statistics, University life with tags , , , , , , , on May 17, 2011 by xi'an

“The only good Monte Carlos are the dead Monte Carlos” (Trotter and Tukey, quoted by Halton)

When I presented my [partial] history of MCM methods in Bristol two months ago, at the Julian Besag memorial, Christophe Andrieu mentioned a 1970 SIAM survey by John Halton on A retrospective and prospective survey of the Monte Carlo method. This is a huge paper (62 pages, 251 references) and it brings a useful light on the advances in the 60’s (the paper was written in 1968). From the reference list, it seems John Halton was planning two books on the Monte Carlo method, but a search on google did not show anything. I also discovered in this list that there was a 1954 RSS symposium (Read Paper?) on Monte Carlo methods. Note that there were at least two books on Monte Carlo published at the time, Hammersley and Handscomb’s 1964 Monte Carlo Methods and Scheider’s 1966 Monte Carlo Method. (Hammerlsey appears as a major figure in this survey.) There is a lot of material in this long review and most of the standard methods are listed: control variate, importance sampling, self-normalised simportance sampling, stratified sampling, antithetic variables, simulation by inversion, rejection or demarginalisation. Variance reduction is presented as the motivation for the alternative methods. Very little is said about the use of Monte Carlo methods in statistics (“many of  [the applications] are primitive and artless“)  I was first very surprised to find sequential Monte Carlomentioned as well, but it later appeared this was Monte Carlo methods for sequential problems, in the spirit of Abraham Wald. While the now forgotten EZH method is mentioned as a promising new method (p.11), the survey also contains an introduction to the conditional Monte Carlo method of Trotter and Tukey (1956) [from whom the above and rather puzzling quote is taken] that could relate to the averaging techniques of Kong, McCullagh, Meng, Nicolae and Tan as found in their 2003 Read Paper….

“The search for randomness is evidently futile” (Halton)

A large part of the review is taken by the study of uniform random generators and by the distinction between random, pseudo-random and quasi-random versions. Halton insists very much on the lack of justification in using non-random generators, even though they work well. He even goes further as to warn about bias because even the truly random generators are discrete. The book covers the pseudo-random generators, starting with the original version of von Neumann, Metropolis, and Ulam, continuing with Lehmer’s well-known congruencial generator, and the Fibonacci generalisation. For testing those generators by statistical tests (again with little theoretical ground), Marsaglia is mentioned.  The paper also covers in great detail the quasi-random sequences, covering low discrepancy requirements, van der Corput’s, Halton’s and Hammersley’s sequences. Halton considers quasi-Monte Carlo as “a branch of numerical analysis”.

The paper concludes with a list of 24 future developments I will cover in another post tomorrow…

## MCMC with errors

Posted in R, Statistics, University life with tags , , , , , , , on March 25, 2011 by xi'an

I received this email last week from Ian Langmore, a postdoc in Columbia:

I’m looking for literature on a subject and can’t find it:  I have a Metropolis sampler where the acceptance probability is evaluated with some error.  This error is not simply error in evaluation of the target density.  It occurs due to the method by which we approximate the acceptance probability.

This is a sensible question, albeit a wee vague… The closest item of work I can think of is the recent paper by Christophe Andrieu and Gareth Roberts,  in the Annals of Statistics (2009) following an original proposal by Marc Beaumont. I think there is an early 1990’s paper by Gareth and Jeff Rosenthal where they consider the impact of some approximation effect like real number representation on the convergence but I cannot find it. Of course, the recent particle MCMC JRSS B discussion paper by Christophe,  Arnaud Doucet and Roman Hollenstein is a way to bypass the problem. (In a sense ABC is a rudimentary answer as well.) And there must be many other papers on this topic I am not aware of….

## Vanilla on-line

Posted in Statistics, University life with tags , , , , on February 18, 2011 by xi'an

The Vanilla Rao–Blackwellization of Metropolis–Hastings algorithms paper with Randal Douc is now published in Annals of Statistics (Volume 39, Number 1 (2011), pages 261-277) and available on-line via the project Euclid. We are currently working with Pierre Jacob on an extension of this idea towards parallelisation