Archive for tempering

ABC-MCMC for parallel tempering

Posted in Mountains, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on February 9, 2012 by xi'an

In this paper a new algorithm combining population-based MCMC methods with ABC requirements is proposed, using an analogy with the Parallel Tempering algorithm (Geyer, 1991).

Another of those arXiv papers that had sat on my to-read pile for way too long: Likelihood-free parallel tempering by Meïli Baragtti, Agnès Grimaud, and Denys Pommeret, from Luminy, Marseilles. The paper mentions our population Monte Carlo (PMC) algorithm (Beaumont et al., 2009) and other ABC-SMC algorithms, but opts instead for an ABC-MCMC basis. The purpose is to build a parallel tempering method. Tolerances and temperatures evolve simultaneously. I however fail to see where the tempering occurs in the algorithm (page 7): there is a set of temperatures T1,….,TN, but they do not appear within the algorithm. My first idea of a tempering mechanism in a likelihood-free setting was to replicate our SAME algorithm (Doucet, Godsill, and Robert, 2004), by creating Tj copies of the [pseudo-]observations to mimic the likelihood taken to the power Tj. But this is annealing, not tempering, and I cannot think of the opposite of copies of the data. Unless of course a power of the likelihood can be simulated (and even then, what would the equivalent be for the data…?) Maybe a natural solution would be to operate some kind of data-attrition, e.g. by subsampling the original vector of observations.

Discussing the issue with Jean-Michel Marin, during a visit to Montpellier today, I realised that the true tempering came from the tolerances εi, while the temperatures Tj were there to calibrate the proposal distributions. And that the major innovation contained in the thesis (if not so clearly in the paper) was to boost exchanges between different tolerances, improving upon the regular ABC-MCMC sampler by an equi-energy move.

workshop in Columbia

Posted in Statistics, Travel, University life with tags , , , , , , , , , on September 24, 2011 by xi'an

The workshop in Columbia University on Computational Methods in Applied Sciences is quite diverse in its topics.  Reminding me of the conference on Efficient Monte Carlo in Sandbjerg Estate, Sønderborg in 2008, celebrating the 70th birthday of Reuven Rubinstein, incl. some colleagues I had not met since this meeting. Yesterday I thus heard (quite interesting) talks on domains somehow far from my own, from Robert Adler on cohomology (giving a second look  at the thing after the talk I head in Wharton last year), to José Blanchet on simulation for infinite server queues (with a link to perfect sampling I could not exactly trace but that was certainly there). Several of the talks made me think of our Brownian motion confidence band paper, with Wilfrid Kendall and Jean-Michel Marin, esp. Gennady Samorodnitsky’s on the maximum of stochastic processes (and wonder whether we could have gone further in that direction). Pierre Del Moral presented a broad overview of the Feynman-Kacs’ approaches to particle methods, in particular particle MCMC, with application to some financial objects. Paul Glasserman talked about robust MCMC, which I found quite an appealing concept in that it included uncertainties about the model itself. And linked with minimax concepts. And Paul Dupuis exposed a parallel tempering method linked with large deviations, whose paper I am definitely looking forward. Now it is more than time to work on my own talk! (On a very personal basis, I sadly lost my sturdy Canon camera in the taxi from the airport! Will need a new one for the ‘Og!)

computational difficulties [with notations]

Posted in R, Statistics, University life with tags , , , , on August 25, 2011 by xi'an

Here is an email I received from Umberto:

I have a doubt regarding the tempered transitions method you considered in your JASA article with Celeux and Hurn.

On page 961 you detail the several steps for building a proposal for a given distribution by simulating through l tempered power densities. I am slightly confused regarding the interpretation of your MCMC(x,π) notation.

For example does MCMC(y_l,\pi^{1/\beta_{l-1}}) means that an MCMC procedure starting at yl, say Metropolis-Hastings, is used to generate a single proposal yl+1 for \pi^{1/\beta_{l-1}} ?

If this is the case, then yl+1 might be rejected or accepted and in the former case I would have yl+1=yl right? In other words I am not required to simulate proposals using MCMC(y_l,\pi^{1/\beta_{l-1}}) until I finally accept yl+1.

By reading the last paragraph in page 962 it seems to me that, indeed, the y1,…,y2l-1 thus generated are not necessarily accepted proposals for the corresponding power densities.

In retrospect, I still like this MCMC(x,π) notation in the simulated tempering “up-and-down” scheme (and the paper!). Because it is generic, in the sense of an R function that would take the function MCMC(x,π) as its input. To clarify the notation in this light, MCMC(x,π) returns a value that is the outcome of the corresponding MCMC step. This value may be equal to x (MCMC rejection) or to another value (MCMC acceptance). So the sequence y1,…,y2l-1 is made of consecutive values that differ and of consecutive values that do not (it is even possible that all the terms in the sequence are equal). At the end of this “up-and-down” tempering, the value y2l-1 may be the next value of the Markov chain targeted at the original target π. Or the current value may be replicated. This depends on the overall acceptance probability (4) on page 961. (Following Neal, 1996, Statistics and Computing.) This is a very compelling idea, whose mileage may vary depending on the number of required steps and powers.

likelihood-free parallel tempering

Posted in Statistics, University life with tags , , , , , , , , , on August 20, 2011 by xi'an

Meïli Baragatti, Agnès Grimaud, and Denys Pommeret posted an ABC paper on arXiv entitled Likelihood-free parallel tempering. While most ABC methods essentially are tempering methods, in that the tolerance level acts like an energy level, this paper uses parallel chains at various tolerance levels, with an exchange mechanism derived from Geyer and Thomson (1995, JASA). As with regular ABC-MCMC, the acceptance probability is such that the likelihood needs not be computed. On the mixture example of Sisson et al. (2007, PNAS) and on the tuberculosis example of Tanaka et al. (2006, Genetics), the authors report better performances than ABC-PMC, ABC-MCMC and ABC. (In a bimodal toy example,  ABC-PMC does not identify a second mode, which should not be the case with a large enough initial tolerance and a small enough tempering decrease step.) The paper introduces a sequence of temperatures in addition to a sequence of tolerances and it is only through an example that I understood the (unusual) role of the temperatures as scale factors in the  random walk proposal. It seems to me that an annealing step should be added as the chains with larger tolerances are less interesting as time goes on.

Ps-Scott Sisson just signaled on his twitter account the publication of several papers using ABC in monkey evolution. As well as a fourth paper by Wegman et al.. estimating the size of the initial American settlers to be around 100, about 13,000 years ago, all using standard ABC model choice techniques. Scott also pointed out a conference held in Bristol next April 16-19.

New arXiv postings

Posted in Statistics with tags , , , , on April 27, 2010 by xi'an

A series of new posting on arXiv I would like to discuss further:

  • arXiv:1004.2840 : Robust Parameter Selection for Parallel Tempering by Hamze, Dickson and Karimi
  • arXiv:1004.2910 : Valid p-Values using Importance Sampling by Harris
  • arXiv:1004.3476 : Approximate Methods for State-Space Modelsby Koyama, Castellanos, Shalizi and Kass
  • arXiv:1004.3616 : Recursive Numerical Evaluation of the Cumulative Bivariate Normal Distribution by Meyer
  • arXiv:1004.3830 : Model Selection and Adaptive Markov chain Monte Carlo for Bayesian Cointegrated VAR model by Peters, Kannan, Lasscock and Mellen
  • arXiv:1004.3925 : Classification using distance nearest neighbours by Friel and Pettitt

especially since the paper by Neal Friel and Tony Pettitt builds upon our JASA k-nearest neighbour paper.