Archive for parallel tempering

ABC, anytime!

Posted in Books, pictures, Statistics, Travel, University life on January 18, 2021 by xi'an

Last June, Alix Marie d'Avigneau, Sumeet Singh, and Lawrence Murray arXived a paper on anytime ABC that I intended to review right away but that sat till now on my virtual desk (and pile of to-cover-arXivals!). The notion of anytime MCMC was already covered in earlier 'Og entries, but this anytime ABC version bypasses the problem of asynchronicity, namely the "randomly varying local move completion times when parallel tempering is implemented on a multi-processor computing resource". The different temperatures are replaced by different tolerances in ABC, since switches between tolerances are natural whenever a proposal for a given tolerance ε happens to be eligible for a lower tolerance ε'. The construction further accounts for the different durations required to simulate a proposal under different tolerances, and for the wait for other processors to complete their task, in order to avoid the induced bias in the stationary distributions. A drawback of the approach stands in calibrating the tolerance levels in advance (or via preliminary runs that may prove costly).
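To fix ideas, here is a toy sketch of the tolerance-ladder mechanism, mine rather than the authors' (no anytime or multi-processor component, an arbitrary Normal toy model, and made-up tolerance and proposal scales): several ABC-MCMC chains run at decreasing tolerances, and a state whose simulated distance already meets the stricter neighbouring tolerance can be swapped downwards.

```python
import random

random.seed(1)

obs = 2.0                       # observed summary
tols = [1.0, 0.5, 0.2]          # tolerance ladder, playing the role of temperatures

def simulate(theta):
    # pseudo-data summary under the toy model N(theta, 1)
    return random.gauss(theta, 1.0)

def abc_step(theta, dist, eps):
    # one ABC-MCMC step at tolerance eps (flat prior, symmetric proposal):
    # keep the proposal only if its simulated summary falls within eps of obs
    prop = random.gauss(theta, 0.5)
    d = abs(simulate(prop) - obs)
    if d < eps:
        return prop, d
    return theta, dist

chains = [(0.0, float("inf"))] * len(tols)
samples = []
for t in range(20000):
    chains = [abc_step(th, d, eps) for (th, d), eps in zip(chains, tols)]
    # tolerance switch: a state already eligible for the stricter neighbouring
    # tolerance is exchanged with that chain (the reverse swap is always eligible)
    k = random.randrange(len(tols) - 1)
    (th1, d1), (th2, d2) = chains[k], chains[k + 1]
    if d1 < tols[k + 1]:
        chains[k], chains[k + 1] = (th2, d2), (th1, d1)
    samples.append(chains[-1][0])

# posterior mean under the strictest tolerance, after burn-in
est = sum(samples[5000:]) / len(samples[5000:])
```

The looser chains move freely and feed the strict chain through the swaps, which is the rationale for replacing temperatures with tolerances.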

general perspective on the Metropolis–Hastings kernel

Posted in Books, Statistics on January 14, 2021 by xi'an

[My Bristol friends and co-authors] Christophe Andrieu and Anthony Lee, along with Sam Livingstone, arXived a massive paper on 01 January on the Metropolis-Hastings kernel.

“Our aim is to develop a framework making establishing correctness of complex Markov chain Monte Carlo kernels a purely mechanical or algebraic exercise, while making communication of ideas simpler and unambiguous by allowing a stronger focus on essential features (…) This framework can also be used to validate kernels that do not satisfy detailed balance, i.e. which are not reversible, but a modified version thereof.”

A central notion in this highly general framework is, extending Tierney (1998), to see an MCMC kernel as a triplet involving a probability measure μ (on an extended space), an involution transform φ generalising the proposal step (i.e. φ²=id), and an associated acceptance probability ð. Then μ-reversibility occurs for

\eth(\xi)\mu(\text{d}\xi)= \eth(\phi(\xi))\mu^{\phi}(\text{d}\xi)

with the rhs involving the push-forward measure induced by μ and φ. And furthermore there is always a choice of an acceptance probability ð ensuring this equality holds. Interestingly, the new framework allows for mostly seamless handling of more complex versions of MCMC such as reversible jump and parallel tempering. But also non-reversible kernels, incl. for instance delayed rejection. And HMC, incl. NUTS. And pseudo-marginal, multiple-try, PDMPs, &c., &c. It is remarkable to see such a general theory emerging at this (late?) stage of the evolution of the field (and I will need more time and attention to understand its consequences).
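As a sanity check on the triplet view, here is a minimal sketch of mine (not from the paper): the plain random walk Metropolis kernel recovered from the extended measure μ = π ⊗ q on (x,v), the involution φ(x,v) = (x+v,−v), and the acceptance ratio μ∘φ/μ.

```python
import math, random

random.seed(2)

def pi(x):      # unnormalised target: standard normal
    return math.exp(-0.5 * x * x)

def q(v):       # auxiliary density, also standard normal
    return math.exp(-0.5 * v * v)

def phi(x, v):  # involution on the extended space: phi(phi(x, v)) == (x, v)
    return x + v, -v

x, samples = 0.0, []
for _ in range(50000):
    v = random.gauss(0, 1)        # refresh xi = (x, v) under mu = pi x q
    y, w = phi(x, v)              # deterministic involutive proposal
    # accept with the ratio mu(phi(xi)) / mu(xi); the Jacobian of phi is 1 here,
    # and since q(-v) = q(v) this collapses to the usual pi(y) / pi(x)
    if random.random() < (pi(y) * q(w)) / (pi(x) * q(v)):
        x = y
    samples.append(x)

mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples)
```

The mechanical nature of the check (measure, involution, ratio) is exactly the selling point of the framework.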


QuanTA

Posted in Books, pictures, Running, Statistics, University life on September 17, 2018 by xi'an

My Warwick colleagues Nick Tawn [who also is my most frequent accomplice to running, climbing and currying in Warwick!] and Gareth Roberts have just arXived a paper on QuanTA, a new parallel tempering algorithm that Nick designed during his thesis at Warwick, which he defended last semester. Parallel tempering targets in parallel several powered (or power-tempered) versions of the target distribution. With proposed switches between adjacent targets. An improved version transforms the local values before operating the switches. Ideally, the transform should be the composition of the cdf and inverse cdf, but this is impossible. Linearising the transform is feasible, but does not agree with multimodality, which calls for local transforms. Which themselves call for the identification of the different modes. In QuanTA, they are identified by N parallel runs of the standard algorithm, or rather N/2 to avoid dependence issues, and K-means estimates. The paper covers the construction of an optimal scaling of temperatures, in that the difference between the temperatures is scaled [with order 1/√d] so that the acceptance rate for swaps is 0.234. Which in turn induces a practical if costly calibration of the temperatures, especially when the size of the jump depends on the current temperature. However, this cost issue is addressed in the paper, resorting to the acceptance rate as a proxy for effective sample size and to the acceptance rate over run time to run the comparison with regular parallel tempering, leading to strong improvements in the mixture examples examined in the paper. The use of machine learning techniques like K-means or more involved solutions is a promising thread in this exciting area of tempering, where intuition about high temperatures can be actually misleading. Because using the wrong scale means missing the area of interest, which is not the mode!
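To make the swap mechanics concrete, here is a bare-bones parallel tempering sketch of mine (a fixed, hand-picked geometric ladder rather than QuanTA's optimally scaled temperatures, and none of its quantile transforms) on a two-mode mixture that a single cold chain could not traverse.

```python
import math, random

random.seed(3)

def logpi(x):
    # bimodal target: equal-weight mixture of N(-4, 1) and N(4, 1);
    # the tiny guard avoids log(0) from floating-point underflow in deep tails
    a = math.exp(-0.5 * (x + 4) ** 2)
    b = math.exp(-0.5 * (x - 4) ** 2)
    return math.log(a + b + 1e-300)

betas = [1.0, 0.42, 0.18, 0.075]   # inverse temperatures, cold to hot (ad hoc)

def mh_step(x, beta):
    prop = x + random.gauss(0, 1.0 / math.sqrt(beta))  # hotter chains jump wider
    if random.random() < math.exp(min(0.0, beta * (logpi(prop) - logpi(x)))):
        return prop
    return x

xs = [4.0] * len(betas)
cold, swaps, tries = [], 0, 0
for t in range(40000):
    xs = [mh_step(x, b) for x, b in zip(xs, betas)]
    k = random.randrange(len(betas) - 1)   # propose a swap of adjacent levels
    tries += 1
    logr = (betas[k] - betas[k + 1]) * (logpi(xs[k + 1]) - logpi(xs[k]))
    if random.random() < math.exp(min(0.0, logr)):
        xs[k], xs[k + 1] = xs[k + 1], xs[k]
        swaps += 1
    cold.append(xs[0])

# fraction of cold-chain time spent in the left mode: both modes get visited
left = sum(x < 0 for x in cold) / len(cold)
```

Tuning the spacing of `betas` so the swap acceptance `swaps / tries` sits near 0.234 is precisely the calibration issue the paper addresses.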

love-hate Metropolis algorithm

Posted in Books, pictures, R, Statistics, Travel on January 28, 2016 by xi'an

Hyungsuk Tak, Xiao-Li Meng and David van Dyk just arXived a paper on a multiple choice proposal in Metropolis-Hastings algorithms towards dealing with multimodal targets. Called "A repulsive-attractive Metropolis algorithm for multimodality" [although I wonder why XXL did not jump at the opportunity to use the "love-hate" denomination!]. The proposal distribution includes a [forced] downward Metropolis-Hastings move that uses the inverse of the target density π as its own target, namely 1/{π(x)+ε}. Followed by a [forced] Metropolis-Hastings upward move whose target is {π(x)+ε}. The +ε is just there to avoid handling ratios of zeroes (although I wonder why using the convention 0/0=1 would not work). And chosen as 10⁻³²³ by default in connection with R's smallest positive number. Whether or not the "downward" move is truly downwards and the "upward" move is truly upwards obviously depends on the generating distribution: I find it rather surprising that the authors consider the same random walk density in both cases as I would have imagined relying on a more dispersed distribution for the downward move in order to reach more easily other modes. For instance, the downward move could have been based on an anti-Langevin proposal, relying on the gradient to proceed further down…

This special choice of a single proposal however simplifies the acceptance ratio (and keeps the overall proposal symmetric). The final acceptance ratio still requires a ratio of intractable normalising constants, which the authors bypass via the Møller et al. (2006) auxiliary variable trick. While the authors mention the alternative pseudo-marginal approach of Andrieu and Roberts (2009), they do not try to implement it, although this would be straightforward here: the normalising constants are the probabilities of accepting a downward and an upward move, respectively, and those can easily be evaluated at a cost similar to the use of the auxiliary variables. That is,

– generate a few moves from the current value and record the proportion p of accepted downward moves;
– generate a few moves from the final proposed value and record the proportion q of accepted downward moves;

and replace the ratio of intractable normalising constants with p/q. It is not even clear that one needs those extra moves: since the algorithm requires an acceptance in the downward and upward moves, one can generate geometric variates associated with those probabilities p and q, variates that can be used for estimating them. From a theoretical perspective, I also wonder if forcing the downward and upward moves truly leads to an improved convergence speed. Considering the case when the random walk is poorly calibrated for either the downward or the upward move, the number of failed attempts before an acceptance may grow beyond the reasonable.
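The two-step recipe above is easy to sketch; here is my own illustration (arbitrary bimodal target, made-up proposal scale and number of attempts), estimating the acceptance probability of the forced downward move from two starting points.

```python
import math, random

random.seed(4)

eps = 1e-300      # the +epsilon guard, as a tiny constant
def pi(x):        # arbitrary bimodal target, for illustration only
    return 0.5 * math.exp(-0.5 * x * x) + 0.5 * math.exp(-0.5 * (x - 5) ** 2)

def down_accept(x, n=2000, scale=2.0):
    # proportion of accepted downward proposals started at x: this estimates
    # the intractable normalising constant of the forced downward kernel,
    # i.e. the probability of accepting a downward move from x
    acc = 0
    for _ in range(n):
        prop = random.gauss(x, scale)
        if random.random() < (pi(x) + eps) / (pi(prop) + eps):
            acc += 1
    return acc / n

p = down_accept(2.0)   # from the current value (here sitting in the valley)
q = down_accept(5.0)   # from the proposed value (here sitting at a mode)
ratio = p / q          # plug-in estimate for the ratio of constants
```

Note that from a mode the downward move accepts almost every attempt (q near one), while from the valley it does not, which is what makes the ratio informative.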

As XXL and David pointed out to me, the unusual aspect of the approach is that here the proposal density is intractable, rather than the target density itself. This makes using Andrieu and Roberts (2009) seemingly less straightforward. However, as I was reminded this afternoon at the statistics and probability seminar in Bristol, the argument for the pseudo-marginal based on an unbiased estimator is that w Q(w|x) has a marginal in x equal to π(x) when the expectation of w is π(x). In the current problem, the proposal in x can be extended into a proposal in (x,w), w P(w|x), whose marginal is the proposal on x.

If we complement the target π(x) with the conditional P(w|x), the acceptance probability would then involve

{π(x’) P(w’|x’) / π(x) P(w|x)} / {w’ P(w’|x’) / w P(w|x)} = {π(x’) / π(x)} {w/w’}

so it seems the pseudo-marginal (or auxiliary variable) argument also extends to the proposal. Here is a short experiment that shows no discrepancy between target and histogram:

#arbitrary bimodal target
pi=function(x){.25*dnorm(x)+.75*dnorm(x,mean=5)}
#running the chain
T=1e4;x=rep(0,T);nozero=1e-300
for (t in 2:T){
  #love-hate move: forced downward step, counting attempts in fowa
  fowa=1;prop1=rnorm(1,x[t-1],2)
  while (runif(1)>{pi(x[t-1])+nozero}/{pi(prop1)+nozero}){
    fowa=fowa+1;prop1=rnorm(1,x[t-1],2)}
  #forced upward step, counting attempts in bacwa
  bacwa=1;prop2=rnorm(1,prop1,2)
  while (runif(1)>{pi(prop2)+nozero}/{pi(prop1)+nozero}){
    bacwa=bacwa+1;prop2=rnorm(1,prop1,2)}
  #geometric attempt counts stand in for the missing normalising constants
  if (runif(1)<pi(prop2)*bacwa/pi(x[t-1])/fowa){
    x[t]=prop2} else {x[t]=x[t-1]}}

A convective replica-exchange method for sampling new energy basins

Posted in Books, Statistics, University life on September 18, 2013 by xi'an

I was recently asked to referee a PhD thesis at Institut Pasteur, written by Yannick Spill, in Bioinformatics, which essentially focuses on statistical and computational aspects. (Hence the request. The defence itself was today.) Among the several papers included in the thesis, there was this post-worthy paper called A convective replica-exchange method for sampling new energy basins, written by Spill, Bouvier and Nilges and published in the Journal of Computational Chemistry, a paper that extends the usual parallel tempering algorithm in a method called the convective replica exchange algorithm. Rather than selecting the chain to exchange at random, it forces a given chain to go through all temperature steps in a cyclic manner and only moves to another chain when the cycle is over. There is a delayed rejection flavour therein, in that the method does not let go: it keeps proposing moves to the next level until one is accepted (hence earning from me the nickname of the pit-bull algorithm!). While the thesis includes a (corrected) proof of ergodicity of the method, I wonder if the proof could have been directly derived from a presentation of the algorithm as a reversible jump MCMC type algorithm (Green, 1995). The experiments presented in the paper are quite worthwhile in that they allayed my worries that this new version of replica exchange could be much slower than the original one, because of the constraint of moving a given chain to a given neighbouring level rather than picking those two entities at random. (Obviously, performance would degrade with a sparser collection of inverse temperatures.) I am still uncertain however as to why compelling the exchange to take place in this heavily constrained way induces better dynamics for moving around the sampling space, my reasoning being that the proposals remain the same. However, during the defence I realised that waiting for a switch would induce visiting more of the tails…
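Here is a toy rendering of the pit-bull idea, my own sketch rather than the authors' implementation (a single Gaussian basin, made-up ladder and proposal scale): a cyclic schedule of swaps over a small inverse-temperature ladder, each swap reproposed, with local moves in between, until it is finally accepted.

```python
import math, random

random.seed(5)

def logpi(x):              # a single Gaussian "energy basin", for illustration
    return -0.5 * x * x

betas = [1.0, 0.5, 0.25]   # inverse temperatures, cold to hot (arbitrary)
xs = [0.0, 0.0, 0.0]       # one replica per temperature level

def mh_step(k):            # plain tempered random-walk move at level k
    prop = xs[k] + random.gauss(0, 1.5)
    logr = betas[k] * (logpi(prop) - logpi(xs[k]))
    if random.random() < math.exp(min(0.0, logr)):
        xs[k] = prop

def forced_swap(lev, nxt):
    # the pit-bull exchange: keep running local moves and reproposing the
    # swap between levels lev and nxt until the swap is accepted
    while True:
        for k in range(len(betas)):
            mh_step(k)
        logr = (betas[lev] - betas[nxt]) * (logpi(xs[nxt]) - logpi(xs[lev]))
        if random.random() < math.exp(min(0.0, logr)):
            xs[lev], xs[nxt] = xs[nxt], xs[lev]
            return

cold = []
for cycle in range(2000):
    # convective cycle: up the ladder and back down, 0 -> 1 -> 2 -> 1 -> 0
    for lev, nxt in [(0, 1), (1, 2), (2, 1), (1, 0)]:
        forced_swap(lev, nxt)
    cold.append(xs[0])

mean = sum(cold) / len(cold)
var = sum(x * x for x in cold) / len(cold)
```

The forced acceptance is what makes the exact ergodicity argument delicate, and why the thesis needed a corrected proof; the sketch only illustrates the scheduling, not that proof.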