Archive for parallel tempering

21w5107 [day 1]

Posted in pictures, Statistics, Travel, University life on November 30, 2021 by xi'an

The workshop started with the bad news of our friend Michele Guindani being hit and mugged upon arrival in Oaxaca, Saturday night. Fortunately, he was not hurt, but he lost both phone and wallet, always a major bummer when abroad… Still, this did not cast a lasting pall on the gathering of long-time no-see friends, whom I had indeed not seen for at least two years. Except for those who came to the CIRMirror!

A few hours later, we were woken up at 5am by fairly loud firecrackers (palomas? cohetes?), for no reason I can fathom (the Mexican Revolution day was a week ago), although they seemed correlated with the nearby church bells ringing at full blast (for Lauds? Hanukkah? Cyber Monday? Chirac’s birthdate?). The above picture was taken in the town of Santa María del Tule, with its super-massive Montezuma cypress tree and remaining decorations from the Día de los Muertos.

Without launching (much) the debate on whether or not Bayesian non-parametrics qualify as “objective Bayesian” methods, Igor Prünster started the day with a non-parametric presentation of dependent random probability measures. With the always fascinating notion that a discrete random non-parametric prior induces a distribution on partitions (the exchangeable partition probability function, EPPF). And applicability in mixtures and their generalisations. Realising that the highly discrete nature of such measures is not such an issue for a given sample size n, since there are at most n blocks in the partition. Beatrice Franzolini discussed specific ways to create dependent distributions based on independent samples, although her practical example, based on one N(-10,1) sample and another (independent) N(10,1) sample, seemed to fit several of the dependent random measures she compared. And Marta Catalano (Warwick) presented her work on partial exchangeability and optimal transportation (which I had also heard at CIRM last June and in Warwick last week). One thing I had not realised earlier was the dependence of the Wasserstein distance on the parameterisation, although it now makes perfect sense, if only because of the coupling. Alas, I had to miss Isadora Antoniano-Villalobos’ talk as I had to teach my undergrad class at Paris Dauphine at the same time… This non-parametric session was quite homogeneous and rich in perspectives.
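As a minimal illustration of the EPPF notion, here is a Python sketch for the simplest case only, the Dirichlet process with concentration θ, whose EPPF is the Ewens formula θ^k Γ(θ)/Γ(θ+n) ∏ⱼ (nⱼ−1)! (the talks covered far more general dependent measures, so this is just a stand-in). Enumerating all set partitions of n items checks that the EPPF sums to one over partitions and that no partition has more than n blocks. The function names are hypothetical.

```python
from math import exp, lgamma, log

def set_partitions(items):
    """Yield every set partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # add `first` to each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or open a new singleton block for it
        yield [[first]] + partition

def dp_eppf(block_sizes, theta):
    """Dirichlet process (Ewens) EPPF:
    theta^k * Gamma(theta) / Gamma(theta + n) * prod_j (n_j - 1)!"""
    n, k = sum(block_sizes), len(block_sizes)
    log_p = k * log(theta) + lgamma(theta) - lgamma(theta + n)
    log_p += sum(lgamma(n_j) for n_j in block_sizes)  # lgamma(n_j) = log (n_j - 1)!
    return exp(log_p)

# usage: the EPPF is a probability distribution over the partitions of n = 5 items
partitions = list(set_partitions(list(range(5))))
total = sum(dp_eppf([len(b) for b in p], theta=1.3) for p in partitions)
print(len(partitions), round(total, 10))  # 52 1.0
print(max(len(p) for p in partitions))    # 5, i.e. at most n blocks
```

Working on the log scale with `lgamma` keeps the computation stable for larger n, although full enumeration is only feasible for small n since the number of partitions grows as the Bell numbers.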

In an all-MCMC afternoon, Julyan Arbel talked about reference priors for extreme value distributions, with the “shocking” case of a restriction on the support of one parameter, ξ, which means that the Jeffreys prior is then undefined. This reminded me somewhat of the work of Clara Grazian on Jeffreys priors for mixtures, where some models did not allow for the Fisher information to exist. The second part of this talk was about modified local versions of the Gelman & Rubin (1992) R-hats. And of the recent modification proposed by Aki and co-authors. Where I thought that the multivariate challenge of defining ranks could be alleviated by directly considering the likelihood values along the chains. And Trevor Campbell gradually built an involved parallel tempering method where the powers of a geometric mixture are optimised as spline functions of the temperature. Next, María Gil-Leyva presented her original and ordered approach to mixture estimation, which I discussed in a blog post published two days ago (!). She corrected my impressions that (i) the methods were all impervious to label switching and (ii) required some conjugacy to operate. The final talk of the day was by Anirban Bhattacharya on high-D Bayesian regression and coupling techniques for checking convergence, a paper that had been on my reading list for a long while. A very elaborate construct of coupling strategies within a Gibbs sampler, with some steps relying on optimal coupling and others on the use of common random numbers.
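For reference, a Python sketch of the split-R̂ diagnostic, assuming m chains of equal length stored in an (m, n) array, together with the rank-normalisation step of the recent modification (normal scores of pooled ranks, with the (r − 3/8)/(S + 1/4) offset as I recall it). Following the remark above, a multivariate chain could be reduced to a scalar by feeding its log-likelihood values to the same function, an untested suggestion that is part of neither method. Function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def split_rhat(chains):
    """Split-R-hat for an (m, n) array holding m chains of a scalar quantity."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    half = n // 2
    # split each chain in two halves so that within-chain trends also show up
    x = np.concatenate([chains[:, :half], chains[:, n - half:]], axis=0)
    n2 = x.shape[1]
    B = n2 * x.mean(axis=1).var(ddof=1)      # between-chain variance
    W = x.var(axis=1, ddof=1).mean()         # mean within-chain variance
    var_plus = (n2 - 1) / n2 * W + B / n2    # pooled posterior variance estimate
    return float(np.sqrt(var_plus / W))

def rank_normalize(chains):
    """Replace draws by normal scores of their pooled ranks (ties ignored)."""
    chains = np.asarray(chains, dtype=float)
    S = chains.size
    ranks = chains.ravel().argsort().argsort() + 1   # ranks 1..S
    inv_cdf = NormalDist().inv_cdf
    z = np.array([inv_cdf((r - 3 / 8) / (S + 1 / 4)) for r in ranks])
    return z.reshape(chains.shape)

# usage on simulated draws
rng = np.random.default_rng(0)
good = rng.normal(size=(4, 1000))        # four well-mixed chains
bad = good + np.arange(4)[:, None]       # chains stuck at different levels
print(split_rhat(good), split_rhat(bad), split_rhat(rank_normalize(bad)))
```

Values close to 1 indicate agreement between (half-)chains; the stuck chains in `bad` produce a value well above the usual 1.01 or 1.1 thresholds.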

parallel tempering on optimised paths

Posted in Statistics on May 20, 2021 by xi'an


Saifuddin Syed, Vittorio Romaniello, Trevor Campbell, and Alexandre Bouchard-Côté, whom I met and discussed with on my “last” trip to UBC, in December 2019, just arXived a paper on parallel tempering (PT), turning the choice of the tempering path into an optimisation problem. They address the touchy issue of designing a sequence of tempered targets when the starting distribution π⁰, e.g. the prior, and the final distribution π¹, e.g. the posterior, are hugely different, e.g. almost singular.

“…theoretical analysis of reversible variants of PT has shown that adding too many intermediate chains can actually deteriorate performance (…) [while] on non reversible regime adding more chains is guaranteed to improve performances.”

The above applies to geometric combinations of π⁰ and π¹. Which “suffers from an arbitrarily suboptimal global communication barrier”, according to the authors (although the counterexample is not completely convincing since π⁰ and π¹ share the same variance). They propose a more non-linear form of tempering, with constraints on the dependence of the powers on the temperature t∈(0,1). Defining the global communication barrier as an average over temperatures of the rejection rate, the path characteristics (e.g., the coefficients of a spline function) can then be optimised in terms of this objective. And the temperature schedule is derived from the fact that the non-asymptotic round trip rate is maximised when the rejection rates are all equal. (As a side item, the technique exposed in the earlier tempering paper by Syed et al. was recently exploited for the high-resolution imaging of the black hole at the centre of the M87 galaxy.)
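The equal-rejection principle can be sketched on a fixed path. The Python functions below follow, as far as I understand it, the schedule update of the earlier non-reversible PT work: accumulate estimated swap rejection rates into a piecewise-linear barrier Λ(β), then place the new inverse temperatures so that Λ increases by the same amount between consecutive chains. The geometric path π_β ∝ π⁰ L^β (with L the likelihood) is assumed for the rejection estimate, the spline optimisation of the path itself is not attempted, and all names are hypothetical.

```python
import numpy as np

def swap_rejection(loglik_lo, loglik_hi, beta_lo, beta_hi):
    """Estimated swap rejection rate between chains at beta_lo < beta_hi on the
    geometric path pi_beta ∝ pi0 * L^beta, from log-likelihood draws of each chain."""
    loglik_lo = np.asarray(loglik_lo, dtype=float)
    loglik_hi = np.asarray(loglik_hi, dtype=float)
    log_ratio = (beta_hi - beta_lo) * (loglik_lo - loglik_hi)
    return float(np.mean(1.0 - np.exp(np.minimum(0.0, log_ratio))))

def update_schedule(betas, rejection_rates, floor=1e-6):
    """New schedule with (approximately) equal rejection rates between neighbours."""
    betas = np.asarray(betas, dtype=float)
    # floor the rates so that the cumulative barrier is strictly increasing
    r = np.maximum(np.asarray(rejection_rates, dtype=float), floor)
    cum = np.concatenate([[0.0], np.cumsum(r)])   # Lambda(beta_i), Lambda(0) = 0
    total = cum[-1]                               # global barrier estimate
    targets = np.linspace(0.0, total, len(betas))
    # invert the piecewise-linear Lambda at equally spaced barrier levels
    return np.interp(targets, cum, betas), total

# usage: rejections concentrated near the prior pull the schedule towards beta = 0
new_betas, barrier = update_schedule(np.linspace(0, 1, 5), [0.8, 0.4, 0.1, 0.1])
print(new_betas, barrier)
```

In practice the update would be iterated, re-estimating the rejection rates under each new schedule, since the piecewise-linear Λ is only accurate near the current temperatures.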

ABC, anytime!

Posted in Books, pictures, Statistics, Travel, University life on January 18, 2021 by xi'an

Last June, Alix Marie d’Avigneau, Sumeet Singh, and Lawrence Murray arXived a paper on anytime ABC that I intended to review right away but that sat on my virtual desk (and pile of to-cover-arXivals!) until now. The notion of anytime MCMC was already covered in earlier ‘Og entries, but this anytime ABC version bypasses the problem of asynchronicity, namely “randomly varying local move completion times when parallel tempering is implemented on a multi-processor computing resource”. The different temperatures are replaced by different tolerances in ABC, with switches between tolerances being natural, since a proposal accepted for a given tolerance ε may happen to also be eligible for a lower tolerance ε’. And accounting for the different durations required to simulate a proposal under different tolerances, so as to avoid either the induced bias in the stationary distributions or the wait for other processors to complete their task. A drawback of the approach lies in calibrating the tolerance levels in advance (or via preliminary runs that may prove costly).
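To see why switching between tolerances is natural, here is a plain, synchronous Python sketch of ABC-MCMC run in parallel over a ladder of tolerances, on a hypothetical Normal-mean toy problem. When chain k (tolerance ε_k) and chain k+1 (ε_{k+1} ≥ ε_k) exchange their (θ, distance) states, the state of the tighter chain automatically meets the looser tolerance, so the Metropolis ratio reduces to checking whether the looser chain's distance is below ε_k. This is not the anytime algorithm of the paper: the asynchronous, duration-aware correction that is its actual contribution is left out, and the model, tolerances, and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical toy problem: y_i ~ N(theta, 1), prior theta ~ N(0, 10^2),
# summary statistic = mean of 20 observations
n_obs = 20
y_obs = rng.normal(1.5, 1.0, size=n_obs)
s_obs = y_obs.mean()

def log_prior(theta):
    return -0.5 * (theta / 10.0) ** 2

def simulate_distance(theta):
    """Simulate a pseudo-dataset and return its summary distance to the data."""
    return abs(rng.normal(theta, 1.0, size=n_obs).mean() - s_obs)

def abc_tempering(eps, n_iter=5000, step=0.5):
    eps = np.sort(np.asarray(eps, dtype=float))   # eps[0] is the tightest tolerance
    K = len(eps)
    # start all chains from one prior draw meeting the tightest tolerance
    while True:
        theta0 = rng.normal(0.0, 10.0)
        d0 = simulate_distance(theta0)
        if d0 <= eps[0]:
            break
    theta, dist = np.full(K, theta0), np.full(K, d0)
    trace = np.empty(n_iter)
    for it in range(n_iter):
        # ABC-MCMC move in each chain (symmetric random-walk proposal)
        for k in range(K):
            prop = theta[k] + step * rng.normal()
            if np.log(rng.uniform()) < log_prior(prop) - log_prior(theta[k]):
                d = simulate_distance(prop)
                if d <= eps[k]:
                    theta[k], dist[k] = prop, d
        # swap a random adjacent pair: accepted iff the looser chain's state
        # also satisfies the tighter tolerance eps[k]
        k = rng.integers(K - 1)
        if dist[k + 1] <= eps[k]:
            theta[[k, k + 1]] = theta[[k + 1, k]]
            dist[[k, k + 1]] = dist[[k + 1, k]]
        trace[it] = theta[0]                      # record the tightest chain
    return trace

trace = abc_tempering([0.05, 0.2, 1.0, 3.0])
print(trace[1000:].mean(), s_obs)
```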

general perspective on the Metropolis–Hastings kernel

Posted in Books, Statistics on January 14, 2021 by xi'an

[My Bristol friends and co-authors] Christophe Andrieu and Anthony Lee, along with Sam Livingstone, arXived a massive paper on the Metropolis–Hastings kernel on 01 January.

“Our aim is to develop a framework making establishing correctness of complex Markov chain Monte Carlo kernels a purely mechanical or algebraic exercise, while making communication of ideas simpler and unambiguous by allowing a stronger focus on essential features (…) This framework can also be used to validate kernels that do not satisfy detailed balance, i.e. which are not reversible, but a modified version thereof.”

A central notion in this highly general framework is, extending Tierney (1998), to see an MCMC kernel as a triplet involving a probability measure μ (on an extended space), an involutive transform φ generalising the proposal step (i.e. φ∘φ = id), and an associated acceptance probability ð. Then μ-reversibility holds when

\eth(\xi)\mu(\text{d}\xi)= \eth(\phi(\xi))\mu^{\phi}(\text{d}\xi)

with the rhs involving the push-forward measure μ^φ of μ by φ. Furthermore, there is always a choice of acceptance probability ð ensuring that this equality holds. Interestingly, the new framework allows for mostly seamless handling of more complex versions of MCMC such as reversible jump and parallel tempering. But also non-reversible kernels, incl. delayed rejection. And HMC, incl. NUTS. And pseudo-marginal, multiple-try, PDMPs, &c., &c. It is remarkable to see such a general theory emerging at this (late?) stage of the evolution of the field (and I will need more time and attention to understand its consequences).
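The involutive view can be made concrete in a few lines. Here is a Python sketch of one generic step, assuming μ has a density on ℝ^d and φ is a smooth involution, so that dμ^φ/dμ(ξ) = μ(φ(ξ))|det J_φ(ξ)|/μ(ξ) and the choice ð = min(1, dμ^φ/dμ) satisfies the reversibility identity above. Plain random-walk Metropolis is recovered with ξ = (x, v), μ(x, v) = π(x)N(v; 0, σ²) and φ(x, v) = (x + v, −v), after refreshing v. The target and tuning are hypothetical, and this is my own toy instance rather than code from the paper.

```python
import numpy as np

def involutive_mh_step(xi, log_mu, phi, log_abs_det_jac, rng):
    """One step targeting mu on the extended space, with involution phi
    (phi(phi(xi)) == xi) and acceptance min(1, mu(phi(xi)) |det J_phi(xi)| / mu(xi))."""
    prop = phi(xi)
    log_ratio = log_mu(prop) + log_abs_det_jac(xi) - log_mu(xi)
    return prop if np.log(rng.uniform()) < log_ratio else xi

# hypothetical instance: random-walk Metropolis for pi = N(2, 0.5^2)
sigma = 0.8                                   # proposal scale (assumed)

def log_target(x):
    return -0.5 * ((x - 2.0) / 0.5) ** 2

def log_mu(xi):                               # extended target pi(x) N(v; 0, sigma^2)
    x, v = xi
    return log_target(x) - 0.5 * (v / sigma) ** 2

def phi(xi):                                  # (x, v) -> (x + v, -v) is an involution
    x, v = xi
    return (x + v, -v)

def log_abs_det_jac(xi):                      # |det [[1, 1], [0, -1]]| = 1
    return 0.0

rng = np.random.default_rng(2)
x, draws = 0.0, []
for _ in range(20000):
    v = sigma * rng.normal()                  # refresh v: a mu-invariant Gibbs step
    x, _ = involutive_mh_step((x, v), log_mu, phi, log_abs_det_jac, rng)
    draws.append(x)
draws = np.array(draws[2000:])                # drop burn-in
print(draws.mean(), draws.std())
```

Since N(−v; 0, σ²) = N(v; 0, σ²), the ratio collapses to π(x + v)/π(x), i.e. the usual random-walk acceptance; other kernels differ only in the choice of μ, φ, and the refreshment step.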

QuanTA

Posted in Books, pictures, Running, Statistics, University life on September 17, 2018 by xi'an

My Warwick colleagues Nick Tawn [who also is my most frequent accomplice to running, climbing and currying in Warwick!] and Gareth Roberts have just arXived a paper on QuanTA, a new parallel tempering algorithm that Nick designed during his thesis at Warwick, which he defended last semester. Parallel tempering targets in parallel several powered (or power-tempered) versions of the target distribution, with proposed switches between adjacent targets. An improved version transforms the local values before operating the switches. Ideally, the transform should be the composition of the cdf of one tempered target with the inverse cdf of the other, but this is impossible. Linearising the transform is feasible, but does not agree with multimodality, which calls for local transforms. Which themselves call for the identification of the different modes. In QuanTA, these are identified by N parallel runs of the standard, or rather N/2 to avoid dependence issues, and K-means estimates. The paper covers the construction of an optimal scaling of temperatures, in that the difference between the temperatures is scaled [with order 1/√d] so that the acceptance rate for swaps is 0.234. Which in turn induces a practical if costly calibration of the temperatures, especially when the size of the jump depends on the current temperature. However, this cost issue is addressed in the paper, resorting to the acceptance rate as a proxy for effective sample size and to the acceptance rate over run time to run the comparison with regular parallel tempering, leading to strong improvements in the mixture examples examined in the paper. The use of machine learning techniques like K-means or more involved solutions is a promising thread in this exciting area of tempering, where intuition about high temperatures can actually be misleading. Because using the wrong scale means missing the area of interest, which is not the mode!
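A toy Python sketch of a transformed swap in the spirit of QuanTA, for a one-dimensional bimodal target with assumed-known mode centres (standing in for the K-means step) and unit local scale. Around each mode, a tempered N(μ, σ²) behaves like N(μ, σ²/β), so a state is moved between inverse temperatures β and β' by rescaling its deviation from the nearest mode by √(β/β'). The paired swap is an involution only when both points keep their mode label, so it is rejected otherwise, and the two Jacobians cancel. This is a simplified illustration, not the authors' algorithm, and all names and settings are hypothetical.

```python
import numpy as np

modes = np.array([-5.0, 5.0])        # assumed known mode centres (stand-in for K-means)

def log_pi(x):
    """Untempered target 0.5 N(-5, 1) + 0.5 N(5, 1), up to a constant."""
    return np.logaddexp(-0.5 * (x + 5.0) ** 2, -0.5 * (x - 5.0) ** 2)

def nearest_mode(x):
    return int(np.argmin(np.abs(modes - x)))

def transform(x, beta_from, beta_to):
    """Rescale the deviation from the nearest mode by sqrt(beta_from / beta_to)."""
    mu = modes[nearest_mode(x)]
    return mu + np.sqrt(beta_from / beta_to) * (x - mu)

def transformed_swap(x, y, b_hot, b_cold, rng):
    """Propose (x, y) -> (T_cold->hot(y), T_hot->cold(x)); Jacobians cancel in 1D."""
    x_new = transform(y, b_cold, b_hot)
    y_new = transform(x, b_hot, b_cold)
    # the map is only an involution if both points keep their mode label
    if nearest_mode(y_new) != nearest_mode(x) or nearest_mode(x_new) != nearest_mode(y):
        return x, y
    log_r = (b_hot * log_pi(x_new) + b_cold * log_pi(y_new)
             - b_hot * log_pi(x) - b_cold * log_pi(y))
    return (x_new, y_new) if np.log(rng.uniform()) < log_r else (x, y)

def run(betas=(0.05, 0.3, 1.0), n_iter=20000, seed=3):
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas)
    states = np.full(len(betas), 5.0)
    cold = np.empty(n_iter)
    for it in range(n_iter):
        for k, b in enumerate(betas):     # random-walk MH on each pi^beta
            prop = states[k] + 2.0 / np.sqrt(b) * rng.normal()
            if np.log(rng.uniform()) < b * (log_pi(prop) - log_pi(states[k])):
                states[k] = prop
        k = rng.integers(len(betas) - 1)  # transformed swap of a random adjacent pair
        states[k], states[k + 1] = transformed_swap(
            states[k], states[k + 1], betas[k], betas[k + 1], rng)
        cold[it] = states[-1]
    return cold

cold = run()
print(np.mean(cold > 0))   # share of cold-chain draws in the right-hand mode
```

Without the rescaling, a hot state far in a mode's tail would be proposed as is to the cold chain and mostly rejected; the local transform is what keeps swaps acceptable when the temperature gap widens.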
