## Archive for umbrella sampling

## another first

Posted in Statistics with tags Chemical Physics Letters, history of Monte Carlo, importance sampling, John Valleau, Markov chain Monte Carlo, MCMC, Metropolis algorithm, umbrella sampling, Wilfred Keith Hastings on July 1, 2022 by xi'an

**A** question related to the earlier post on the first *importance sampling* in print, about the first *Markov chain Monte Carlo* in print. Again uncovered by Charly, a 1973 Chemical Physics paper by Patey and Valleau, the latter inventing umbrella sampling with Torrie at about the same time. (In a 1972 paper in the same journal with Card, Valleau uses *Metropolis Monte Carlo*, while Hastings, also at the University of Toronto, uses *Markov chain sampling*.)

## [more than] everything you always wanted to know about marginal likelihood

Posted in Books, Statistics, University life with tags adaptive importance sampling, arXiv, Bayes factor, bridge sampling, candidate's formula, Charlie Geyer, Chib's approximation, CRiSM, improper prior, Julian Besag, Laplace approximation, Madrid, nested sampling, noise contrasting estimation, path sampling, reversible jump MCMC, sequential Monte Carlo, surveys, umbrella sampling, Universidad Carlos III de Madrid, University of Warwick, Warwickshire on February 10, 2022 by xi'an

**E**arlier this year, F. Llorente, L. Martino, D. Delgado, and J. Lopez-Santiago have arXived an updated version of their massive survey on marginal likelihood computation. Which I can only warmly recommend to anyone interested in the matter! Or looking for a base camp to initiate a graduate project. They break the methods into four families:

- Deterministic approximations (e.g., Laplace approximations)
- Methods based on density estimation (e.g., Chib’s method, aka the candidate’s formula)
- Importance sampling, including sequential Monte Carlo, with a subsection connecting with MCMC
- Vertical representations (mostly, nested sampling)
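All four families target the same quantity, the marginal likelihood m(y) = ∫ p(y|θ) p(θ) dθ. As a toy baseline (my own illustration, not taken from the survey), here is naive Monte Carlo, i.e., averaging the likelihood over prior draws, on a conjugate Normal model where the answer is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
y = 1.0  # single observation, model y ~ N(theta, 1), prior theta ~ N(0, 1)

# Exact marginal likelihood: y ~ N(0, 2) under this conjugate model
exact = np.exp(-y**2 / 4) / np.sqrt(4 * np.pi)

# Naive Monte Carlo: average the likelihood over draws from the prior
theta = rng.standard_normal(100_000)
lik = np.exp(-(y - theta)**2 / 2) / np.sqrt(2 * np.pi)
naive = lik.mean()

print(exact, naive)  # the two estimates should agree to a few decimals
```

The estimator is unbiased but, as the survey stresses, it collapses as soon as the prior and posterior barely overlap.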

Besides sheer computation, the survey also touches upon issues like improper priors and alternatives to Bayes factors. The parts I would have covered in more detail are reversible jump MCMC and the long-lasting impact of Geyer's reverse logistic regression (with the noise contrasting extension), even though the link with bridge sampling is briefly mentioned there. There is even a table reporting on the coverage of earlier surveys. Of course, the following postnote of the manuscript

*The Christian Robert’s blog deserves a special mention, since Professor C. Robert has devoted several entries of his blog with very interesting comments regarding the marginal likelihood estimation and related topics.*

does not in the least make me less objective! Some of the final recommendations:

- *use of Naive Monte Carlo* [simulate from the prior] *should be always considered* [assuming a proper prior!]
- *a multiple-try method is a good choice within the MCMC schemes*
- *optimal umbrella sampling estimator is difficult and costly to implement, so its best performance may not be achieved in practice*
- *adaptive importance sampling uses the posterior samples to build a suitable normalized proposal, so it benefits from localizing samples in regions of high posterior probability while preserving the properties of standard importance sampling*
- *Chib’s method is a good alternative, that provide very good performances* [but is not always available]
- *the success* [of nested sampling] *in the literature is surprising*
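The adaptive importance sampling recommendation is easy to sketch on the same kind of toy conjugate Normal model (my own illustration; here I cheat and draw the "posterior sample" from the exact posterior, where a real application would use MCMC output):

```python
import numpy as np

rng = np.random.default_rng(3)
y = 1.0
# model y ~ N(theta, 1), prior theta ~ N(0, 1); exact marginal is N(y; 0, 2)
exact = np.exp(-y**2 / 4) / np.sqrt(4 * np.pi)

def log_joint(theta):
    # log prior + log likelihood (the unnormalised posterior)
    return -theta**2 / 2 - (y - theta)**2 / 2 - np.log(2 * np.pi)

# stand-in for posterior (MCMC) output: exact posterior is N(y/2, 1/2)
post = rng.normal(y / 2, np.sqrt(0.5), 10_000)

# fit a Normal proposal to the posterior sample, inflating the scale for safety,
# then estimate the marginal likelihood by standard importance sampling
m, s = post.mean(), post.std() * 1.5
theta = rng.normal(m, s, 100_000)
logq = -(theta - m)**2 / (2 * s**2) - np.log(s * np.sqrt(2 * np.pi))
Z = np.mean(np.exp(log_joint(theta) - logq))

print(Z, exact)  # Z should be close to the exact marginal
```

Localising the proposal on the posterior is exactly what makes this stable where naive Monte Carlo from the prior would waste most of its draws.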

## reXing the bridge

Posted in Books, pictures, Statistics with tags bridge sampling, Charlie Geyer, computational physics, Elsevier, logistic regression, multi-armed bandits, normalising constant, reverse logistic, Statistica Sinica, umbrella sampling on April 27, 2021 by xi'an

**A**s I was re-reading Xiao-Li Meng’s and Wing Hung Wong’s 1996 bridge sampling paper in Statistica Sinica, I realised they were making the link with Geyer’s (1994) mythical tech report, in the sense that the iterative construction of α functions “converges to the `reverse logistic regression’ described in Geyer (1994) for the two-density cases” (p.839). Although they also saw the latter as an “iterative” application of Torrie and Valleau’s (1977) “umbrella sampling” estimator. And cited Bennett (1976) in the Journal of Computational Physics *[for which Elsevier still asks for $39.95!]* as the originator of the formula [check (6)], and of the optimal solution [check (8)]. Bennett (1976) also mentions that the method fares poorly when the targets do not overlap:

“When the two ensembles neither overlap nor satisfy the above smoothness condition, an accurate estimate of the free energy cannot be made without gathering additional MC data from one or more intermediate ensembles”

in which case this sequence of intermediate targets could be constructed and, who knows?!, optimised. (This may be the chain solution discussed in the conclusion of the paper.) Another optimisation not considered in enough detail is the allocation of the computing time to the two densities, maybe using a bandit strategy to avoid estimating the variance of the importance weights first.
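For the two-density case, the iterative construction converging to the optimal bridge can be written in a few lines. Below is a toy rendering of Meng and Wong's fixed-point iteration for the ratio r = c₁/c₂ of normalising constants (the Gaussian pair, sample sizes, and number of iterations are my own choices, not from either paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two unnormalised densities with overlapping supports:
# q1(x) = exp(-x^2/2)        -> c1 = sqrt(2*pi)
# q2(x) = exp(-(x-1)^2/8)    -> c2 = sqrt(8*pi) = 2*sqrt(2*pi)
q1 = lambda x: np.exp(-x**2 / 2)
q2 = lambda x: np.exp(-(x - 1)**2 / 8)
true_ratio = 0.5  # c1 / c2

n1 = n2 = 50_000
x1 = rng.normal(0.0, 1.0, n1)   # draws from the first (normalised) density
x2 = rng.normal(1.0, 2.0, n2)   # draws from the second
s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
l1, l2 = q1(x1) / q2(x1), q1(x2) / q2(x2)

# fixed-point iteration for the optimal-bridge estimator of r = c1/c2
r = 1.0
for _ in range(50):
    num = np.mean(l2 / (s1 * l2 + s2 * r))
    den = np.mean(1.0 / (s1 * l1 + s2 * r))
    r = num / den

print(r)  # converges near the true ratio 0.5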

## self-healing umbrella sampling

Posted in Kids, pictures, Statistics, University life with tags acceleration of MCMC algorithms, adaptive MCMC methods, Monte Carlo experiment, multimodality, Tintin, umbrella sampling, Wang-Landau algorithm, well-tempered algorithm on November 5, 2014 by xi'an

**T**en days ago, Gersende Fort, Benjamin Jourdain, Tony Lelièvre, and Gabriel Stoltz arXived a study about an adaptive umbrella sampler that can be re-interpreted as a Wang-Landau algorithm, if not the most efficient version of the latter. This reminded me very much of the workshop we had all together in Edinburgh last June. And even more of the focus of the molecular dynamics talks in this same ICMS workshop about accelerating the MCMC exploration of multimodal targets.

The self-healing aspect of the sampler is to adapt to the multimodal structure thanks to a partition that defines a biased sampling scheme, spending time in each set of the partition with frequency proportional to prescribed weights. While the optimal weights are the weights of the sets under the target distribution (are they truly optimal?! I would have thought lifting low density regions, i.e., marshes, could improve the mixing of the chain for a given proposal), those are unknown and need to be estimated by an adaptive scheme that makes staying in a given set less desirable the more often it has been visited, by increasing the inverse weight of that set by a factor each time it is visited. Which indeed sounds like Wang-Landau. The plus side of the self-healing umbrella sampler is that it only depends on a scale γ (and on the partition). Besides converging to the right weights of course. The downside is that it does not reach the most efficient convergence, since the adaptivity weight decreases in 1/n rather than 1/√n.
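A toy, Wang-Landau-flavoured rendering of this biased sampling idea (the bimodal target, two-set partition, proposal scale, and stepsize schedule are illustrative choices of mine, not the authors' actual algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy bimodal target (unnormalised): 0.7 N(-3,1) + 0.3 N(3,1)
def logpi(x):
    return np.logaddexp(np.log(0.7) - (x + 3)**2 / 2,
                        np.log(0.3) - (x - 3)**2 / 2)

# Two-set partition of the line: A0 = (-inf, 0), A1 = [0, inf)
def bin_of(x):
    return int(x >= 0)

logw = np.zeros(2)   # running log-weight estimates, one per set
counts = np.zeros(2)
x = -3.0
n_iter = 300_000
for n in range(1, n_iter + 1):
    # Metropolis step on the biased target pi(x) / w[bin(x)]
    y = x + rng.normal(0.0, 2.0)
    logacc = (logpi(y) - logw[bin_of(y)]) - (logpi(x) - logw[bin_of(x)])
    if np.log(rng.uniform()) < logacc:
        x = y
    # penalise the visited set, with a stepsize decaying in 1/n
    logw[bin_of(x)] += 5.0 / (n + 5)
    logw -= logw.max()        # renormalise for numerical stability
    counts[bin_of(x)] += 1

w = np.exp(logw) / np.exp(logw).sum()
print(w)                  # estimated set weights, near the target masses (0.7, 0.3)
print(counts / n_iter)    # occupancy of the biased chain, roughly uniform
```

Flattening the occupancy over the partition is what lets the chain hop between modes; the 1/n decay of the stepsize is the part the post flags as slower than the usual 1/√n Monte Carlo rate.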

Note that the paper contains a massive experimental side where the authors checked the impact of various parameters by Monte Carlo studies of estimators involving more than a billion iterations, apparently repeated a large number of times.

The next step in adaptivity should be the adaptive determination of the partition itself, hoping for robustness against the dimension of the space. Which may be unreachable, judging by the apparent deceleration of the method when the number of terms in the partition increases.