Archive for rare events

an extension of nested sampling

Posted in Books, Statistics, University life on December 16, 2014 by xi'an

I was reading [in the Paris métro] Hastings-Metropolis algorithm on Markov chains for small-probability estimation, arXived a few weeks ago by François Bachoc, Lionel Lenôtre, and Achref Bachouch, when I came upon their first algorithm that reminded me much of nested sampling: the following was proposed by Guyader et al. in 2011,

To approximate a tail probability P(H(X)>h),

  • start from an iid sample of size N from the reference distribution;
  • at each iteration m, select the point x with the smallest H(x)=ξ and replace it with a new point y simulated under the constraint H(y)≥ξ;
  • stop when all points x in the sample satisfy H(x)>h;
  • take

(1−1/N)^M

as the unbiased estimator of P(H(X)>h), where M is the total number of iterations.

Hence, except for the stopping rule, this is the same implementation as nested sampling. Furthermore, Guyader et al. (2011) also take advantage of the nested sampling fact that, if direct simulation under the constraint H(y)≥ξ is infeasible, a single step of a Metropolis-Hastings algorithm is as valid as direct simulation. (I could not access the paper, but the reference list of Guyader et al. (2011) includes both original papers by John Skilling, so the connection must be made there.) What I find most interesting in this algorithm is that it achieves unbiasedness, even in the MCMC case!
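For concreteness, here is a minimal sketch of the algorithm described above, on a toy problem of my own choosing (X standard Gaussian, H the identity, so P(H(X)>h)=1−Φ(h)); the one-step Metropolis-Hastings move targeting the constrained reference distribution is an assumption about implementation details, not taken from the paper:

```python
import numpy as np

def last_particle_estimate(H, sample_prior, mh_step, h, N=100, rng=None):
    """Sketch of the algorithm of Guyader et al. (2011) described above:
    repeatedly replace the particle with the smallest H-value by a draw
    constrained to exceed that value, counting the iterations M."""
    rng = np.random.default_rng() if rng is None else rng
    x = sample_prior(N, rng)            # iid sample from the reference distribution
    M = 0
    while H(x).min() <= h:
        vals = H(x)
        i = int(np.argmin(vals))
        xi = vals[i]                    # current minimum level
        # clone another particle and move it with one MH step targeting
        # the reference distribution restricted to {H > xi}
        j = int(rng.integers(N))
        while j == i:
            j = int(rng.integers(N))
        x[i] = mh_step(x[j], xi, rng)
        M += 1
    return (1.0 - 1.0 / N) ** M         # the unbiased estimator (1-1/N)^M

def mh_step(x, xi, rng, sigma=0.5):
    """One Metropolis step for N(0,1) truncated to (xi, +inf)."""
    y = x + sigma * rng.standard_normal()
    if y > xi and rng.random() < np.exp(0.5 * (x * x - y * y)):
        return y
    return x

# toy example: H(x) = x, so P(H(X) > 3) = 1 - Phi(3), about 0.00135
est = last_particle_estimate(lambda v: v,
                             lambda n, rng: rng.standard_normal(n),
                             mh_step, h=3.0, N=200,
                             rng=np.random.default_rng(0))
```

With N=200 particles the expected number of iterations is roughly N·log(1/p), so the estimate lands in the right order of magnitude despite the single MH step per replacement.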

computational methods for statistical mechanics [day #3]

Posted in Mountains, pictures, Running, Statistics, Travel, University life on June 6, 2014 by xi'an

Arthur's Seat, Edinburgh, Sep. 7, 2011

The third day [morn] at our ICMS workshop was dedicated to path sampling. And rare events. Much more into [my taste] Monte Carlo territory. The first talk, by Rosalind Allen, looked at reweighting trajectories that are not at equilibrium or are missing the Boltzmann [normalising] constant, although the derivation against a calibration parameter looked like the primary goal rather than a tool for constant estimation. Again papers in J. Chem. Phys.! And a potential link with ABC raised by Antonietta Mira… Then Jonathan Weare discussed stratification, with a nice trick of expressing the normalising constants of the different terms in the partition as solution(s) of a Markov system

z M = z
Because the stochastic matrix M is easier (?) to approximate. Valleau's and Torrie's umbrella sampling was a constant reference in this morning of talks. Arnaud Guyader's talk was a continuation of Tony Lelièvre's introduction, which helped a lot with my understanding of the concepts, rephrasing things in more statistical terms. Like the distinction between equilibrium and paths. Or bias being importance sampling. Frédéric Cérou actually gave a sort of second part to Arnaud's talk, using importance splitting algorithms, and presenting an algorithm for simulating rare events that sounded like a reversed nested sampling, where the goal is to move down the target rather than up, pushing particles away from the current level of the target function with probability ½. Michela Ottobre completed the series with an entry on diffusion limits in the Roberts-Gelman-Gilks spirit when the Markov chain is not yet stationary. In the transient phase thus.
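To make Weare's trick concrete, here is a minimal numerical sketch with made-up numbers: assuming the pairwise averages have already been estimated and assembled into a row-stochastic matrix M, the vector z of (relative) normalising constants is recovered as the solution of the Markov system z M = z, i.e. the left unit eigenvector of M:

```python
import numpy as np

# hypothetical 3-stratum example: a row-stochastic matrix M assumed to
# have been estimated from simulations within each stratum
M = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.3, 0.7]])

# z solves z M = z: take the left eigenvector of M for eigenvalue 1
eigvals, eigvecs = np.linalg.eig(M.T)
k = int(np.argmin(np.abs(eigvals - 1.0)))
z = np.real(eigvecs[:, k])
z /= z.sum()                 # normalise to sum to one
```

For these toy numbers the stationary solution is z = (0.2, 0.4, 0.4); the appeal of the formulation is that only the matrix M needs approximating, not each constant separately.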

Split Sampling: expectations, normalisation and rare events

Posted in Books, Statistics, University life on January 27, 2014 by xi'an

Just before Christmas (a year ago), John Birge, Changgee Chang, and Nick Polson arXived a paper with the above title. Split sampling is presented as a tool conceived to handle rare event probabilities, written in this paper as

Z(m) = ∫ 1{L(x) > m} π(x) dx
where π is the prior and L the likelihood, m being a large enough bound to make the probability small. However, given John Skilling's representation of the marginal likelihood as the integral of the Z(m)'s, this simulation technique also applies to the approximation of the evidence. The paper refers from the start to nested sampling as a motivation for this method, presumably not as a way to run nested sampling, which was created as a tool for evidence evaluation, but as a competitor. Nested sampling may indeed face difficulties in handling the coverage of the higher likelihood regions under the prior, and it is an approximate method, as we detailed in our earlier paper with Nicolas Chopin. The difference between nested and split sampling is that split sampling adds a distribution ω(m) on the likelihood levels m. If pairs (x,m) can be efficiently generated by MCMC for the target

p(x,m) ∝ ω(m) π(x) 1{L(x) > m}
the marginal density of m can then be approximated by Rao-Blackwellisation. From which the authors derive an estimate of Z(m), since the marginal is actually proportional to ω(m)Z(m). (Because of the Rao-Blackwell argument, I wonder how much this differs from Chib’s 1995 method, i.e. if the split sampling estimator could be expressed as a special case of Chib’s estimator.) The resulting estimator of the marginal also requires a choice of ω(m) such that the associated cdf can be computed analytically. More generally, the choice of ω(m) impacts the quality of the approximation since it determines how often and easily high likelihood regions will be hit. Note also that the conditional π(x|m) is the same as in nested sampling, hence may run into difficulties for complex likelihoods or large datasets.

When reading the beginning of the paper, the remark that “the chain will visit each level roughly uniformly” (p.13) made me wonder at a possible correspondence with the Wang-Landau estimator. Until I read the reference to Jacob and Ryder (2012) on page 16. Once again, I wonder at a stronger link between both papers since the Wang-Landau approach aims at optimising the exploration of the simulation space towards a flat histogram. See for instance Figure 2.

The following part of the paper draws a comparison with both nested sampling and the product estimator of Fishman (1994). I do not fully understand the consequences of the equivalence between those estimators and the split sampling estimator for specific choices of the weight function ω(m). Indeed, it seemed to me that the main point was to draw from a joint density on (x,m) to avoid the difficulties of exploring each level set separately, as well as the approximation issues of nested sampling. As a side remark, the fact that the harmonic mean estimator occurs at several points of the paper worries me. The qualification of “poor Monte Carlo error variances properties” is an understatement for the harmonic mean estimator, as it generally has infinite variance and hence should not be used at all, even as a starting point. The paper does not elaborate much about the cross-entropy method, despite using an example from Rubinstein and Kroese (2004).

In conclusion, an interesting paper that made me think anew about the nested sampling approach, which keeps its fascination over the years! I will most likely use it to build an MSc thesis project this summer in Warwick.

Special Issue of ACM TOMACS on Monte Carlo Methods in Statistics

Posted in Books, R, Statistics, University life on December 10, 2012 by xi'an

As posted here a long, long while ago, following a suggestion from the editor (and North America Cycling Champion!) Pierre Lécuyer (Université de Montréal), Arnaud Doucet (University of Oxford) and myself acted as guest editors for a special issue of ACM TOMACS on Monte Carlo Methods in Statistics. (Coincidentally, I am attending a board meeting for TOMACS tonight in Berlin!) The issue is now ready for publication (next February unless I am confused!) and made of the following papers:

* Massive parallelization of serial inference algorithms for a complex generalized linear model
* Convergence of a Particle-based Approximation of the Block Online Expectation Maximization Algorithm
* Efficient MCMC for Binomial Logit Models
* Adaptive Equi-Energy Sampler: Convergence and Illustration
* Particle algorithms for optimization on binary spaces
* Posterior expectation of regularly paved random histograms
* Small variance estimators for rare event probabilities
* Self-Avoiding Random Dynamics on Integer Complex Systems
* Bayesian learning of noisy Markov decision processes

Here is the draft of the editorial that will appear at the beginning of this special issue. (All faults are mine, of course!)

