Archive for advanced Monte Carlo methods

afternoon on Bayesian computation

Posted in Statistics, Travel, University life with tags , , , , , , , , , , , , , on April 6, 2016 by xi'an

Richard Everitt organises an afternoon workshop on Bayesian computation in Reading, UK, on April 19, the day before the Estimating Constant workshop in Warwick, following a successful afternoon last year. Here is the programme:

1230-1315  Antonietta Mira, Università della Svizzera italiana
1315-1345  Ingmar Schuster, Université Paris-Dauphine
1345-1415  Francois-Xavier Briol, University of Warwick
1415-1445  Jack Baker, University of Lancaster
1445-1515  Alexander Mihailov, University of Reading
1515-1545  Coffee break
1545-1630  Arnaud Doucet, University of Oxford
1630-1700  Philip Maybank, University of Reading
1700-1730  Elske van der Vaart, University of Reading
1730-1800  Reham Badawy, Aston University
1815-late  Pub and food (SCR, UoR campus)

and the general abstract:

The Bayesian approach to statistical inference has seen major successes in the past twenty years, finding application in many areas of science, engineering, finance and elsewhere. The main drivers of these successes were developments in Monte Carlo methods and the wide availability of desktop computers. More recently, the use of standard Monte Carlo methods has become infeasible due the size and complexity of data now available. This has been countered by the development of next-generation Monte Carlo techniques, which are the topic of this meeting.

The meeting takes place in the Nike Lecture Theatre, Agriculture Building [building number 59].

point process-based Monte Carlo

Posted in Books, Kids, Statistics, University life with tags , , , , , on December 3, 2015 by xi'an

Clément Walter from Paris just pointed me to an arXived paper he had very recently gotten accepted for publication in Statistics and Computing. (Congrats!) Because his paper relates to nested sampling. And connects it with rare event simulation via interacting particle systems. And multilevel Monte Carlo. I had missed it when it came out on arXiv last December [as the title was unrelated with nested sampling if not Monte Carlo], but the paper brings fairly interesting new results about an ideal version of nested sampling that is

  1. unbiased when using an infinite number of terms;
  2. always better than the standard Monte Carlo estimator, variance-wise;
  3. connected with an implicit marked Poisson process; and
  4. enjoying a finite variance provided the quantity of interest has an 1+ε moment.

Of course, such results only hold for an ideal version and do not address the issue of the conditional simulations required by nested sampling. (Which has an impact on the computing time as the conditional simulation becomes more and more expensive as the likelihood value increases.) The explanation therein of the approximation of tail probabilities by a Poisson estimate makes the link with deterministic nested sampling much clearer to me. Point 2 above means that the nested sampling estimate always does better than the average of the likelihood values produced by an iid or MCMC simulation from the prior distribution. The paper also borrows from the debiasing approach of Rhee and Glynn (already used by the Russian roulette) to turn truncated versions of the nested sampling estimator into an unbiased estimator, with a limited impact on the variance of the estimator. Truncation is associated with the generation of a geometric stopping time which parameter needs to be optimised. Without a more detailed reading, I am somewhat lost as to this optimisation remains feasible in complex settings… The paper contains an illustration for a Pareto distribution where optimisation and calibration can be conducted quite far. It also re-analyses the Mexican hat example of Skilling (2006), showing that our stopping rule may induce bias.

reading classics (The End)

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , on February 24, 2015 by xi'an

La Défense from Paris-Dauphine, Nov. 15, 2012Today was the final session of our Reading Classics Seminar for the academic year 2014-2015. I have not reported on this seminar much so far because it has had starting problems, namely hardly any student present on the first classes and therefore several re-starts until we reached a small group of interested students. And this is truly The End for this enjoyable experiment as this is the final year for my TSI Master at Paris-Dauphine, as it will become integrated within the new MASH Master next year.

As a last presentation for the entire series, my student picked John Skilling’s Nested Sampling, not that it was in my list of “classics”, but he had worked on the paper in a summer project and was thus reasonably fluent with the topic. As he did a good enough job (!), here are his slides.

Some of the questions that came to me during the talk were on how to run nested sampling sequentially, both in the data and in the number of simulated points, and on incorporating more deterministic moves in order to remove some of the Monte Carlo variability. I was about to ask about (!) the Hamiltonian version of nested sampling but then he mentioned his last summer internship on this very topic! I also realised during that talk that the formula (for positive random variables)

\int_0^\infty(1-F(x))\text{d}x = \mathbb{E}_F[X]

does not require absolute continuity of the distribution F.

controlled thermodynamic integral for Bayesian model comparison [reply]

Posted in Books, pictures, Running, Statistics, University life with tags , , , , , , , , , , , , on April 30, 2014 by xi'an

Reykjavik1Chris Oates wrotes the following reply to my Icelandic comments on his paper with Theodore Papamarkou, and Mark Girolami, reply that is detailed enough to deserve a post on its own:

Thank you Christian for your discussion of our work on the Og, and also for your helpful thoughts in the early days of this project! It might be interesting to speculate on some aspects of this procedure:

(i) Quadrature error is present in all estimates of evidence that are based on thermodynamic integration. It remains unknown how to exactly compute the optimal (variance minimising) temperature ladder “on-the-fly”; indeed this may be impossible, since the optimum is defined via a boundary value problem rather than an initial value problem. Other proposals for approximating this optimum are compatible with control variates (e.g. Grosse et al, NIPS 2013, Friel and Wyse, 2014). In empirical experiments we have found that the second order quadrature rule proposed by Friel and Wyse 2014 leads to substantially reduced bias, regardless of the specific choice of ladder.

(ii) Our experiments considered first and second degree polynomials as ZV control variates. In fact, intuition specifically motivates the use of second degree polynomials: Let us presume a linear expansion of the log-likelihood in θ. Then the implied score function is constant, not depending on θ. The quadratic ZV control variates are, in effect, obtained by multiplying the score function by θ. Thus control variates can be chosen to perfectly correlate with the log-likelihood, leading to zero-variance estimators. Of course, there is an empirical question of whether higher-order polynomials are useful when this Taylor approximation is inappropriate, but they would require the estimation of many more coefficients and in practice may be less stable.

(iii) We require that the control variates are stored along the chain and that their sample covariance is computed after the MCMC has terminated. For the specific examples in the paper such additional computation is a negligible fraction of the total computational, so that we did not provide specific timings. When non-diffegeometric MCMC is used to obtain samples, or when the score is unavailable in closed-form and must be estimated, the computational cost of the procedure would necessarily increase.

For the wide class of statistical models with tractable likelihoods, employed in almost all areas of statistical application, the CTI we propose should provide state-of-the-art estimation performance with negligible increase in computational costs.

controlled thermodynamic integral for Bayesian model comparison

Posted in Books, pictures, Running, Statistics, University life with tags , , , , , , , , , , on April 24, 2014 by xi'an

Reykjavik1Chris Oates, Theodore Papamarkou, and Mark Girolami (all from the University of Warwick) just arXived a paper on a new form of thermodynamic integration for computing marginal likelihoods. (I had actually discussed this paper with the authors on a few occasions when visiting Warwick.) The other name of thermodynamic integration is path sampling (Gelman and Meng, 1998). In the current paper, the path goes from the prior to the posterior by a sequence of intermediary distributions using a power of the likelihood. While the path sampling technique is quite efficient a method, the authors propose to improve it through the recourse to control variates, in order to decrease the variance. The control variate is taken from Mira et al. (2013), namely a one-dimensional temperature-dependent transform of the score function. (Strictly speaking, this is an asymptotic control variate in that the mean is only asymptotically zero.) This control variate is then incorporated within the expectation inside the path sampling integral. Its arbitrary elements are then calibrated against the variance of the path sampling integral. Except for the temperature ladder where the authors use a standard geometric rate, as the approach does not account for Monte Carlo and quadrature errors. (The degree of the polynomials used in the control variates is also arbitrarily set.) Interestingly, the paper mixes a lot of recent advances, from the zero variance notion of Mira et al. (2013) to the manifold Metropolis-adjusted Langevin algorithm of Girolami and Calderhead (2011), uses as a base method pMCMC (Jasra et al., 2007). The examples processed in the paper are regression (where the controlled version truly has a zero variance!) and logistic regression (with the benchmarked Pima Indian dataset), with a counter-example of a PDE interestingly proposed in the discussion section. I quite agree with the authors that the method is difficult to envision in complex enough models. I also did not see mentions therein of the extra time involved in using this control variate idea.