## adaptive resample move for estimating constants

Posted in Books, Statistics, University life on April 4, 2016 by xi'an

“…adaptive resample-move allows us to reduce the variance of the estimate of normalizing constants.”

A few days before our Estimating Constants workshop, Marco Fraccaro, Ulrich Paquet, and Ole Winther arXived a paper on estimating normalising constants by resample-move sequential Monte Carlo. The main result of this note is a theorem that derives the optimal relative size of each particle system, based on the approximate variance of the associated importance weights. Which is of major importance when running a sequential algorithm under computing time or budget constraints. In practice this theorem cannot be applied in a sequential manner [since it depends on future steps] and the authors propose instead an adaptive algorithm, enlarging the current set of particles if the effective sample size per particle is not large enough. There is however a danger of an endless enlargement loop if the proposal is particularly ill-fitted to the target. Under a fixed total budget, this potential explosion in the number of particles may jeopardise the entire process, unless some safeguarding mechanism is introduced to travel back to earlier steps and recover more variety there. The paper compares the adaptive method with other solutions, including a Riemannian manifold HMC, on Gaussian processes and restricted Boltzmann machines. Both examples being associated with very specialised ways of building the sequence of tempered targets, it seems.
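For illustration, here is a minimal sketch of the ESS-based enlargement rule, on a toy Gaussian importance sampling problem; the threshold `ess_frac`, the doubling strategy, and the cap `max_doublings` (a crude safeguard against the endless enlargement loop mentioned above) are all my own assumptions, not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def ess(logw):
    """Effective sample size from unnormalised log-weights."""
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

def adaptive_enlarge(particles, logw, propose, logw_fn, ess_frac=0.5, max_doublings=5):
    """Keep adding particles until the ESS per particle exceeds ess_frac,
    with a hard cap (the safeguard) to avoid an endless enlargement loop."""
    for _ in range(max_doublings):
        if ess(logw) / len(particles) >= ess_frac:
            break
        extra = propose(len(particles))          # double the population
        particles = np.concatenate([particles, extra])
        logw = np.concatenate([logw, logw_fn(extra)])
    return particles, logw

# toy example: importance sampling from N(0,4) towards N(0,1)
propose = lambda n: rng.normal(0.0, 2.0, n)
logw_fn = lambda x: -0.5 * x**2 + 0.5 * (x / 2.0) ** 2   # log target - log proposal, up to constants
x = propose(100)
x, logw = adaptive_enlarge(x, logw_fn(x), propose, logw_fn)
print(len(x), ess(logw) / len(x))
```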

## multilevel Monte Carlo for estimating constants

Posted in Books, Statistics, University life on March 18, 2016 by xi'an

Pierre Del Moral, Ajay Jasra, Kody Law, and Yan Zhou just arXived a paper entitled Sequential Monte Carlo samplers for normalizing constants. Which obviously attracted my interest! The context is that of a sequential Monte Carlo problem, with an associated sequence of targets and of attached normalising constants. While the quantity of interest only relates to the final distribution in the sequence, using Mike Giles' multilevel Monte Carlo approach allows for a more accurate estimation and the recycling of all past particles, thanks to the telescoping formula. And the sequential representation also allows for an unbiased estimator, as is well known in the sequential Monte Carlo literature. The paper derives accurate bounds on both the variances of two normalisation constant estimators and the costs of producing such estimators (assuming there is an index typo in Corollary 3.1, where L-2 should be L-1). The improvement when compared with traditional SMC is clear on the example contained in the paper. As I read the paper rather quickly and without much attention to the notations, I may have missed the point, but I did not see any conclusion on the choice of the particle population size at each iteration of the SMC. After asking Ajay about it, he pointed out that this size can be derived as

$N_k=\epsilon^{-2}Lh_k^{(\beta+\zeta)/2}K_L$

(with notations taken from the paper).
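To give a feel for the telescoping construction and this particle allocation, here is a toy sketch; the rates β and ζ, the constant K_L, the level widths h_k, and the coupled toy quantity are all illustrative assumptions of mine, not taken from the paper's examples:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical rates: beta (weak-error) and zeta (variance decay), plus the
# constant K_L, follow the paper's notation; the values are illustrative only
eps, L, beta, zeta, K_L = 0.1, 5, 2.0, 1.0, 1.0
h = 2.0 ** -np.arange(1, L + 1)            # level widths h_k

# per-level particle numbers N_k = eps^{-2} L h_k^{(beta+zeta)/2} K_L
N = np.ceil(eps ** -2 * L * h ** ((beta + zeta) / 2) * K_L).astype(int)

def level_difference(k, n):
    """Coupled telescoping increment for a toy quantity E[X^2], where the
    level-k approximation adds a bias h_k/2 that vanishes as k grows; the
    shared draws couple consecutive levels."""
    x = rng.normal(size=n)
    fine = x ** 2 + h[k] / 2.0
    coarse = x ** 2 + h[k - 1] / 2.0 if k > 0 else 0.0
    return np.mean(fine - coarse)

mlmc = sum(level_difference(k, N[k]) for k in range(L))
print(N, mlmc)   # N decreases with k; mlmc approximates E[X^2] = 1
```

The allocation puts most particles on the cheap, coarse levels, which is the whole point of the telescoping recycling.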

## a simulated annealing approach to Bayesian inference

Posted in Books, pictures, Statistics, University life on October 1, 2015 by xi'an

A misleading title if any! Carlos Albert arXived a paper with this title this morning and I rushed to read it. Because it sounded like Bayesian analysis could be expressed as a special form of simulated annealing. But it happens to be a rather technical sequel [“that complies with physics standards”] to another paper I had missed, A simulated annealing approach to ABC, by Carlos Albert, Hans Künsch, and Andreas Scheidegger. Paper that appeared in Statistics and Computing last year, and which is most interesting!

“These update steps are associated with a flow of entropy from the system (the ensemble of particles in the product space of parameters and outputs) to the environment. Part of this flow is due to the decrease of entropy in the system when it transforms from the prior to the posterior state and constitutes the well-invested part of computation. Since the process happens in finite time, inevitably, additional entropy is produced. This entropy production is used as a measure of the wasted computation and minimized, as previously suggested for adaptive simulated annealing” (p.3)

The notion behind this simulated annealing intrusion into the ABC world is that the choice of the tolerance can be adapted along iterations according to a simulated annealing schedule. Both papers make use of thermodynamic notions that are completely foreign to me, like endoreversibility, but aim at minimising the “entropy production of the system, which is a measure for the waste of computation”. The central innovation is to introduce an augmented target on (θ,x) that is

f(x|θ)π(θ)exp{-ρ(x,y)/ε},

where ε is the tolerance, while ρ(x,y) is a measure of distance to the actual observations, and to treat ε as an annealing temperature. In an ABC-MCMC implementation, the acceptance probability of a random walk proposal (θ’,x’) is then

exp{ρ(x,y)/ε-ρ(x’,y)/ε}∧1.

Under some regularity constraints, the sequence of targets converges to

π(θ|y)exp{-ρ(x,y)},

if ε decreases slowly enough to zero. While the representation of ABC-MCMC through kernels other than the Heaviside function can be found in the earlier ABC literature, the embedding of tolerance updating within the modern theory of simulated annealing is rather exciting.
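A minimal sketch of this annealed ABC-MCMC scheme, on a toy scalar Gaussian model with a flat prior (so that, with a symmetric random walk and pseudo-data drawn from the likelihood, the acceptance probability reduces to the form above); the geometric schedule for ε is my own stand-in for the paper's adaptive schedule:

```python
import numpy as np

rng = np.random.default_rng(2)
y = 1.0                                        # toy scalar observation
rho = lambda x: abs(x - y)                     # distance to the observation
simulate = lambda th: rng.normal(th, 1.0)      # f(x|θ): toy Gaussian likelihood

def abc_mcmc(n_iter=5000, eps0=2.0, eps_min=0.1, decay=0.9995, scale=0.5):
    """ABC-MCMC on the augmented target f(x|θ)π(θ)exp{-ρ(x,y)/ε}, with ε
    lowered along a fixed geometric schedule (a stand-in for the paper's
    adaptive annealing schedule) and a flat prior for simplicity."""
    theta, x = 0.0, simulate(0.0)
    eps, out = eps0, []
    for _ in range(n_iter):
        eps = max(eps_min, eps * decay)            # annealing the tolerance
        theta_p = theta + scale * rng.normal()     # symmetric random walk
        x_p = simulate(theta_p)                    # x' ~ f(.|θ'), so f cancels
        # acceptance exp{ρ(x,y)/ε - ρ(x',y)/ε} ∧ 1, via a log-uniform draw
        if np.log(rng.random()) < (rho(x) - rho(x_p)) / eps:
            theta, x = theta_p, x_p
        out.append(theta)
    return np.array(out)

draws = abc_mcmc()
print(draws[2500:].mean())   # the ABC posterior concentrates around y = 1
```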

“Furthermore, we will present an adaptive schedule that attempts convergence to the correct posterior while minimizing the required simulations from the likelihood. Both the jump distribution in parameter space and the tolerance are adapted using mean fields of the ensemble.” (p.2)

What I cannot infer from a rather quick perusal of the papers is whether or not the implementation gets in the way of the all-inclusive theory. For instance, how can the Markov chain keep moving as the tolerance gets to zero? Even with a particle population and a sequential Monte Carlo implementation, it is unclear why the proposal scale factor [as in equation (34)] does not collapse to zero in order to ensure a non-zero acceptance rate. In the published paper, the authors used the same toy mixture example as ours [from Sisson et al., 2007], where we earned the award of the “incredibly ugly squalid picture”, with improvements in the effective sample size, but this remains a toy example. (Hopefully a post to be continued in more depth…)

## SMC 2015

Posted in Statistics, Travel, University life on September 7, 2015 by xi'an

Nicolas Chopin ran a workshop at ENSAE on sequential Monte Carlo over the past three days and it was a good opportunity to get a much needed update on current trends in the field. Especially given that the meeting was literally downstairs from my office at CREST. And given the top range of researchers presenting their current or past work (in the very amphitheatre where I attended my first statistics lectures, a few dozen years ago!). Since unforeseen events made me miss most of the central day, I will not comment on individual talks, some of which I had already heard in the recent past, but this was a high quality workshop, topped by a superb organisation. (I started wondering why there was not a single female speaker in the program and so few female participants in the audience, then realised this is a field with a massive gender imbalance, which is difficult to explain given the different situation in Bayesian statistics and even in Bayesian computation…) Some key topics I gathered during the talks I could attend–apologies to the other speakers for missing their talks due to those unforeseen events–are unbiasedness, which sounds central to the SMC methods [at least those presented there] as opposed to MCMC algorithms, and local features, used in different ways like hierarchical decomposition, multiscale, parallelisation, local coupling, &tc., to improve convergence and efficiency…

## gradient importance sampling
Posted in Books, pictures, Statistics, University life on July 30, 2015 by xi'an

Ingmar Schuster, who visited Paris-Dauphine last Spring (and is soon to return here as a postdoc funded by Fondation des Sciences Mathématiques de Paris) arXived last week a paper on gradient importance sampling. In this paper, he builds a sequential importance sampling (or population Monte Carlo) algorithm that exploits the additional information contained in the gradient of the target. The proposal or importance function is essentially the MALA move, mixed across the elements of the previous population. When compared with our original PMC mixture of random walk proposals found in e.g. this paper, each term in the mixture thus involves an extra gradient, with a scale factor that decreases to zero as 1/t√t. Ingmar compares his proposal with an adaptive Metropolis, an adaptive MALA, and an HMC algorithm, for two mixture distributions and the banana target of Haario et al. (1999) we also used in our paper. As well as a logistic regression. In each case, he finds both a smaller squared error and a smaller bias for the same computing time (evaluated as the number of likelihood evaluations). While we discussed this scheme when he visited, I remain intrigued as to why it works so well when compared with the other solutions. One possible explanation is that the use of the gradient drift is more efficient on a population of particles than on a single Markov chain, provided the population covers all modes of importance on the target surface: the “fatal” attraction of the local mode is then much less of an issue…
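As a rough illustration of the idea (and not Ingmar's exact algorithm), here is a sketch of a population scheme with a Langevin-drifted Gaussian proposal whose gradient step decays as 1/t√t; the toy Gaussian target and all tuning constants are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# toy target: N(mu, 1), with log-density and gradient available in closed form
mu = 2.0
logpi = lambda x: -0.5 * (x - mu) ** 2
grad = lambda x: -(x - mu)

def norm_logpdf(x, m, s):
    return -0.5 * ((x - m) / s) ** 2 - np.log(s) - 0.5 * np.log(2 * np.pi)

def gradient_is(n=500, n_iter=10, tau=1.0, sigma=1.0):
    """Population scheme where each particle proposes from a MALA-style
    drifted Gaussian; the gradient step decays as 1/t√t while the proposal
    scale sigma stays fixed. A sketch under toy assumptions only."""
    x = rng.normal(0.0, 3.0, n)                  # dispersed initial population
    for t in range(1, n_iter + 1):
        c = tau / (t * np.sqrt(t))               # decaying gradient step
        m = x + c * grad(x)                      # Langevin-drifted mean
        x_new = m + sigma * rng.normal(size=n)
        logw = logpi(x_new) - norm_logpdf(x_new, m, sigma)   # IS weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        x = rng.choice(x_new, size=n, p=w)       # multinomial resampling
    return x

pop = gradient_is()
print(pop.mean(), pop.std())   # the population should settle around mu = 2
```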

## Sequential Monte Carlo 2015 workshop

Posted in pictures, R, Statistics, Travel, University life, Wines on January 22, 2015 by xi'an

An announcement for the SMC 2015 workshop:
Sequential Monte Carlo methods (also known as particle filters) have revolutionized the on-line and off-line analysis of data in fields as diverse as target tracking, computer vision, financial modelling, brain imagery, or population ecology. Their popularity stems from the fact that they have made it possible to solve numerically many complex problems that were previously intractable.
The aim of the SMC 2015 workshop, in the spirit of SMC2006 and SMC2012, is to gather scientists from all areas of science interested in the theory, methodology or application of Sequential Monte Carlo methods.
SMC 2015 will take place at ENSAE, Paris, on August 26-28 2015.
The organising committee
Nicolas Chopin, ENSAE, Paris
Thomas Schön, Uppsala University

## ABC by population annealing

Posted in Statistics, University life on January 6, 2015 by xi'an

The paper “Bayesian Parameter Inference and Model Selection by Population Annealing in System Biology” by Yohei Murakami got published in PLoS One last August but I only became aware of it when ResearchGate pointed it out to me [by mentioning one of our ABC papers was quoted there].

“We are recommended to try a number of annealing schedules to check the influence of the schedules on the simulated data (…) As a whole, the simulations with the posterior parameter ensemble could, not only reproduce the data used for parameter inference, but also capture and predict the data which was not used for parameter inference.”

Population annealing is a notion introduced by Y. Iba, the very same Iba who introduced the notion of population Monte Carlo that we studied in subsequent papers. It reproduces the setting found in many particle filter papers of a sequence of (annealed or rather tempered) targets ranging from an easy (i.e., almost flat) target to the genuine target, and of an update of a particle set by MCMC moves and reweighting. I actually have trouble perceiving the difference from other sequential Monte Carlo schemes such as those exposed in Del Moral, Doucet and Jasra (2006, Series B). And the same is true of the ABC extension covered in this paper. (Where the annealed intermediate targets correspond to larger tolerances.) This sounds like a traditional ABC-SMC algorithm. Without the adaptive scheme on the tolerance ε found e.g. in Del Moral et al., since the sequence is set in advance. [However, the discussion about the implementation includes the above quote that suggests a vague form of cross-validated tolerance construction.] The approximation of the marginal likelihood also sounds standard, the marginal being approximated by the proportion of accepted pseudo-samples. Or more exactly by the sum of the SMC weights at the end of the annealing simulation. This actually raises several questions: (a) this estimator is always between 0 and 1, while the marginal likelihood is not restricted [but this is due to a missing 1/ε in the likelihood estimate that cancels from both numerator and denominator]; (b) seeing the kernel as a non-parametric estimate of the likelihood led me to wonder why different ε could not be used for different models, in that the pseudo-data used for each model under comparison differs. If we were in a genuine non-parametric setting the bandwidth would be derived from the pseudo-data.
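Here is a toy sketch of such an ABC-SMC scheme with a fixed tolerance ladder, where the marginal likelihood proxy is the product of average incremental indicator weights, hence always falls in [0,1] unless renormalised by the kernel mass (the missing 1/ε factor); the model, ladder, and move step are illustrative assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(4)
y = 0.5                                          # toy scalar observation

def move(theta, x, eps, scale=0.5):
    """One ABC-MCMC rejuvenation move per particle, preserving the
    ε-constrained target π(θ)f(x|θ)1{|x-y|<ε}; the likelihood cancels
    since x' is drawn from f(.|θ')."""
    th_p = theta + scale * rng.normal(size=theta.size)
    x_p = th_p + rng.normal(size=theta.size)     # x' ~ N(θ', 1)
    prior_ok = np.log(rng.random(theta.size)) < -0.5 * (th_p**2 - theta**2)
    acc = (np.abs(x_p - y) < eps) & prior_ok
    return np.where(acc, th_p, theta), np.where(acc, x_p, x)

def abc_smc_evidence(eps_seq=(4.0, 2.0, 1.0, 0.5), n=2000):
    """ABC-SMC with a tolerance ladder fixed in advance (no adaptive ε, as
    in the paper discussed). The evidence proxy is the product of average
    incremental indicator weights; dividing by 2·ε_final would recover a
    kernel estimate of the marginal likelihood proper."""
    theta = rng.normal(0.0, 1.0, n)              # N(0,1) prior sample
    x = theta + rng.normal(size=n)               # prior-predictive pseudo-data
    evidence = 1.0
    for eps in eps_seq:
        inc = (np.abs(x - y) < eps).astype(float)
        evidence *= inc.mean()                   # average incremental weight
        idx = rng.choice(np.flatnonzero(inc), size=n)   # resample survivors
        theta, x = theta[idx], x[idx]
        theta, x = move(theta, x, eps)           # rejuvenation move
    return evidence

Z_hat = abc_smc_evidence()
print(Z_hat)   # compare with P(|x-y|<0.5) ≈ 0.26 under the N(0,2) prior predictive
```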

“Thus, Bayesian model selection by population annealing is valid.”

The discussion about the use of ABC population annealing somewhat misses the point of using ABC, which is to approximate the genuine posterior distribution, to wit the above quote: that the ABC Bayes factors favour the correct model in the simulation does not tell us anything about the degree of approximation wrt the original Bayes factor. [The issue of non-consistent Bayes factors does not apply here as there is no summary statistic applied to the few observations in the data.] Further, the magnitude of the variability of the values of this Bayes factor as ε varies, from 1.3 to 9.6, mostly indicates that the numerical value is difficult to trust. (I also fail to explain the huge jump in Monte Carlo variability from 0.09 to 1.17 in Table 1.) That this form of ABC-SMC improves upon the basic ABC rejection approach is clear. However, it needs to build in some self-control to avoid arbitrary calibration steps and to reduce the instability of the final estimates.

“The weighting function is set to be large value when the observed data and the simulated data are ‘‘close’’, small value when they are ‘‘distant’’, and constant when they are ‘‘equal’’.”

The above quote is somewhat surprising as the estimated likelihood f(xobs|xobs,θ) is naturally constant when xobs=xsim… I also failed to understand how the model intervened in the indicator function used as a default ABC kernel.