Archive for ABC-SMC

likelihood-free nested sampling

Posted in Books, Statistics on April 11, 2022 by xi'an

Last week, I came by chance across a paper by Jan Mikelson and Mustafa Khammash on a likelihood-free version of nested sampling (a popular keyword on the ‘Og!), published in 2020 in PLoS Comput Biol. The setup is a parameterised and hidden state-space model, which allows for an approximation of the (observed) likelihood function L(θ|y) by means of a particle filter. An immediate issue with this proposal is that a new filter needs to be produced for each new value of the parameter θ, which makes it enormously expensive. It then gets more bizarre as the [Monte Carlo] distribution of the particle filter approximation ô(θ|y) is agglomerated with the original prior π(θ) as a joint “prior” [despite depending on the observed y] and nested sampling is conducted with level sets of the form

ô(θ|y)>ε.

Actually, if the Monte Carlo error were null, that is, if the number of particles were infinite, the identity

ô(θ|y)=L(θ|y)

would hold and this would indeed be the original nested sampler. Simulation from the restricted region is done by constructing an extra density estimator of the constrained distribution (in θ)…
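To make the particle filter approximation of the likelihood concrete, here is a minimal sketch of a bootstrap particle filter returning an estimate of log L(θ|y). Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy AR(1)-plus-noise state-space model, the unit noise variances, multinomial resampling, and the function names `simulate_data` and `particle_loglik`.

```python
import math
import random


def simulate_data(theta, T, rng):
    """Simulate y_1..y_T from a toy AR(1)-plus-noise model (an illustrative assumption)."""
    x, ys = 0.0, []
    for _ in range(T):
        x = theta * x + rng.gauss(0.0, 1.0)   # hidden state transition
        ys.append(x + rng.gauss(0.0, 1.0))    # noisy observation of the state
    return ys


def log_normal_pdf(y, mean, sd):
    """Log-density of N(mean, sd^2) at y."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((y - mean) / sd) ** 2


def particle_loglik(theta, ys, n_particles, rng):
    """Bootstrap particle filter estimate of log L(theta | y)."""
    particles = [0.0] * n_particles
    loglik = 0.0
    for y in ys:
        # propagate every particle through the state equation
        particles = [theta * x + rng.gauss(0.0, 1.0) for x in particles]
        # weight the particles by the observation density
        logw = [log_normal_pdf(y, x, 1.0) for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        # log of the average weight estimates log p(y_t | y_1, ..., y_{t-1})
        loglik += m + math.log(sum(w) / n_particles)
        # multinomial resampling
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik


rng = random.Random(1)
ys = simulate_data(0.7, 50, rng)
# two runs at the same theta give two different estimates
print(particle_loglik(0.7, ys, 200, rng), particle_loglik(0.7, ys, 200, rng))
```

Each call runs a fresh filter over the whole series, which is the cost incurred at every new value of θ, and two calls at the same θ return different values, which is the Monte Carlo distribution that gets folded into the joint “prior”.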

“We have shown how using a Monte Carlo estimate over the livepoints not only results in an unbiased estimator of the Bayesian evidence Z, but also allows us to derive a formulation for a lower bound on the achievable variance in each iteration (…)”

As shown by the above quote, the authors insist on the unbiasedness of the particle approximation, but since nested sampling does not produce an unbiased estimator of the evidence Z, the point is somewhat moot. (I am also rather surprised by the reported lack of computing time benefit in running ABC-SMC.)

Introduction to Sequential Monte Carlo [book review]

Posted in Books, Statistics on June 8, 2021 by xi'an

[Warning: Due to many CoI, from Nicolas being a former PhD student of mine, to his being a current colleague at CREST, to Omiros being co-deputy-editor for Biometrika, this review will not be part of my CHANCE book reviews.]

My friends Nicolas Chopin and Omiros Papaspiliopoulos wrote An Introduction to Sequential Monte Carlo (Springer, 2020), a book that took several years to complete and which I find remarkably coherent in its unified presentation. Particle filters and more broadly sequential Monte Carlo have expanded considerably in the last 25 years and I find it difficult to keep track of the main advances given the expansive and heterogeneous literature. The book is also quite careful in its mathematical treatment of the concepts and, while the Feynman-Kac formalism is somewhat scary, it provides a careful introduction to the sampling techniques relating to state-space models and to their asymptotic validation. As an introduction it does not go to the same depths as Pierre Del Moral’s 2004 book or our 2005 book (Cappé et al.). But it also proposes a unified treatment of the most recent developments, including SMC² and ABC-SMC. There is even a chapter on sequential quasi-Monte Carlo, naturally connected to Mathieu Gerber’s and Nicolas Chopin’s 2015 Read Paper. Another significant feature is the articulation of the practical part around a massive Python package called particles [what else?!]. While the book is intended as a textbook, and has been used as such at ENSAE and in other places, there are only a few exercises per chapter and they are not necessarily manageable (as with Exercise 7.1, the only exercise in the very short Chapter 7). The style is highly pedagogical: take for instance Chapter 10 on the various particle filters, with a detailed and separate analysis of the input, algorithm, and output of each of these. Examples are only strategically used when comparing methods or illustrating convergence. While the MCMC chapter (Chapter 15) is surprisingly small, it is actually an introduction to the massive chapter on particle MCMC (and a teaser for a forthcoming Papaspiliopoulos, Roberts and Tweedie, a slow-cooking dish that has now been baking for quite a while!).

adaptive ABC tolerance

Posted in Books, Statistics, University life on June 2, 2020 by xi'an

“There are three common approaches for selecting the tolerance sequence (…) [they] can lead to inefficient sampling”

Umberto Simola, Jessi Cisewski-Kehe, Michael Gutmann and Jukka Corander recently arXived a paper entitled Adaptive Approximate Bayesian Computation Tolerance Selection. I appreciate that they start from our ABC-PMC paper, i.e., Beaumont et al. (2009) [although the representation that the ABC tolerances are fixed in advance is somewhat incorrect, in that our codes used quantiles of the distances to set the tolerances]. This is also the approach advocated for the initialisation step by the current paper, although it remains a wee bit vague. Subsequent steps are based on the proximity between the resulting approximations to the ABC posteriors, more exactly on a quantile derived from the maximum of the ratio between two successive estimated ABC posteriors. This mimics the Accept-Reject step, if always one step too late. The iteration stops when the ratio is almost one, possibly missing the target due to Monte Carlo variability. (Recall that the “optimal” tolerance is not zero for a finite sample size.)
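As an illustration of the quantile-based choice of tolerances mentioned above, here is a minimal sketch of an ABC-PMC run where each tolerance is an empirical quantile of the distances kept at the previous iteration. It is not the adaptive rule of the paper, nor the original ABC-PMC code: the toy Normal mean model, the Uniform(-10, 10) prior, the sample mean as summary, the use of unweighted distances for the quantile, and the name `abc_pmc_quantile` are all assumptions made for the example.

```python
import math
import random


def distance(theta, s_obs, n_obs, rng):
    """Simulate the mean of n_obs N(theta, 1) draws and return its distance to s_obs."""
    s_sim = rng.gauss(theta, 1.0 / math.sqrt(n_obs))
    return abs(s_sim - s_obs)


def abc_pmc_quantile(s_obs, n_obs, n_particles, n_iter, q, rng):
    """ABC-PMC with each tolerance set to the q-quantile of the previous distances."""
    lo, hi = -10.0, 10.0                       # Uniform(lo, hi) prior (an assumption)
    # iteration 0: simulate from the prior, keep everything
    thetas = [rng.uniform(lo, hi) for _ in range(n_particles)]
    dists = [distance(t, s_obs, n_obs, rng) for t in thetas]
    weights = [1.0 / n_particles] * n_particles
    tolerances = []
    for _ in range(n_iter):
        eps = sorted(dists)[int(q * (n_particles - 1))]   # empirical q-quantile
        tolerances.append(eps)
        mean = sum(w * t for w, t in zip(weights, thetas))
        var = sum(w * (t - mean) ** 2 for w, t in zip(weights, thetas))
        tau = math.sqrt(2.0 * var)             # kernel variance: twice the weighted variance
        new_thetas, new_dists, new_weights = [], [], []
        while len(new_thetas) < n_particles:
            centre = rng.choices(thetas, weights=weights, k=1)[0]
            prop = rng.gauss(centre, tau)
            if not lo <= prop <= hi:
                continue                       # zero prior density
            d = distance(prop, s_obs, n_obs, rng)
            if d > eps:
                continue                       # rejected at the current tolerance
            # importance weight: flat prior over mixture proposal, up to a constant
            mix = sum(w * math.exp(-0.5 * ((prop - t) / tau) ** 2)
                      for w, t in zip(weights, thetas))
            new_thetas.append(prop)
            new_dists.append(d)
            new_weights.append(1.0 / mix)
        total = sum(new_weights)
        thetas, dists = new_thetas, new_dists
        weights = [w / total for w in new_weights]
    return thetas, weights, tolerances


thetas, weights, tolerances = abc_pmc_quantile(1.0, 20, 300, 5, 0.5, random.Random(2))
print(tolerances)
```

Since every distance kept at one iteration is below the current tolerance, the sequence of tolerances produced this way can only decrease, with no need to fix it in advance.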

“…the decrease in the acceptance rate is mitigated by the improvement in the proposed particles.”

A problem is that it depends on the form of the approximation and requires non-parametric hence imprecise steps. Maybe variational encoders could help. There is an interesting approach by Sugiyama et al. (2012), of which I knew nothing, the core idea being that the ratio of two densities is also the solution to minimising a distance between the numerator density and a variable function times the bottom density. However, since only the maximum of the ratio is needed, a more focused approach could be devised, rather than first approximating the ratio and then maximising the estimated ratio. Maybe the solution of Goffinet et al. (1992) on estimating an accept-reject constant could work.
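As an illustration of this characterisation of the density ratio, here is one instance, written with the Kullback-Leibler divergence as the distance. This choice is an assumption made for the example (other divergences lead to other estimators, and the one used in the paper is not specified here), as is the notation: p for the numerator density, q for the bottom density, and r for the variable function.

```latex
\[
r^\star \;=\; \arg\min_{r \ge 0,\ \int r(x)\,q(x)\,\mathrm{d}x = 1}
\mathrm{KL}\left(p \,\|\, r\,q\right),
\qquad
\mathrm{KL}\left(p \,\|\, r\,q\right)
= \int p(x)\,\log\frac{p(x)}{r(x)\,q(x)}\,\mathrm{d}x \;\ge\; 0,
\]
with equality if and only if $r(x)\,q(x) = p(x)$ almost everywhere, that is, $r^\star = p/q$.
```

The constraint makes r·q a density, so the divergence is non-negative and only vanishes at the ratio itself. In practice r is restricted to a finite-dimensional family and the integrals are replaced with sample averages, which is where the imprecision comes in.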

A further comment is that the estimated density is not properly normalised, which lessens the Accept-Reject analogy since the optimum may well stand above one. And thus stop “too soon”. (Incidentally, the paper contains the mixture example of Sisson et al. (2007), for which our own graphs were strongly criticised during our Biometrika submission!)

adaptive copulas for ABC

Posted in Statistics on March 20, 2019 by xi'an

A paper on ABC I read on my way back from Cambodia: Yanzhi Chen and Michael Gutmann arXived an ABC [in Edinburgh] paper on learning the target via Gaussian copulas, to be presented at AISTATS this year (in Okinawa!), linking post-processing (regression) ABC and sequential ABC. The drawback in the regression approach is that the correction often relies on a homogeneity assumption on the distribution of the noise or residual, since this approach only applies a drift to the original simulated sample. Their method is based on two stages: a coarse-grained one, where the posterior is approximated by ordinary linear regression ABC, and a fine-grained one, which uses the above coarse Gaussian version as a proposal and returns a Gaussian copula estimate of the posterior. This proposal is somewhat similar to the neural network approach of Papamakarios and Murray (2016), and to the Gaussian copula version of Li et al. (2017), the major difference being the presence of two stages. The new method is compared with other ABC proposals at a fixed simulation cost, which does not account for the construction costs, although they should be relatively negligible. To compare these ABC avatars, the authors use a symmetrised Kullback-Leibler divergence I had not met previously, requiring a massive numerical integration (although this is not an issue for the practical implementation of the method, which only calls for the construction of the neural network(s)). Note also that sequential ABC is only run for two iterations, and also that none of the importance sampling ABC versions of Fearnhead and Prangle (2012) and of Li and Fearnhead (2018) are considered, all versions relying on the same vector of summary statistics with a dimension much larger than the dimension of the parameter. Except in our MA(2) example, where regression does as well.
I wonder at the impact of the dimension of the summary statistic on the performances of the neural network, i.e., whether or not it is able to manage the curse of dimensionality by ignoring all but essentially the data statistics in the optimisation.
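To illustrate what the regression post-processing does, here is a minimal sketch of rejection ABC followed by an ordinary least squares adjustment, showing that the correction is a mere shift of each accepted parameter value. It is not the method of the paper: the toy Normal mean model, the N(0, 3²) prior, the scalar summary, the unweighted regression, and the name `regression_abc` are assumptions made for the example.

```python
import math
import random


def regression_abc(s_obs, n_obs, n_sims, n_keep, rng):
    """Rejection ABC followed by an (unweighted) linear regression adjustment."""
    sims = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 3.0)                    # N(0, 3^2) prior (an assumption)
        s = rng.gauss(theta, 1.0 / math.sqrt(n_obs))   # simulated summary: a sample mean
        sims.append((abs(s - s_obs), theta, s))
    sims.sort()                                        # order by distance to s_obs
    kept = sims[:n_keep]                               # nearest neighbours in summary space
    thetas = [t for _, t, _ in kept]
    summaries = [s for _, _, s in kept]
    # ordinary least squares slope of theta on s within the accepted sample
    s_bar = sum(summaries) / n_keep
    t_bar = sum(thetas) / n_keep
    sxx = sum((s - s_bar) ** 2 for s in summaries)
    sxy = sum((s - s_bar) * (t - t_bar) for s, t in zip(summaries, thetas))
    beta = sxy / sxx
    # the adjustment shifts each accepted theta by beta * (s - s_obs), nothing more
    adjusted = [t - beta * (s - s_obs) for t, s in zip(thetas, summaries)]
    return thetas, adjusted, beta


def variance(xs):
    """Empirical variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


raw, adjusted, beta = regression_abc(1.0, 20, 5000, 500, random.Random(3))
print(beta, variance(raw), variance(adjusted))
```

The same slope is applied to every accepted value whatever its location, which is where the homogeneity assumption on the residuals enters: the adjusted sample is the set of regression residuals recentred at the observed summary.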

particular degeneracy in ABC model choice

Posted in Statistics on February 22, 2019 by xi'an

In one of the presentations by the last cohort of OxWaSP students, the group decided to implement an ABC model choice strategy based on sequential ABC inspired by Toni et al. (2008), and this made me reconsider this approach (disclaimer: no criticism of the students implied in the following!). Indeed, the outcome of the simulation led to the ultimate selection of a single model, exclusive of all other models, corresponding to a posterior probability of one in favour of this model. Which sounds like a drawback of the ABC-SMC model choice approach in this setting, namely that it is quite prone to degeneracy, much more than standard SMC, since once a model vanishes from the list, it can never reappear in the following iterations, if I am reading the algorithm correctly. To avoid this degeneracy, one would need to keep a population of particles of a given size for each model, towards using it as a pool for moves at following iterations… Which also means that running in parallel as many ABC-SMC filters as there are models would be equally or more efficient, a wee bit like parallel MCMC chains may prove more efficient than reversible jump for model comparison. (On the trivial side, the OxWaSP seminar on the same day was briefly interrupted by water leakage caused by Storm Eric and poor workmanship on the new building!)
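The degeneracy mechanism can be seen in a toy caricature that only keeps the inheritance of the model index from one population to the next. This is not the algorithm of Toni et al.: there is no data simulation, no tolerance, and no importance weight, and the assumption that model indices are only ever drawn from the previous population is the reading of the algorithm questioned above. The name `model_counts_over_time` is hypothetical.

```python
import random


def model_counts_over_time(n_particles, n_models, n_iter, rng):
    """Particles per model when model indices are only inherited from the previous population."""
    models = [rng.randrange(n_models) for _ in range(n_particles)]
    history = []
    for _ in range(n_iter):
        history.append([models.count(m) for m in range(n_models)])
        # caricature of one sequential step: each new particle copies the model index
        # of a particle drawn uniformly from the previous population (no model move)
        models = [rng.choice(models) for _ in range(n_particles)]
    return history


history = model_counts_over_time(50, 3, 500, random.Random(4))
print(history[0], history[-1])
```

Even with all models on an equal footing, a model whose count reaches zero has no particle left to be copied and stays at zero for good, hence the suggestion of keeping a population of fixed size per model.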