## Introduction to Sequential Monte Carlo [book review]

Posted in Books, Statistics with tags , , , , , , , , , , , , , , , , on June 8, 2021 by xi'an

[Warning: Due to many CoI, from Nicolas being a former PhD student of mine, to his being a current colleague at CREST, to Omiros being co-deputy-editor for Biometrika, this review will not be part of my CHANCE book reviews.]

My friends Nicolas Chopin and Omiros Papaspiliopoulos wrote in 2020 An Introduction to Sequential Monte Carlo (Springer) that took several years to achieve and which I find remarkably coherent in its unified presentation. Particles filters and more broadly sequential Monte Carlo have expended considerably in the last 25 years and I find it difficult to keep track of the main advances given the expansive and heterogeneous literature. The book is also quite careful in its mathematical treatment of the concepts and, while the Feynman-Kac formalism is somewhat scary, it provides a careful introduction to the sampling techniques relating to state-space models and to their asymptotic validation. As an introduction it does not go to the same depths as Pierre Del Moral’s 2004 book or our 2005 book (Cappé et al.). But it also proposes a unified treatment of the most recent developments, including SMC² and ABC-SMC. There is even a chapter on sequential quasi-Monte Carlo, naturally connected to Mathieu Gerber’s and Nicolas Chopin’s 2015 Read Paper. Another significant feature is the articulation of the practical part around a massive Python package called particles [what else?!]. While the book is intended as a textbook, and has been used as such at ENSAE and in other places, there are only a few exercises per chapter and they are not necessarily manageable (as Exercise 7.1, the unique exercise for the very short Chapter 7.) The style is highly pedagogical, take for instance Chapter 10 on the various particle filters, with a detailed and separate analysis of the input, algorithm, and output of each of these. Examples are only strategically used when comparing methods or illustrating convergence. While the MCMC chapter (Chapter 15) is surprisingly small, it is actually an introducing of the massive chapter on particle MCMC (and a teaser for an incoming Papaspiloulos, Roberts and Tweedie, a slow-cooking dish that has now been baking for quite a while!).

## congrats, Pierre!!!

Posted in Statistics with tags , , , , , , , , , on March 3, 2021 by xi'an

## One World webinars

Posted in Statistics with tags , , , , on April 21, 2020 by xi'an

Just a notice that our ABC World seminar has joined the “franchise” of the One World seminars

and that, on Thursday, 23 April, at 12:30 (CEST) IIvis Kerama and Richard Everitt will talk on  Rare event ABC-SMC², while, also on Thursday, at 16:00 (CEST) Michela Ottobre will talk Fast non mean-field network: uniform in time averaging in the One World Probability Seminar.

## parallelizable sampling method for parameter inference of large biochemical reaction models

Posted in Books, Statistics with tags , , , , , , , , on June 18, 2018 by xi'an

I came across this older (2016) arXiv paper by Jan Mikelson and Mustafa Khammash [antidated as of April 25, 2018] as another version of nested sampling. The novelty of the approach is in applying nested sampling for approximating the likelihood function in the case of involved hidden Markov models (although the name itself does not appear in the paper). This is an interesting proposal, even though there is a fairly large and very active literature on computational approaches to such objects, from sequential Monte Carlo (SMC) to particle MCMC (pMCMC), to SMC².

“We found a way to efficiently sample parameter vectors (particles) from the super level set of the likelihood (sets of particles with a likelihood equal to or higher than some threshold) corresponding to an increasing sequence of thresholds” (p.2)

The approach here is an aggregate of nested sampling and particle filters (SMC), filters that are paradoxically employed in approximating the likelihood function itself, thus called repeatedly as the value of the parameter θ changes, unless I am confused, when it seems to me that, once started with particle filters, the authors could have used them all the way to the upper level (through, again, SMC²). Instead, and that brings a further degree of (uncorrected) approximation to the procedure, a Dirichlet process prior is used to estimate Gaussian mixture approximations to the true posterior distribution(s) on the (super) level sets. Now, approximating a distribution that is zero outside a compact set [the prior restricted to the likelihood being larger than by a distribution with an infinite support does not a priori sound like a particularly enticing idea. Note also that there is no later correction for using the mixture approximation to the restricted prior. (The method also involves an approximation of the (Lebesgue) volume of the level sets that may be poor in higher dimensions.)

“DP-GMM estimations work very well in high dimensional spaces and since we use rejection sampling to obtain samples from the level set by sampling from the DP-GMM estimation, the estimation error does not get propagated through iterations.” (p.13)

One aspect of the paper that puzzles me is the use of a rejection sampler to produce new parameters simulations from a given (super) level set, as this involves a lower bound M on the Gaussian mixture approximation over this level set. If a Gaussian mixture approximation is available, there is apparently no need for this as it can be sampled directly and values below the threshold can be disposed of. It is also unclear why the error does not propagate from one level to the next, if only because of the connection between the successive particle approximations.

## the Hyvärinen score is back

Posted in pictures, Statistics, Travel with tags , , , , , , , , , , , , , on November 21, 2017 by xi'an

Stéphane Shao, Pierre Jacob and co-authors from Harvard have just posted on arXiv a new paper on Bayesian model comparison using the Hyvärinen score

$\mathcal{H}(y, p) = 2\Delta_y \log p(y) + ||\nabla_y \log p(y)||^2$

which thus uses the Laplacian as a natural and normalisation-free penalisation for the score test. (Score that I first met in Padova, a few weeks before moving from X to IX.) Which brings a decision-theoretic alternative to the Bayes factor and which delivers a coherent answer when using improper priors. Thus a very appealing proposal in my (biased) opinion! The paper is mostly computational in that it proposes SMC and SMC² solutions to handle the estimation of the Hyvärinen score for models with tractable likelihoods and tractable completed likelihoods, respectively. (Reminding me that Pierre worked on SMC² algorithms quite early during his Ph.D. thesis.)

A most interesting remark in the paper is to recall that the Hyvärinen score associated with a generic model on a series must be the prequential (predictive) version

$\mathcal{H}_T (M) = \sum_{t=1}^T \mathcal{H}(y_t; p_M(dy_t|y_{1:(t-1)}))$

rather than the version on the joint marginal density of the whole series. (Followed by a remark within the remark that the logarithm scoring rule does not make for this distinction. And I had to write down the cascading representation

$\log p(y_{1:T})=\sum_{t=1}^T \log p(y_t|y_{1:t-1})$

to convince myself that this unnatural decomposition, where the posterior on θ varies on each terms, is true!) For consistency reasons.

This prequential decomposition is however a plus in terms of computation when resorting to sequential Monte Carlo. Since each time step produces an evaluation of the associated marginal. In the case of state space models, another decomposition of the authors, based on measurement densities and partial conditional expectations of the latent states allows for another (SMC²) approximation. The paper also establishes that for non-nested models, the Hyvärinen score as a model selection tool asymptotically selects the closest model to the data generating process. For the divergence induced by the score. Even for state-space models, under some technical assumptions.  From this asymptotic perspective, the paper exhibits an example where the Bayes factor and the Hyvärinen factor disagree, even asymptotically in the number of observations, about which mis-specified model to select. And last but not least the authors propose and assess a discrete alternative relying on finite differences instead of derivatives. Which remains a proper scoring rule.

I am quite excited by this work (call me biased!) and I hope it can induce following works as a viable alternative to Bayes factors, if only for being more robust to the [unspecified] impact of the prior tails. As in the above picture where some realisations of the SMC² output and of the sequential decision process see the wrong model being almost acceptable for quite a long while…