Archive for sequential Monte Carlo

sandwiching a marginal

Posted in Books, pictures, Statistics, University life on March 8, 2021 by xi'an

When working recently on a paper about estimating the marginal likelihood, I was pointed to this earlier 2015 paper by Roger Grosse, Zoubin Ghahramani and Ryan Adams, which had escaped me till now. The beginning of the paper discusses the shortcomings of importance sampling (when simulating from the prior) and of the harmonic mean (when simulating from the posterior) as solutions, as well as those of annealed importance sampling (when simulating from a sequence, which sequence?!, of targets). The authors end up proposing a sequential Monte Carlo or (posterior) particle learning solution. A remark on annealed importance sampling is that there exist both a forward and a backward version for estimating the marginal likelihood, either starting from a simulation from the prior (easy) or from a simulation from the posterior (hard!). As in, e.g., Nicolas Chopin’s thesis, the intermediate steps are constructed from a subsample of the entire sample.

“In this context, unbiasedness can be misleading: because partition function estimates can vary over many orders of magnitude, it’s common for an unbiased estimator to drastically underestimate Z with overwhelming probability, yet occasionally return extremely large estimates. (An extreme example is likelihood weighting, which is unbiased, but is extremely unlikely to give an accurate answer for a high-dimensional model.) Unless the estimator is chosen very carefully, the variance is likely to be extremely large, or even infinite.”

One novel aspect of the paper is to advocate for the simultaneous use of different methods, producing both lower and upper bounds on the marginal p(y) and waiting for them to get close enough. It is however delicate to find upper bounds, except when using the dreaded harmonic mean estimator. (A nice trick associated with reverse annealed importance sampling is that the reverse chain can be simulated exactly from the posterior when associated with simulated data, although I am rather lost as to the connection between the actual and the simulated data.) In a sequential harmonic mean version, the authors also look at the dangers of using a harmonic mean, but argue that the potentially infinite variance of the weights does not matter so much for log p(y), without displaying any variance calculation… The paper also contains a substantial experimental section that compares the different solutions evoked so far, plus others like nested sampling, which did not perform poorly in the experiment (see below) but could not be trusted to provide either a lower or an upper bound. The computing time to achieve some level of agreement is however rather daunting. An interesting read definitely (and I wonder what happened to the paper in the end).
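To make the sandwiching idea concrete, here is a minimal Python sketch on a toy conjugate model (my own assumption, not an experiment from the paper): θ ~ N(0,1), y_i ~ N(θ,σ²), and tempered targets p_β(θ) ∝ p(θ)p(y|θ)^β along a fixed grid of β's. Because every p_β is Gaussian here, the MCMC transitions of annealed importance sampling are replaced by exact draws, an idealisation (in realistic settings the reverse chain needs an exact posterior draw, hence the simulated-data trick). The forward log-weights should then average below log p(y), and minus the reverse log-weights above it. All function names are hypothetical.

```python
import numpy as np


def loglik(theta, y, sigma):
    """log p(y | theta) for y_i ~ N(theta, sigma^2), vectorised over an array of theta."""
    resid = y[None, :] - theta[:, None]
    return (-0.5 * y.size * np.log(2 * np.pi * sigma**2)
            - (resid**2).sum(axis=1) / (2 * sigma**2))


def sample_tempered(beta, y, sigma, size, rng):
    """Exact draws from p_beta(theta) ∝ N(theta; 0, 1) p(y | theta)^beta (Gaussian in this toy)."""
    prec = 1.0 + beta * y.size / sigma**2
    mean = beta * y.sum() / sigma**2 / prec
    return mean + rng.standard_normal(size) / np.sqrt(prec)


def exact_log_evidence(y, sigma):
    """Closed-form log p(y) for the conjugate toy model, used as ground truth."""
    prec = 1.0 + y.size / sigma**2
    b = y.sum() / sigma**2
    return (-0.5 * y.size * np.log(2 * np.pi * sigma**2) - (y**2).sum() / (2 * sigma**2)
            + b**2 / (2 * prec) - 0.5 * np.log(prec))


def ais_sandwich(y, sigma, betas, n_runs, rng):
    """Average forward and reverse AIS log-weights over n_runs independent runs.

    Forward weights are unbiased for Z, so their log has expectation <= log Z (Jensen);
    reverse weights are unbiased for 1/Z, so minus their log has expectation >= log Z.
    With exact transitions every level is sampled independently, so the loop order is
    irrelevant; with MCMC kernels the forward chain would start at the prior (beta = 0)
    and the reverse chain at an exact posterior draw (beta = 1).
    """
    log_w_fwd = np.zeros(n_runs)
    log_w_rev = np.zeros(n_runs)
    for b_lo, b_hi in zip(betas[:-1], betas[1:]):
        # forward chain: state at level b_lo, reweighted towards b_hi
        x_fwd = sample_tempered(b_lo, y, sigma, n_runs, rng)
        log_w_fwd += (b_hi - b_lo) * loglik(x_fwd, y, sigma)
        # reverse chain: state at level b_hi, reweighted towards b_lo
        x_rev = sample_tempered(b_hi, y, sigma, n_runs, rng)
        log_w_rev += (b_lo - b_hi) * loglik(x_rev, y, sigma)
    return log_w_fwd.mean(), -log_w_rev.mean()
```

For instance, with twenty observations y = rng.normal(1.0, 1.0, size=20), σ = 1 and 201 equally spaced temperatures, ais_sandwich(y, 1.0, np.linspace(0.0, 1.0, 201), 2000, rng) should return a lower and an upper value bracketing exact_log_evidence(y, 1.0), with a gap that shrinks as the temperature grid gets finer.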

the surprisingly overlooked efficiency of SMC

Posted in Books, Statistics, University life on December 15, 2020 by xi'an

At the Laplace’s Demon seminar today (whose cool name I cannot tire of!), Nicolas Chopin gave a webinar with the above, equally cool, title. And here is the first slide, debunking myths about SMC:

The second part of the talk is about a recent arXival I had missed, written by Nicolas with his student Hai-Dang Dau, about increasing the number of MCMC steps when moving the particles. The method is called waste-free SMC: only a fraction of the particles is resampled, each of them then seeds an MCMC chain whose successive states are all kept as new particles, and this is enough to create a sort of independence from previous iterations of the SMC. (Hai-Dang Dau and Nicolas Chopin had to tailor their own convergence proof for this modification of the usual SMC, which also produces a single-run assessment of the asymptotic variance.)
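The waste-free mechanism can be sketched as follows, under assumptions of my own: a one-dimensional tempering path from a N(0,1) prior to a Gaussian-shaped target, a fixed temperature schedule, and random-walk Metropolis moves with a crude scale. With M ancestors (n_chains) and chains of length P (chain_len), the N = MP particles of the next iteration are all the states visited by the M chains, so no MCMC output is discarded. This is a toy illustration, not Dau and Chopin's implementation (their adaptive choices are omitted), and the names are hypothetical.

```python
import numpy as np

# Toy tempering problem (assumption, for illustration only):
# prior N(0, 1), "likelihood" L(x) = exp(-2 (x - 2)^2), target ∝ N(x; 0, 1) L(x)^beta.


def log_lik(x):
    return -2.0 * (x - 2.0) ** 2


def log_tempered(x, beta):
    # unnormalised log density of the tempered target (prior constant dropped)
    return -0.5 * x**2 + beta * log_lik(x)


def waste_free_smc(betas, n_chains, chain_len, rng):
    """Waste-free SMC sketch: N = n_chains * chain_len particles; at each step only
    n_chains ancestors are resampled, each seeds a random-walk Metropolis chain of
    length chain_len, and every state of every chain is kept as a particle."""
    n = n_chains * chain_len
    x = rng.standard_normal(n)                 # exact draws from the prior (beta = 0)
    log_z = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        log_w = (b_next - b_prev) * log_lik(x)  # incremental weights
        m = log_w.max()
        w = np.exp(log_w - m)
        log_z += m + np.log(w.mean())           # estimate of log(Z_next / Z_prev)
        ancestors = x[rng.choice(n, size=n_chains, p=w / w.sum())]
        scale = 2.38 * ancestors.std() + 1e-12  # crude proposal scale from the ancestors
        chain = ancestors
        states = [chain]
        for _ in range(chain_len - 1):
            prop = chain + scale * rng.standard_normal(n_chains)
            log_ratio = log_tempered(prop, b_next) - log_tempered(chain, b_next)
            accept = np.log(rng.uniform(size=n_chains)) < log_ratio
            chain = np.where(accept, prop, chain)
            states.append(chain)
        x = np.concatenate(states)              # all n_chains * chain_len states are kept
    return x, log_z
```

For this toy, the exact log normalising constant is 0.5 log(0.2) - 1.6 and the final target is N(1.6, 0.2), which gives a quick check of waste_free_smc(np.linspace(0, 1, 21), 100, 20, rng).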

On the side, I heard about a very neat (if possibly toyish) example on estimating the number of Latin squares:

And the other item of information is that Nicolas’ and Omiros’ book, An Introduction to Sequential Monte Carlo, has now appeared! (Looking forward to reading the parts I had not yet read.)

online approximate Bayesian learning

Posted in Statistics on September 25, 2020 by xi'an

My friends and coauthors Matthieu Gerber and Randal Douc have just arXived a massive paper on online approximate Bayesian learning, namely the handling of the posterior distribution on the parameters of a state-space model, which remains a challenge to this day… Their starting point is the iterated batch importance sampling (IBIS) algorithm of Nicolas (Chopin, 2002), which he introduced in his PhD thesis. The online (“by online we mean that the memory and computational requirement to process each observation is finite and bounded uniformly in t”) method they construct guarantees that the approximate posterior converges to the (pseudo-)true value of the parameter as the sample size grows to infinity, where the sequence of approximations is a Cesàro mixture of initial approximations with Gaussian or t priors, AMIS-like. (I am somewhat uncertain about the notion of a sequence of priors used in this setup. Another funny feature is the necessity to consider a fat-tailed t prior from time to time in this sequence!) The sequence is in turn approximated by a particle filter. The computational cost of this IBIS is roughly in O(NT), depending on the regeneration rate.
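For readers unfamiliar with IBIS, here is a minimal sketch of the baseline algorithm on a toy i.i.d. Gaussian model (my assumption; the paper deals with state-space models, where p(y_t | θ, y_{1:t-1}) would itself require a particle filter). Particles in θ are reweighted by each new observation, and resampled then moved by an MCMC kernel targeting the current posterior whenever the effective sample size drops below a threshold. The O(t) cost of each move step is what prevents plain IBIS from being online in the sense quoted above. Names and tuning values are hypothetical.

```python
import numpy as np

# IBIS sketch on a toy i.i.d. model (assumption): theta ~ N(0, 1), y_t | theta ~ N(theta, 1).


def log_post(theta, ys):
    """Unnormalised log posterior of each particle given the observations seen so far."""
    resid = ys[None, :] - theta[:, None]
    return -0.5 * theta**2 - 0.5 * (resid**2).sum(axis=1)


def ibis(y, n_particles, rng, ess_frac=0.5, n_moves=5):
    theta = rng.standard_normal(n_particles)      # particles from the prior
    log_w = np.zeros(n_particles)
    w = np.full(n_particles, 1.0 / n_particles)
    for t in range(y.size):
        log_w += -0.5 * (y[t] - theta) ** 2        # reweight by p(y_t | theta)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        if 1.0 / np.sum(w**2) < ess_frac * n_particles:   # effective sample size check
            theta = theta[rng.choice(n_particles, size=n_particles, p=w)]
            log_w = np.zeros(n_particles)
            w = np.full(n_particles, 1.0 / n_particles)
            scale = 2.38 * theta.std() + 1e-12
            ys = y[: t + 1]
            # move step: MH targeting p(theta | y_1, ..., y_t); each evaluation costs O(t),
            # which is why plain IBIS is not "online" in the sense quoted above
            for _ in range(n_moves):
                prop = theta + scale * rng.standard_normal(n_particles)
                log_ratio = log_post(prop, ys) - log_post(theta, ys)
                accept = np.log(rng.uniform(size=n_particles)) < log_ratio
                theta = np.where(accept, prop, theta)
    return theta, w
```

The weighted mean np.sum(w * theta) can be compared with the conjugate posterior mean sum(y) / (1 + len(y)).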

nested sampling via SMC

Posted in Books, pictures, Statistics on April 2, 2020 by xi'an

“We show that by implementing a special type of [sequential Monte Carlo] sampler that takes two importance sampling paths at each iteration, one obtains an analogous SMC method to [nested sampling] that resolves its main theoretical and practical issues.”

A paper by Queenslanders Robert Salomone, Leah South, Chris Drovandi and Dirk Kroese that I had missed (and that Grégoire recovered after we discussed this possibility with our Master students), on using SMC in nested sampling. What are the difficulties mentioned in the above quote?

  1. Dependence between the simulated samples, since only the offending particle is moved by one or several MCMC steps. (And MultiNest is not a foolproof solution.)
  2. The error due to quadrature is hard to evaluate, with parallelised versions aggravating the error.
  3. There is a truncation error due to the stopping rule when the exact maximum of the likelihood function is unknown.

Not to mention the Monte Carlo error, of course, which should remain at the 1/√n level.
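To see where the quadrature and truncation errors enter, here is a bare-bones nested sampler in Python on a toy one-dimensional problem (an assumption of mine): a uniform prior on [0,1] and a Gaussian-shaped likelihood, for which the constrained prior is an interval that can be simulated exactly, so that the dependence issue of item 1 disappears. The prior mass X_i is replaced by its deterministic guess exp(-i/n), and the run stops once the live points can no longer change the evidence by more than a relative tolerance, after which their contribution is added. Function names and tuning values are hypothetical.

```python
import math
import numpy as np

# Toy problem (assumption): uniform prior on [0, 1], likelihood L(x) = exp(-(x - 0.5)^2 / 0.02),
# so that the constrained prior {L > l} is an interval and can be sampled exactly.


def log_lik(x):
    return -((x - 0.5) ** 2) / 0.02


def nested_sampling(n_live, rng, tol=1e-4, max_iter=100_000):
    live = rng.uniform(size=n_live)
    live_ll = log_lik(live)
    z = 0.0
    x_prev = 1.0                                  # prior mass enclosed by the current contour
    for i in range(1, max_iter + 1):
        worst = np.argmin(live_ll)
        l_star = live_ll[worst]
        x_i = math.exp(-i / n_live)               # deterministic quadrature guess for X_i
        z += math.exp(l_star) * (x_prev - x_i)
        x_prev = x_i
        # exact draw from the prior restricted to {L > L*}: an interval around 0.5
        r = math.sqrt(-0.02 * l_star)
        new = rng.uniform(max(0.0, 0.5 - r), min(1.0, 0.5 + r))
        live[worst], live_ll[worst] = new, log_lik(new)
        # stopping rule: remaining mass can change Z by at most a fraction tol
        if math.exp(live_ll.max()) * x_i < tol * z:
            break
    z += x_prev * np.exp(live_ll).mean()          # add the live points' share of the evidence
    return math.log(z), i


# exact log evidence for this toy, for comparison
exact_log_z = math.log(math.sqrt(0.02 * math.pi) * math.erf(0.5 / math.sqrt(0.02)))
```

Rerunning with different seeds and numbers of live points gives a feel for the quadrature error, while changing tol exposes the truncation error.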

“Nested Sampling is a special type of adaptive SMC algorithm, where weights are assigned in a suboptimal way.”

The above remark is somewhat obvious for a fixed sequence of likelihood levels and a set of particles at each (ring) level, moved by a Markov kernel with the right stationary target and constrained to move within the ring, which may prove delicate in complex settings. Such a non-adaptive version is however not realistic, and hence both the level sets and the stopping rule need to be selected from the existing simulation, respectively as a quantile of the observed likelihood and as a failure to modify the evidence approximation, an adaptation that is a Catch-22, as we already found in the AMIS paper. (AMIS stands for adaptive multiple importance sampling.) To escape the quandary, the authors use both an auxiliary variable (to avoid atoms) and two importance sampling sequences (as in AMIS), and only a single particle with a non-zero incremental weight for the (upper-level) target. As the full details are a bit fuzzy to me, I hope I can experiment with my (quarantined) students on the full implementation of the method.
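The fixed-level version mentioned above can be sketched as an SMC sampler whose targets are the prior restricted to the level sets {L > l_t}, on a toy problem of my own (uniform prior on [0,1], Gaussian-shaped likelihood, hand-picked levels). Two quantities are tracked: the ratios of successive level-set masses, estimated by the fraction of surviving particles, and the contribution of each shell between two levels, estimated by averaging L over the particles that fall below the next level. The moves are random-walk Metropolis steps rejected whenever they leave the current level set. This only illustrates the fixed-level idea under these assumptions; it is not the adaptive algorithm of Salomone et al., and the names are hypothetical.

```python
import math
import numpy as np

# Toy problem (assumption): uniform prior on [0, 1], L(x) = exp(-(x - 0.5)^2 / 0.02).
# Each intermediate target is the prior restricted to {L > l_t}, explored by a
# random-walk Metropolis kernel that rejects any move leaving the level set.


def log_lik(x):
    return -((x - 0.5) ** 2) / 0.02


def ns_smc_fixed_levels(levels, n, rng, n_moves=20):
    x = rng.uniform(size=n)       # exact draws from the (unconstrained) prior
    log_mass = 0.0                # log prior mass of the current level set
    z = 0.0
    for level in levels:
        ll = log_lik(x)
        inside = ll > level
        # evidence of the shell below the new level: mass times E[L 1{L <= level}]
        z += math.exp(log_mass) * np.mean(np.exp(ll) * ~inside)
        log_mass += math.log(inside.mean())   # ratio of successive level-set masses
        x = rng.choice(x[inside], size=n)     # resample the survivors
        scale = x.std() + 1e-12
        for _ in range(n_moves):
            prop = x + scale * rng.standard_normal(n)
            ok = (prop >= 0.0) & (prop <= 1.0) & (log_lik(prop) > level)
            x = np.where(ok, prop, x)         # uniform target on the level set: accept iff inside
    z += math.exp(log_mass) * np.mean(np.exp(log_lik(x)))   # innermost level set
    return math.log(z)
```

Here the levels could be taken as -(r**2) / 0.02 for radii r = 0.5 * 0.6 ** np.arange(1, 11), and the result compared with the exact value log(sqrt(0.02 π) erf(0.5 / sqrt(0.02))).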

“Such cases asides, the question whether SMC is preferable using the TA or NS approach is really one of whether it is preferable to sample (relatively) easy distributions subject to a constraint or to sample potentially difficult distributions.”

A question (why not regular SMC?) that I had been considering until reaching the conclusion section, but did not find treated in the paper. There is little discussion of the computing requirements either, as the method seems more time-consuming than a regular nested sampler. (On the personal side, I very much appreciated their “special thanks to Christian Robert, whose many blog posts on NS helped influence this work, and played a large part in inspiring it.”)

ensemble rejection sampling

Posted in Statistics on March 25, 2020 by xi'an

George Deligiannidis, Arnaud Doucet and Sylvain Rubenthaler have constructed a form of Rao-Blackwellised estimate out of a regular rejection sampler. This is doubly surprising, as it turns importance sampling into exact sampling while also improving on the standard accept-reject estimate. They call their approach ensemble rejection sampling. This is done by seeing the N-sample created from the proposal as an importance sample, exploiting the importance weights towards estimating the (intractable) normalising constant of the target density by Ẑ, and creating an upper bound on this estimate, which depends on the value X drawn from the N-sample (with probability proportional to its importance weight) and considered for acceptance, as

Z⁺=Ẑ+{max(w)-w(X)}/N

with probability Ẑ/Z⁺ of accepting X, where max(w) denotes an upper bound on the importance weight function over the whole space (as in standard accept-reject, to which the method reduces when N=1), not the largest weight in the sample. The amazing result is that the X thus marginally produced is distributed from the target! Meaning that this is a case of a self-normalised importance sampling draw producing an exact simulation from the target. While this cannot produce an iid sample, it can be exploited to produce unbiased estimators of expectations under the target, without even resampling and at a linear cost in the sample size N.

The method can be extended to the dynamic (state-space) case, at a cost of O(N²T), as first observed by Radford Neal. However, the importance sample seems to be distributed from a product of proposals that do not account for the previous particles, but may account for the observations. While the result involves upper bounds on the dynamic importance weights, the capacity to deliver exact simulations remains a major achievement, in my opinion.