Archive for SMC

scalable Langevin exact algorithm [armchair Read Paper]

Posted in Books, pictures, Statistics, Travel, University life on June 26, 2020 by xi'an

So, Murray Pollock, Paul Fearnhead, Adam M. Johansen and Gareth O. Roberts presented their Read Paper with discussions on the Wednesday afternoon! With a well-sized if virtual audience of nearly a hundred people. Here are a few notes scribbled during the Readings. And attempts at keeping the traditional structure of the meeting alive.

In their introduction, they gave the intuition of a quasi-stationary chain as the probability to be in A at time t while still alive being π(A) × exp(-λt) for a fixed killing rate λ. The concept is quite fascinating if less straightforward than stationarity! The presentation put the stress on the available recourse to an unbiased estimator of the κ rate whose initialisation scaled as O(n) but allowed a subsampling cost reduction afterwards. With a subsampling rate connected with Bayesian asymptotics, namely with how quickly the posterior concentrates. Unfortunately, this makes the practical construction harder, since n is finite and the concentration rate is unknown (although a default guess should be √n). I wondered if the link with self-avoiding random walks was more than historical.
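As a toy check of the π(A) × exp(-λt) intuition, here is a minimal sketch on a finite state space rather than for a killed Brownian motion. Everything in it is an assumption made for illustration: the three-state sub-stochastic matrix Q is made up, time is discrete, and the quasi-stationary distribution is obtained by plain power iteration, which has nothing to do with how the paper simulates it.

```python
import math

# Hypothetical sub-stochastic transition matrix on 3 states: each row sums to
# less than one, the deficit being the probability of being killed from there.
Q = [
    [0.50, 0.30, 0.10],   # killed with probability 0.10
    [0.20, 0.60, 0.15],   # killed with probability 0.05
    [0.10, 0.30, 0.40],   # killed with probability 0.20
]

def step(dist, Q):
    """One step of the killed chain: the unnormalised row vector dist @ Q."""
    n = len(Q)
    return [sum(dist[i] * Q[i][j] for i in range(n)) for j in range(n)]

def quasi_stationary(Q, iters=2000):
    """Power iteration for the left Perron eigenvector of Q, normalised to sum to one."""
    n = len(Q)
    pi = [1.0 / n] * n
    rho = 1.0
    for _ in range(iters):
        new = step(pi, Q)
        rho = sum(new)                 # one-step survival probability under pi
        pi = [x / rho for x in new]    # condition on survival
    return pi, rho

pi, rho = quasi_stationary(Q)
lam = -math.log(rho)                   # killing rate per unit of time

# Start from pi and propagate t steps without renormalising: the mass left in
# state j is P(X_t = j, still alive), to be compared with pi_j * exp(-lam * t).
t = 7
mass = pi[:]
for _ in range(t):
    mass = step(mass, Q)

for j in range(len(Q)):
    print(j, mass[j], pi[j] * math.exp(-lam * t))
```

The two printed columns should agree up to rounding: once started from the quasi-stationary distribution, the chain keeps the same shape π while its surviving mass decays geometrically.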

The initialisation of the method remains a challenge in complex environments. And hence one may wonder whether, and by how much, it does better than SMC. Furthermore, while the motivation for using a Brownian motion stems from the practical side, this simulation does not account for the target π. This completely blind excursion sounds worse than simulating from the prior in other settings.

One early illustration for quasi-stationarity was based on a hypothetical distribution of lions and wandering (Brownian) antelopes. I found that the associated concept of soft killing was not necessarily well received by… the antelopes!

As it happens, my friend and coauthor Natesh Pillai was the first discussant! I did not get the details of his first bimodal example. But he addressed my earlier question about how large the running time T should be. Since the computational cost should be exploding with T. He also drew an analogy with improper posteriors as to wonder about the availability of convergence assessment.

And my friend and coauthor Nicolas Chopin was the second discussant! Starting with a request to… leave the Pima Indians (model) alone!! But also getting into a deeper assessment of the alternative use of SMCs.

Nested Sampling SMC [a reply]

Posted in Books, Statistics, University life on April 9, 2020 by xi'an
Here is a response from Robert Salomone following my comments of the earlier day (and pointing out I already commented on the paper two years ago):
You may be interested to know that we are at the tail end of carrying out a major revision of the paper, which we hope will be done in the near future — there will be some new theory (we are in the final stages for a consistency proof of the ANS-SMC algorithm with new co-author Adam Johansen), as well as new numerics (including comparisons to Nested Sampling), and additional discussion that clarifies the overall narrative.
A few comments relating to your post that may clear some things up:
  • The method you describe with the auxiliary variable is actually one of three proposed algorithms. We call this one “Improved Nested Sampling” as it is the algorithm most similar to the original Nested Sampling. Two further extensions are the adaptive SMC sampler, and the fixed SMC sampler – the latter of which is provably consistent and unbiased for the model evidence (we also often see improvements over standard NS for similar computational effort when MCMC is used).
  • Regarding computational effort – it is the same for Improved NS (in fact, you can obtain the standard Nested Sampling evidence estimate from the same computational run!). For the adaptive variant, the computational effort is roughly the same for ρ = e⁻¹. In the current version of the paper this is only discussed briefly (last page of p.23). However, in the revision we will include additional experiments comparing the practical performance.
  • Regarding the question of “why not regular SMC”; we chose to focus more on why SMC is a good way to do Nested Sampling rather than why Nested Sampling is a good way to do SMC. Our main priority was to show there is a lot of opportunity to develop new nested sampling style algorithms by approaching it from a different angle. That said, Nested Sampling’s primary advantage over standard SMC seems to be in problems involving “phase transitions” such as our first example, for which temperature based methods are inherently ill-suited (and will often fail to detect so!).

nested sampling via SMC

Posted in Books, pictures, Statistics on April 2, 2020 by xi'an

“We show that by implementing a special type of [sequential Monte Carlo] sampler that takes two importance sampling paths at each iteration, one obtains an analogous SMC method to [nested sampling] that resolves its main theoretical and practical issues.”

A paper by Queenslanders Robert Salomone, Leah South, Chris Drovandi and Dirk Kroese that I had missed (and that was recovered by Grégoire after we discussed this possibility with our Master students). On using SMC in nested sampling. What are the difficulties mentioned in the above quote?

  1. Dependence between the simulated samples, since only the offending particle is moved by one or several MCMC steps. (And MultiNest is not a foolproof solution.)
  2. The error due to quadrature is hard to evaluate, with parallelised versions aggravating the error.
  3. There is a truncation error due to the stopping rule when the exact maximum of the likelihood function is unknown.

Not mentioning the Monte Carlo error, of course, which should remain at the 1/√n level.
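To make the quadrature and truncation terms concrete, here is a minimal sketch of plain nested sampling on a toy problem of my own choosing, not one from the paper: a uniform prior on [0,1] and a Gaussian-shaped likelihood centred at 0.5 with scale σ = 0.05. In this toy the constrained prior is an interval and can be sampled exactly, so the dependence issue of item 1 does not arise. The prior mass after i iterations is replaced by its usual deterministic approximation exp(-i/N), which is where the quadrature error comes from.

```python
import math
import random

def nested_sampling(n_live=200, n_iter=2000, sigma=0.05, seed=1):
    """Plain nested sampling for Z = integral of L over a uniform prior on [0, 1]."""
    rng = random.Random(seed)

    def loglik(th):
        return -0.5 * ((th - 0.5) / sigma) ** 2

    live = [rng.random() for _ in range(n_live)]
    logl = [loglik(th) for th in live]
    z, x_prev = 0.0, 1.0
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=logl.__getitem__)
        l_min = logl[worst]
        x_i = math.exp(-i / n_live)            # approximate remaining prior mass
        z += math.exp(l_min) * (x_prev - x_i)  # quadrature increment
        x_prev = x_i
        # Exact draw from the prior constrained to L > exp(l_min): in this toy
        # the level set is the interval |theta - 0.5| < r.
        r = sigma * math.sqrt(-2.0 * l_min)
        live[worst] = rng.uniform(max(0.0, 0.5 - r), min(1.0, 0.5 + r))
        logl[worst] = loglik(live[worst])
    # Truncation term: average likelihood of the live points times remaining mass.
    z_trunc = x_prev * sum(math.exp(l) for l in logl) / n_live
    return z, z_trunc

z, z_trunc = nested_sampling()
z_true = 0.05 * math.sqrt(2.0 * math.pi) * math.erf(0.5 / (0.05 * math.sqrt(2.0)))
print("quadrature part:", z)
print("truncation part:", z_trunc)
print("true evidence  :", z_true)
```

Stopping earlier (a smaller n_iter) makes the truncation part matter more, and it can only be bounded when the maximum of the likelihood is known, as it is here.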

“Nested Sampling is a special type of adaptive SMC algorithm, where weights are assigned in a suboptimal way.”

The above remark is somewhat obvious for a fixed sequence of likelihood levels and a set of particles at each (ring) level, moved by a Markov kernel with the right stationary target. Constrained to move within the ring, which may prove delicate in complex settings. Such a non-adaptive version is however not realistic and hence both the level sets and the stopping rule need be selected from the existing simulation, respectively as a quantile of the observed likelihood and as a failure to modify the evidence approximation, an adaptation that is a Catch-22, as we already found in the AMIS paper. (AMIS stands for adaptive multiple importance sampling.) To escape the quandary, the authors use both an auxiliary variable (to avoid atoms) and two importance sampling sequences (as in AMIS). And only a single particle with non-zero incremental weight for the (upper level) target. As the full details are a bit fuzzy to me, I hope I can experiment with my (quarantined) students on the full implementation of the method.

“Such cases asides, the question whether SMC is preferable using the TA or NS approach is really one of whether it is preferable to sample (relatively) easy distributions subject to a constraint or to sample potentially difficult distributions.”

A question (why not regular SMC?) I was indeed considering until coming to the conclusion section but did not find it treated in the paper. There is little discussion on the computing requirements either, as it seems the method is more time-consuming than a regular nested sampler. (On the personal side, I appreciated very much their “special thanks to Christian Robert, whose many blog posts on NS helped influence this work, and played a large part in inspiring it.”)

parallelizable sampling method for parameter inference of large biochemical reaction models

Posted in Books, Statistics on June 18, 2018 by xi'an

I came across this older (2016) arXiv paper by Jan Mikelson and Mustafa Khammash [antedated as of April 25, 2018] as another version of nested sampling. The novelty of the approach is in applying nested sampling for approximating the likelihood function in the case of involved hidden Markov models (although the name itself does not appear in the paper). This is an interesting proposal, even though there is a fairly large and very active literature on computational approaches to such objects, from sequential Monte Carlo (SMC) to particle MCMC (pMCMC), to SMC².

“We found a way to efficiently sample parameter vectors (particles) from the super level set of the likelihood (sets of particles with a likelihood equal to or higher than some threshold) corresponding to an increasing sequence of thresholds” (p.2)

The approach here is an aggregate of nested sampling and particle filters (SMC), filters that are paradoxically employed in approximating the likelihood function itself, thus called repeatedly as the value of the parameter θ changes, unless I am confused, when it seems to me that, once started with particle filters, the authors could have used them all the way to the upper level (through, again, SMC²). Instead, and that brings a further degree of (uncorrected) approximation to the procedure, a Dirichlet process prior is used to estimate Gaussian mixture approximations to the true posterior distribution(s) on the (super) level sets. Now, approximating a distribution that is zero outside a compact set [the prior restricted to the likelihood being larger than a given threshold] by a distribution with an infinite support does not a priori sound like a particularly enticing idea. Note also that there is no later correction for using the mixture approximation to the restricted prior. (The method also involves an approximation of the (Lebesgue) volume of the level sets that may be poor in higher dimensions.)
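For readers who have not seen a particle filter used as a likelihood approximation, here is a minimal bootstrap filter sketch. It is written for a toy linear Gaussian state-space model that I made up (x_t = θ x_{t-1} + noise, y_t = x_t + noise, unit variances), not for the biochemical models of the paper. The exact likelihood is available there through the Kalman filter, which gives something to compare against. The filter has to be rerun from scratch for each new value of θ, which is the repeated call mentioned above.

```python
import math
import random

def simulate(theta, T, rng):
    """Simulate T observations from the toy model, with x_0 ~ N(0, 1)."""
    x, ys = rng.gauss(0.0, 1.0), []
    for _ in range(T):
        x = theta * x + rng.gauss(0.0, 1.0)
        ys.append(x + rng.gauss(0.0, 1.0))
    return ys

def norm_logpdf(y, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def pf_loglik(theta, ys, n_particles, rng):
    """Bootstrap particle filter estimate of log p(y_1..T | theta)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    total = 0.0
    for y in ys:
        xs = [theta * x + rng.gauss(0.0, 1.0) for x in xs]     # propagate
        logw = [norm_logpdf(y, x, 1.0) for x in xs]            # weight
        m = max(logw)
        w = [math.exp(l - m) for l in logw]
        total += m + math.log(sum(w) / n_particles)            # log of average weight
        xs = rng.choices(xs, weights=w, k=n_particles)         # multinomial resampling
    return total

def kalman_loglik(theta, ys):
    """Exact log-likelihood for the same model, for comparison only."""
    mean, var, total = 0.0, 1.0, 0.0
    for y in ys:
        mean, var = theta * mean, theta * theta * var + 1.0    # predict
        s = var + 1.0                                          # predictive variance of y
        total += norm_logpdf(y, mean, s)
        gain = var / s
        mean, var = mean + gain * (y - mean), (1.0 - gain) * var
    return total

ys = simulate(0.8, 30, random.Random(0))
for theta in (0.5, 0.8, 0.95):
    est = pf_loglik(theta, ys, 1000, random.Random(1))
    print(theta, est, kalman_loglik(theta, ys))
```

The estimate is noisy but centred close to the exact value in this easy setting. In the models of the paper no exact benchmark of this kind is available.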

“DP-GMM estimations work very well in high dimensional spaces and since we use rejection sampling to obtain samples from the level set by sampling from the DP-GMM estimation, the estimation error does not get propagated through iterations.” (p.13)

One aspect of the paper that puzzles me is the use of a rejection sampler to produce new parameter simulations from a given (super) level set, as this involves a lower bound M on the Gaussian mixture approximation over this level set. If a Gaussian mixture approximation is available, there is apparently no need for this as it can be sampled directly and values below the threshold can be disposed of. It is also unclear why the error does not propagate from one level to the next, if only because of the connection between the successive particle approximations.
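The two options can be contrasted on a one-dimensional toy, under assumptions that are entirely mine: a flat prior, a level set equal to the interval [-1, 1], and a fixed two-component Gaussian mixture standing in for the DP-GMM fit. The lower bound M is taken from a grid with a small safety factor, which is only a crude stand-in for however the paper derives it. Accepting a mixture draw x with probability M/q(x) returns draws that are uniform on the level set whatever the mixture, while simply discarding draws outside the level set returns the mixture restricted to that set.

```python
import math
import random

LO, HI = -1.0, 1.0                           # hypothetical level set S = [LO, HI]
MIX = [(0.3, -0.5, 0.6), (0.7, 0.7, 0.5)]    # (weight, mean, sd), made-up mixture

def q_pdf(x):
    """Density of the Gaussian mixture proposal."""
    return sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
               for w, m, s in MIX)

def q_draw(rng):
    """One draw from the Gaussian mixture proposal."""
    u, acc = rng.random(), 0.0
    for w, m, s in MIX:
        acc += w
        if u <= acc:
            return rng.gauss(m, s)
    return rng.gauss(MIX[-1][1], MIX[-1][2])  # guard against rounding in the weights

# Approximate lower bound of q over S, from a grid, shrunk by a safety factor.
GRID = [LO + (HI - LO) * k / 2000 for k in range(2001)]
M = 0.99 * min(q_pdf(x) for x in GRID)

def sample_both(n_proposals, rng):
    """Return (rejection-sampled draws, plainly truncated draws) from the same proposals."""
    uniform_draws, truncated_draws = [], []
    for _ in range(n_proposals):
        x = q_draw(rng)
        if not (LO <= x <= HI):
            continue                          # outside the level set: always discarded
        truncated_draws.append(x)             # option 2: keep every draw inside S
        if rng.random() < M / q_pdf(x):       # option 1: flatten with the bound M
            uniform_draws.append(x)
    return uniform_draws, truncated_draws

def mean(v):
    return sum(v) / len(v)

unif, trunc = sample_both(20000, random.Random(42))
print("rejection with M:", len(unif), "draws, mean", mean(unif))
print("plain truncation:", len(trunc), "draws, mean", mean(trunc))
```

In this toy the truncated draws keep the asymmetry of the mixture while the draws flattened with M do not, at the price of a lower acceptance rate.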

 

controlled sequential Monte Carlo [BiPS seminar]

Posted in Statistics on June 5, 2018 by xi'an

The last BiPS seminar of the semester will be given by Jeremy Heng (Harvard) on Monday 11 June at 2pm, in room 3001, ENSAE, Paris-Saclay, about his Controlled sequential Monte Carlo paper:

Sequential Monte Carlo methods, also known as particle methods, are a popular set of techniques to approximate high-dimensional probability distributions and their normalizing constants. They have found numerous applications in statistics and related fields as they can be applied to perform state estimation for non-linear non-Gaussian state space models and Bayesian inference for complex static models. Like many Monte Carlo sampling schemes, they rely on proposal distributions which have a crucial impact on their performance. We introduce here a class of controlled sequential Monte Carlo algorithms, where the proposal distributions are determined by approximating the solution to an associated optimal control problem using an iterative scheme. We provide theoretical analysis of our proposed methodology and demonstrate significant gains over state-of-the-art methods at a fixed computational complexity on a variety of applications.