Archive for particle MCMC

parallelizable sampling method for parameter inference of large biochemical reaction models

Posted in Books, Statistics with tags , , , , , , , , on June 18, 2018 by xi'an

I came across this older (2016) arXiv paper by Jan Mikelson and Mustafa Khammash [antidated as of April 25, 2018] as another version of nested sampling. The novelty of the approach is in applying nested sampling for approximating the likelihood function in the case of involved hidden Markov models (although the name itself does not appear in the paper). This is an interesting proposal, even though there is a fairly large and very active literature on computational approaches to such objects, from sequential Monte Carlo (SMC) to particle MCMC (pMCMC), to SMC².

“We found a way to efficiently sample parameter vectors (particles) from the super level set of the likelihood (sets of particles with a likelihood equal to or higher than some threshold) corresponding to an increasing sequence of thresholds” (p.2)

The approach here is an aggregate of nested sampling and particle filters (SMC), filters that are paradoxically employed in approximating the likelihood function itself, thus called repeatedly as the value of the parameter θ changes, unless I am confused, when it seems to me that, once started with particle filters, the authors could have used them all the way to the upper level (through, again, SMC²). Instead, and that brings a further degree of (uncorrected) approximation to the procedure, a Dirichlet process prior is used to estimate Gaussian mixture approximations to the true posterior distribution(s) on the (super) level sets. Now, approximating a distribution that is zero outside a compact set [the prior restricted to the likelihood being larger than by a distribution with an infinite support does not a priori sound like a particularly enticing idea. Note also that there is no later correction for using the mixture approximation to the restricted prior. (The method also involves an approximation of the (Lebesgue) volume of the level sets that may be poor in higher dimensions.)

“DP-GMM estimations work very well in high dimensional spaces and since we use rejection sampling to obtain samples from the level set by sampling from the DP-GMM estimation, the estimation error does not get propagated through iterations.” (p.13)

One aspect of the paper that puzzles me is the use of a rejection sampler to produce new parameters simulations from a given (super) level set, as this involves a lower bound M on the Gaussian mixture approximation over this level set. If a Gaussian mixture approximation is available, there is apparently no need for this as it can be sampled directly and values below the threshold can be disposed of. It is also unclear why the error does not propagate from one level to the next, if only because of the connection between the successive particle approximations.


Bayesian filtering and smoothing [book review]

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , , , on February 25, 2015 by xi'an

When in Warwick last October, I met Simo Särkkä, who told me he had published an IMS monograph on Bayesian filtering and smoothing the year before. I thought it would be an appropriate book to review for CHANCE and tried to get a copy from Oxford University Press, unsuccessfully. I thus bought my own book that I received two weeks ago and took the opportunity of my Czech vacations to read it… [A warning pre-empting accusations of self-plagiarism: this is a preliminary draft for a review to appear in CHANCE under my true name!]

“From the Bayesian estimation point of view both the states and the static parameters are unknown (random) parameters of the system.” (p.20)

 Bayesian filtering and smoothing is an introduction to the topic that essentially starts from ground zero. Chapter 1 motivates the use of filtering and smoothing through examples and highlights the naturally Bayesian approach to the problem(s). Two graphs illustrate the difference between filtering and smoothing by plotting for the same series of observations the successive confidence bands. The performances are obviously poorer with filtering but the fact that those intervals are point-wise rather than joint, i.e., that the graphs do not provide a confidence band. (The exercise section of that chapter is superfluous in that it suggests re-reading Kalman’s original paper and rephrases the Monty Hall paradox in a story unconnected with filtering!) Chapter 2 gives an introduction to Bayesian statistics in general, with a few pages on Bayesian computational methods. A first remark is that the above quote is both correct and mildly confusing in that the parameters can be consistently estimated, while the latent states cannot. A second remark is that justifying the MAP as associated with the 0-1 loss is incorrect in continuous settings.  The third chapter deals with the batch updating of the posterior distribution, i.e., that the posterior at time t is the prior at time t+1. With applications to state-space systems including the Kalman filter. The fourth to sixth chapters concentrate on this Kalman filter and its extension, and I find it somewhat unsatisfactory in that the collection of such filters is overwhelming for a neophyte. And no assessment of the estimation error when the model is misspecified appears at this stage. And, as usual, I find the unscented Kalman filter hard to fathom! The same feeling applies to the smoothing chapters, from Chapter 8 to Chapter 10. Which mimic the earlier ones. Continue reading

a week in Warwick

Posted in Books, Kids, Running, Statistics, University life with tags , , , , , , , , , , , , on October 19, 2014 by xi'an

Canadian geese, WarwickThis past week in Warwick has been quite enjoyable and profitable, from staying once again in a math house, to taking advantage of the new bike, to having several long discussions on several prospective and exciting projects, to meeting with some of the new postdocs and visitors, to attending Tony O’Hagan’s talk on “wrong models”. And then having Simo Särkkä who was visiting Warwick this week discussing his paper with me. And Chris Oates doing the same with his recent arXival with Mark Girolami and Nicolas Chopin (soon to be commented, of course!). And managing to run in dry conditions despite the heavy rains (but in pitch dark as sunrise is now quite late, with the help of a headlamp and the beauty of a countryside starry sky). I also evaluated several students’ projects, two of which led me to wonder when using RJMCMC was appropriate in comparing two models. In addition, I also eloped one evening to visit old (1977!) friends in Northern Birmingham, despite fairly dire London Midlands performances between Coventry and Birmingham New Street, the only redeeming feature being that the connecting train there was also late by one hour! (Not mentioning the weirdest taxi-driver ever on my way back, trying to get my opinion on whether or not he should have an affair… which at least kept me awake the whole trip!) Definitely looking forward my next trip there at the end of November.

Bayesian inference for low count time series models with intractable likelihoods

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , on January 21, 2014 by xi'an

sunset over the Brisbane river, Australia, Aug. 17, 2012Last evening, I read a nice paper with the above title by Drovandi, Pettitt and McCutchan, from QUT, Brisbane. Low count refers to observation with a small number of integer values. The idea is to mix ABC with the unbiased estimators of the likelihood proposed by Andrieu and Roberts (2009) and with particle MCMC… And even with a RJMCMC version. The special feature that makes the proposal work is that the low count features allows for a simulation of pseudo-observations (and auxiliary variables) that may sometimes authorise an exact constraint (that the simulated observation equals the true observation). And which otherwise borrows from Jasra et al. (2013) “alive particle” trick that turns a negative binomial draw into an unbiased estimation of the ABC target… The current paper helped me realise how powerful this trick is. (The original paper was arXived at a time I was off, so I completely missed it…) The examples studied in the paper may sound a wee bit formal, but they could lead to a better understanding of the method since alternatives could be available (?). Note that all those examples are not ABC per se in that the tolerance is always equal to zero.

The paper also includes reversible jump implementations. While it is interesting to see that ABC (in the authors’ sense) can be mixed with RJMCMC, it is delicate to get a feeling about the precision of the results, without a benchmark to compare to. I am also wondering about less costly alternatives like empirical likelihood and other ABC alternatives. Since Chris is visiting Warwick at the moment, I am sure we can discuss this issue next week there.

resampling and [GPU] parallelism

Posted in Statistics, University life with tags , , , , , , on March 13, 2012 by xi'an

In a recent note posted on arXiv, Lawrence Murray compares the implementation of resampling schemes for parallel systems like GPUs. Given a system of weighted particles, (xii), there are several ways of drawing a sample according to those weights:

  1. regular multinomial resampling, where each point in the (new) sample is one of the (xii), with probability (xii), meaning there is a uniform generated for each point;
  2. stratified resampling, where the weights are added, divided into equal pieces and a uniform is sampled on each piece, which means that points with large weights are sampled at least once and those with small weights at most once;
  3. systematic resampling, which is the same as the above except that the same uniform is used for each piece,
  4. Metropolis resampling, where a Markov chain converges to the distribution (ω1,…, ωP) on {1,…,P},

The three first resamplers are common in the particle system literature (incl. Nicolas Chopin’s PhD thesis), but difficult to adapt to GPUs (and I always feel uncomfortable with the fact that systematic uses a single uniform!), while the last one is more unusual, but actually well-fitted for a parallel implementation. While Lawrence Murray suggests using Raftery and Lewis’ (1992) assessment of the required number of Metropolis iterations to “achieve convergence”, I would instead suggest taking advantage of the toric nature of the space (as represented above) to run a random walk and wait for the equivalent of a complete cycle. In any case, this is a cool illustration of the new challenges posed by parallel implementations (like the development of proper random generators).

recent arXiv postings

Posted in Statistics, University life with tags , , , , , on October 17, 2011 by xi'an

Three interesting recent arXiv postings and not enough time to read them all and in the ‘Og bind them! (Of course, comments from readers welcome!)

Formulating a statistical inverse problem as one of inference in a Bayesian model has great appeal, notably for what this brings in terms of coherence, the interpretability of regularisation penalties, the integration of all uncertainties, and the principled way in which the set-up can be elaborated to encompass broader features of the context, such as measurement error, indirect observation, etc. The Bayesian formulation comes close to the way that most scientists intuitively regard the inferential task, and in principle allows the free use of subject knowledge in probabilistic model building. However, in some problems where the solution is not unique, for example in ill-posed inverse problems, it is important to understand the relationship between the chosen Bayesian model and the resulting solution. Taking emission tomography as a canonical example for study, we present results about consistency of the posterior distribution of the reconstruction, and a general method to study convergence of posterior distributions. To study efficiency of Bayesian inference for ill-posed linear inverse problems with constraint, we prove a version of the Bernstein-von Mises theorem for nonregular Bayesian models.

(Certainly unlikely to please the member of the audience in Zürich who questioned my Bayesian credentials for considering “true” models and consistency….)

Recently, Andrieu, Doucet and Holenstein (2010) introduced a general framework for using particle filters (PFs) to construct proposal kernels for Markov chain Monte Carlo (MCMC) methods. This framework, termed Particle Markov chain Monte Carlo (PMCMC), was shown to provide powerful methods for joint Bayesian state and parameter inference in nonlinear/non-Gaussian state-space models. However, the mixing of the resulting MCMC kernels can be quite sensitive, both to the number of particles used in the underlying PF and to the number of observations in the data. In this paper we suggest alternatives to the three PMCMC methods introduced in Andrieu et al. (2010), which are much more robust to a low number of particles as well as a large number of observations. We consider some challenging inference problems and show in a simulation study that, for problems where existing PMCMC methods require around 1000 particles, the proposed methods provide satisfactory results with as few as 5 particles.

(I have not read the paper enough in-depth to be critical, however “hard” figures like 5, or 10³, are always suspicious in that they cannot carry to the general case…)

In this paper we present an algorithm for rapid Bayesian analysis that combines the benefits of nested sampling and artificial neural networks. The blind accelerated multimodal Bayesian inference (BAMBI) algorithm implements the MultiNest package for nested sampling as well as the training of an artificial neural network (NN) to learn the likelihood function. In the case of computationally expensive likelihoods, this allows the substitution of a much more rapid approximation in order to increase significantly the speed of the analysis. We begin by demonstrating, with a few toy examples, the ability of a NN to learn complicated likelihood surfaces. BAMBI’s ability to decrease running time for Bayesian inference is then demonstrated in the context of estimating cosmological parameters from WMAP and other observations. We show that valuable speed increases are achieved in addition to obtaining NNs trained on the likelihood functions for the different model and data combinations. These NNs can then be used for an even faster follow-up analysis using the same likelihood and different priors. This is a fully general algorithm that can be applied, without any pre-processing, to other problems with computationally expensive likelihood functions.

(This is primarily an astronomy paper that uses a sample produced by the nested sampling algorithm MultiNest to build a neural network instead of the model likelihood. The algorithm thus requires the likelihood to be available at some stage.)

workshop in Columbia

Posted in Statistics, Travel, University life with tags , , , , , , , , , on September 24, 2011 by xi'an

The workshop in Columbia University on Computational Methods in Applied Sciences is quite diverse in its topics.  Reminding me of the conference on Efficient Monte Carlo in Sandbjerg Estate, Sønderborg in 2008, celebrating the 70th birthday of Reuven Rubinstein, incl. some colleagues I had not met since this meeting. Yesterday I thus heard (quite interesting) talks on domains somehow far from my own, from Robert Adler on cohomology (giving a second look  at the thing after the talk I head in Wharton last year), to José Blanchet on simulation for infinite server queues (with a link to perfect sampling I could not exactly trace but that was certainly there). Several of the talks made me think of our Brownian motion confidence band paper, with Wilfrid Kendall and Jean-Michel Marin, esp. Gennady Samorodnitsky’s on the maximum of stochastic processes (and wonder whether we could have gone further in that direction). Pierre Del Moral presented a broad overview of the Feynman-Kacs’ approaches to particle methods, in particular particle MCMC, with application to some financial objects. Paul Glasserman talked about robust MCMC, which I found quite an appealing concept in that it included uncertainties about the model itself. And linked with minimax concepts. And Paul Dupuis exposed a parallel tempering method linked with large deviations, whose paper I am definitely looking forward. Now it is more than time to work on my own talk! (On a very personal basis, I sadly lost my sturdy Canon camera in the taxi from the airport! Will need a new one for the ‘Og!)