Archive for auxiliary particle filter

selecting summary statistics [a tale of two distances]

Posted in Books, Statistics on May 23, 2019 by xi'an

As Jonathan Harrison came to give a seminar in Warwick [which I could not attend], it made me aware of his paper with Ruth Baker on the selection of summaries in ABC. The setting is an ABC-SMC algorithm and it relates to Fearnhead and Prangle (2012), Barnes et al. (2012), our own random forest approach, the neural network version of Papamakarios and Murray (2016), and others. The notion here is to seek the optimal weights of the different summary statistics in the tolerance distance, by maximising a distance (Hellinger) between prior and ABC posterior (Wasserstein also comes to mind!). A sort of dual of the least informative prior. The Hellinger distance is estimated by a k-nearest neighbour version [based on samples from the prior and from the ABC posterior] I had never seen before. At first I did not get how this k-nearest neighbour distance could be optimised in the weights, since the posterior sample was already generated and (SMC) weighted, but the ABC sample can be modified by changing the [tolerance] distance weights and the resulting Hellinger distance optimised this way. (There are two distances involved, in case the above description is too murky!)
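To make the two distances concrete, here is a minimal sketch, not the authors' ABC-SMC implementation: plain ABC rejection with a weighted Euclidean tolerance distance, followed by an uncorrected plug-in k-nearest neighbour estimate of the squared Hellinger distance between a prior sample and the ABC sample. The toy model (a uniform prior, one informative summary, one pure-noise summary), the function names and all tuning values are assumptions made for illustration, and the plug-in estimator carries a small bias that a proper estimator would correct.

```python
import math
import random

def weighted_distance(s_sim, s_obs, weights):
    """Weighted Euclidean (tolerance) distance between summary vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, s_sim, s_obs)))

def abc_rejection(thetas, summaries, s_obs, weights, keep=0.1):
    """Keep the fraction `keep` of prior draws closest to s_obs under the weights."""
    dists = [weighted_distance(s, s_obs, weights) for s in summaries]
    order = sorted(range(len(thetas)), key=dists.__getitem__)
    return [thetas[i] for i in order[:max(2, int(keep * len(thetas)))]]

def hellinger_sq_knn(post, prior, k=5):
    """Uncorrected plug-in k-NN estimate of the squared Hellinger distance (1-D samples)."""
    n, m = len(post), len(prior)
    affinity = 0.0
    for x in post:
        rho = sorted(abs(x - z) for z in post)[k]        # k-th neighbour in post, skipping x itself
        nu = sorted(abs(x - z) for z in prior)[k - 1]    # k-th neighbour in the prior sample
        affinity += math.sqrt((n - 1) * rho / (m * nu))  # sqrt(prior density / ABC density) at x
    return 1.0 - affinity / n

def demo(seed=1, n=2000):
    """Compare two weightings on a toy model (all choices here are hypothetical)."""
    rng = random.Random(seed)
    thetas = [rng.uniform(-5, 5) for _ in range(n)]
    # first summary is informative about theta, second is pure noise
    summaries = [(t + rng.gauss(0, 0.5), rng.gauss(0, 1)) for t in thetas]
    prior_ref = [rng.uniform(-5, 5) for _ in range(n)]   # independent prior sample
    s_obs = (1.0, 0.0)
    results = {}
    for name, weights in [("informative", (1.0, 0.0)), ("noise", (0.0, 1.0))]:
        post = abc_rejection(thetas, summaries, s_obs, weights)
        results[name] = hellinger_sq_knn(post, prior_ref)
    return results
```

Calling `demo()` returns one estimate per weighting. Putting all the weight on the noise summary leaves the ABC sample essentially equal to the prior, while weighting the informative summary moves it away, which is the quantity the weights are tuned to maximise.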

“We successfully obtain an informative unbiased posterior.”

The paper spends a significant while demonstrating that the k-nearest neighbour estimator converges and much less on the optimisation procedure itself, which seems like a real challenge to me when facing a large number of particles and a high enough dimension (in the number of statistics). (In the examples, the size of the summary is 1 (where does the weight matter?), 32, 96, 64, with 5×10⁴, 5×10⁴, 5×10³ and…10 particles, respectively.) The authors address the issue, though, albeit briefly, by mentioning that, for the same overall computation time, the adaptive weight ABC is indeed further from the prior than a regular ABC with uniform weights [rather than weighted by the precisions]. They also argue that down-weighting some components is akin to selecting a subset of summaries, but I beg to disagree with this statement as the weights are never exactly zero, as far as I can see, hence failing to fight the curse of dimensionality. Some LASSO version could implement this feature.
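As an illustration of what such a LASSO version might do, here is a hypothetical sketch (nothing of the kind appears in the paper): a soft-thresholding step applied to the distance weights, which sets the small ones exactly to zero and hence removes the corresponding summaries from the tolerance distance. The function name and the threshold `lam` are assumptions.

```python
import math

def soft_threshold(weights, lam):
    """Shrink each weight towards zero by lam; weights below lam become exactly zero."""
    return [math.copysign(max(abs(w) - lam, 0.0), w) for w in weights]

def active_summaries(weights):
    """Indices of the summaries that still enter the tolerance distance."""
    return [i for i, w in enumerate(weights) if w != 0.0]
```

For instance, `active_summaries(soft_threshold([0.05, 0.5, 1.2], 0.1))` drops the first summary altogether, which is a genuine selection rather than a down-weighting.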

likelihood free nested sampling

Posted in Books, Statistics on April 26, 2019 by xi'an

A recent paper by Mikelson and Khammash found on bioRxiv considers the (paradoxical?) mixture of nested sampling and intractable likelihood. They however cover only the case when a particle filter or another unbiased estimator of the likelihood function can be found. Unless I am missing something in the paper, this seems a very costly and convoluted approach when pseudo-marginal MCMC is available. Or the rather substantial literature on computational approaches to state-space models. Furthermore, simulating under the lower likelihood constraint gets even more intricate than for standard nested sampling as the parameter space is augmented with the likelihood estimator as an extra variable. And this makes the constrained simulation all the harder, to the point that the paper needs to resort to a Dirichlet process Gaussian mixture approximation of the constrained density. It thus sounds like quite an intricate approach to the problem. (For one of the realistic examples, the authors mention a 12 hour computation on a 48 core cluster. Producing an approximation of the evidence that is not unarguably stabilised, contrary to the above.) Once again, not being completely up-to-date in sequential Monte Carlo, I may miss a difficulty in analysing such models with other methods, but the proposal seems to be highly demanding with respect to the target.
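For readers unfamiliar with the constrained simulation step, here is a minimal sketch of standard nested sampling with an exact likelihood, not the likelihood-free version of the paper: the lowest-likelihood live point is repeatedly replaced by a prior draw subject to the lower likelihood constraint, here by naive rejection from the prior. The toy target (uniform prior on (0,1), Gaussian-shaped likelihood with scale 0.1) and all names and settings are assumptions. In the paper the likelihood is replaced by an estimate carried along as an extra variable, which is precisely what makes this step harder.

```python
import math
import random

def nested_sampling(log_like, sample_prior, n_live=200, n_iter=1000, seed=0):
    """Evidence estimate by basic nested sampling with rejection from the prior."""
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_ll = [log_like(t) for t in live]
    evidence, x_prev = 0.0, 1.0
    for j in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_ll.__getitem__)
        ll_star = live_ll[worst]
        x_curr = math.exp(-j / n_live)          # deterministic approximation of the prior volume
        evidence += math.exp(ll_star) * (x_prev - x_curr)
        x_prev = x_curr
        while True:                             # simulate from the prior under L > L*
            t = sample_prior(rng)
            ll = log_like(t)
            if ll > ll_star:
                break
        live[worst], live_ll[worst] = t, ll
    # contribution of the remaining live points
    evidence += x_prev * sum(math.exp(ll) for ll in live_ll) / n_live
    return evidence

def toy_evidence(seed=0):
    """Toy check: the exact evidence is about 0.1 * sqrt(2 * pi), i.e. roughly 0.25."""
    log_like = lambda t: -0.5 * ((t - 0.5) / 0.1) ** 2
    return nested_sampling(log_like, lambda rng: rng.random(), seed=seed)
```

The rejection loop gets slower as the constrained region shrinks, which already hints at why an augmented space with a noisy likelihood estimate calls for a more elaborate proposal.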

from least squares to signal processing and particle filtering

Posted in Books, Kids, Statistics, University life on June 6, 2017 by xi'an

Nozer Singpurwalla, Nick Polson, and Refik Soyer have just arXived a remarkable survey on the history of signal processing, from Gauß, Yule, Kolmogorov and Wiener, to Ragazzini, Shannon, Kálmán [who, I was surprised to learn, died in Gainesville last year!], Gibbs sampling, and the particle filters of the 1990’s.

approximate maximum likelihood estimation using data-cloning ABC

Posted in Books, Statistics, University life on June 2, 2015 by xi'an

“By accepting of having obtained a poor approximation to the posterior, except for the location of its main mode, we switch to maximum likelihood estimation.”

Presumably the first paper ever quoting from the ‘Og! Indeed, Umberto Picchini arXived a paper about a technique merging ABC with prior feedback (rechristened data cloning by S. Lele), where a maximum likelihood estimate is produced by an ABC-MCMC algorithm. For state-space models. This relates to an earlier paper by Javier Rubio and Adam Johansen (Warwick), who also suggested using ABC to approximate the maximum likelihood estimate. Here, the idea is to use an increasing number of replicates of the latent variables, as in our SAME algorithm, to spike the posterior around the maximum of the (observed) likelihood. An ABC version of this posterior returns a mean value as an approximate maximum likelihood estimate.
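A toy sketch of the spiking effect, using plain ABC rejection rather than the ABC-MCMC algorithm of the paper: a draw of θ is accepted only when all K independently simulated clones fall within δ of the observation, so that the accepted sample concentrates around the maximum likelihood estimate as K grows. The model (a single N(θ,1) observation with a uniform prior on (-5,5)), the function name and the values of K and δ are assumptions.

```python
import random
import statistics

def abc_dc_rejection(y_obs, n_clones, delta, n_draws=100_000, seed=0):
    """ABC rejection where every one of n_clones simulated copies must match y_obs."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)          # draw from the (assumed) uniform prior
        # all() stops at the first clone that misses the tolerance
        if all(abs(rng.gauss(theta, 1.0) - y_obs) < delta for _ in range(n_clones)):
            accepted.append(theta)
    return accepted

def compare(y_obs=1.3, delta=0.5):
    """Acceptance count, mean and spread for K=1 versus K=5 clones."""
    out = {}
    for k in (1, 5):
        acc = abc_dc_rejection(y_obs, k, delta)
        out[k] = (len(acc), statistics.mean(acc), statistics.stdev(acc))
    return out
```

In this toy model the maximum likelihood estimate is the observation itself, so the mean of the accepted values should sit near `y_obs`, with a smaller spread and far fewer acceptances for K=5 than for K=1.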

“This is a so-called “likelihood-free” approach [Sisson and Fan, 2011], meaning that knowledge of the complete expression for the likelihood function is not required.”

The above remark is sort of inappropriate in that it applies to a non-ABC setting where the latent variables are simulated from the exact marginal distributions, that is, unconditional on the data, and hence their density cancels in the Metropolis-Hastings ratio. This pre-dates ABC by a few years, since this was an early version of the particle filter.
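The cancellation can be written out explicitly. In notation assumed here (parameter θ with prior π, latent variables x with density p(x|θ), data y, parameter proposal q), the target is proportional to π(θ) p(x|θ) p(y|x,θ) and the proposal simulates θ' from q and then x' from p(x'|θ'), so that the acceptance probability reduces to

```latex
\rho = \min\left\{1,\;
\frac{\pi(\theta')\,p(x'\mid\theta')\,p(y\mid x',\theta')}
     {\pi(\theta)\,p(x\mid\theta)\,p(y\mid x,\theta)}
\times
\frac{q(\theta\mid\theta')\,p(x\mid\theta)}
     {q(\theta'\mid\theta)\,p(x'\mid\theta')}
\right\}
= \min\left\{1,\;
\frac{\pi(\theta')\,p(y\mid x',\theta')\,q(\theta\mid\theta')}
     {\pi(\theta)\,p(y\mid x,\theta)\,q(\theta'\mid\theta)}
\right\}
```

Only the observation density p(y|x,θ) needs to be evaluated, while the density of the latent variables never has to be computed.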

“In this work we are explicitly avoiding the most typical usage of ABC, where the posterior is conditional on summary statistics of data S(y), rather than y.”

Another point I find rather negative, in that, for state-space models, using the entire time-series as a “summary statistic” is unlikely to produce a good approximation.

The discussion on the respective choices of the ABC tolerance δ and on the prior feedback number of copies K is quite interesting, in that Umberto Picchini suggests setting δ first before increasing the number of copies. However, since the posterior gets more and more peaked as K increases, the consequences on the acceptance rate of the related ABC algorithm are unclear. Another interesting feature is that the underlying MCMC proposal on the parameter θ is an independent proposal, tuned during the warm-up stage of the algorithm. Since the tuning is repeated at each temperature, there are some loose ends as to whether or not it is a genuine Markov chain method. The same question arises when considering that additional past replicas need to be simulated when K increases. (Although they can be considered as virtual components of a vector made of an infinite number of replicas, to be used when needed.)

The simulation study involves a regular regression with 101 observations, a stochastic Gompertz model studied by Sophie Donnet, Jean-Louis Foulley, and Adeline Samson in 2010. With 12 points. And a simple Markov model. Again with 12 points. While the ABC-DC solutions are close enough to the true MLEs whenever available, a comparison with the cheaper ABC Bayes estimates would have been of interest as well.

Stochastic volatility filtering with intractable likelihoods

Posted in Books, Statistics, University life on May 23, 2014 by xi'an

“The contribution of our work is two-fold: first, we extend the SVM literature, by proposing a new method for obtaining the filtered volatility estimates. Second, we build upon the current ABC literature by introducing the ABC auxiliary particle filter, which can be easily applied not only to SVM, but to any hidden Markov model.”

Another ABC arXival: Emilian Vankov and Katherine B. Ensor posted a paper with the above title. They consider a stochastic volatility model with an α-stable distribution on the observables (or returns). Which makes the likelihood unavailable, even were the hidden Markov sequence known… Now, I find it very surprising that the authors do not mention the highly relevant paper of Peters, Sisson and Fan, Likelihood-free Bayesian inference for α-stable models, published in CSDA, in 2012, where an ABC algorithm is specifically designed for handling α-stable likelihoods. (Commented on that earlier post.) Similarly, the use of a particle filter coupled to ABC seems to be advanced as a novelty when many researchers have implemented such filters, including Pierre Del Moral, Arnaud Doucet, Ajay Jasra, Sumeet Singh and others, in similar or more general settings. Furthermore, Simon Barthelmé and Nicolas Chopin analysed this very model by EP-ABC and ABC. I thus find it a wee bit hard to pinpoint the degree of innovation contained in this new ABC paper.
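For the record, the generic idea of coupling ABC with a particle filter fits in a few lines. The following is a minimal sketch of an ABC bootstrap filter, not the auxiliary version proposed in the paper: since the α-stable observation density is unavailable, each particle simulates a pseudo-return and is weighted by a Gaussian kernel on its distance to the observed return. The AR(1) log-volatility model, the symmetric (β=0) stable noise simulated by the Chambers-Mallows-Stuck method, the kernel, and all parameter values and function names are assumptions.

```python
import math
import random

def sym_stable(alpha, rng):
    """Standard symmetric alpha-stable draw (Chambers-Mallows-Stuck, beta = 0)."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * u) / math.cos(u) ** (1 / alpha)
            * (math.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

def simulate_sv(n_obs, alpha, mu, phi, sigma, seed=0):
    """Simulate returns y_t = exp(x_t / 2) * stable noise, with AR(1) log-volatility x_t."""
    rng = random.Random(seed)
    x = rng.gauss(mu, sigma / math.sqrt(1 - phi ** 2))
    ys = []
    for _ in range(n_obs):
        x = mu + phi * (x - mu) + sigma * rng.gauss(0, 1)
        ys.append(math.exp(x / 2) * sym_stable(alpha, rng))
    return ys

def abc_bootstrap_filter(ys, alpha, mu, phi, sigma, n_part=500, eps=0.5, seed=1):
    """Filtered means of the log-volatility, with ABC kernel weights."""
    rng = random.Random(seed)
    sd0 = sigma / math.sqrt(1 - phi ** 2)       # stationary standard deviation
    x = [rng.gauss(mu, sd0) for _ in range(n_part)]
    means = []
    for y_t in ys:
        # propagate the particles through the hidden Markov dynamics
        x = [mu + phi * (xi - mu) + sigma * rng.gauss(0, 1) for xi in x]
        # simulate one pseudo-observation per particle and weight by closeness to y_t
        w = []
        for xi in x:
            d = (math.exp(xi / 2) * sym_stable(alpha, rng) - y_t) / eps
            w.append(math.exp(-0.5 * d * d))
        total = sum(w)
        if total == 0.0:                        # every pseudo-observation too far: keep all
            w, total = [1.0] * n_part, float(n_part)
        means.append(sum(wi * xi for wi, xi in zip(w, x)) / total)
        x = rng.choices(x, weights=w, k=n_part) # multinomial resampling
    return means
```

A possible use is `abc_bootstrap_filter(simulate_sv(50, 1.7, -1.0, 0.95, 0.3), 1.7, -1.0, 0.95, 0.3)`, which returns one filtered mean per observation. The bandwidth `eps` plays the role of the ABC tolerance and trades bias against weight degeneracy.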