## coupled filters

Posted in Kids, Statistics, University life on July 11, 2016 by xi'an

Pierre Jacob, Fredrik Lindsten, and Thomas Schön recently arXived a paper on coupled particle filters. This coupling problem proves much more complicated than expected, due to the discrete nature of particle filters. The starting point of the paper is the use of common (e.g., uniform) random numbers for generating each entry in the particle system at each time t. The resulting maximal correlation gets damaged by the resampling steps (even when using the same uniforms). One suggestion made in the paper for improving the correlation between entries at each time is to resort to optimal transport, using the distance between particles as the criterion. A cheaper alternative is inspired by multi-level Monte Carlo: it builds a joint multinomial distribution by optimising the coupling probability. [Is there any way to iterate this construct instead of considering only the extreme cases of identical values versus independent values?] The authors also recall a “sorted sampling” method proposed by Mike Pitt in 2002. It relies on the empirical cdfs derived from the particle systems and on the inverse cdf technique, which is the approach I would have considered first. Possibly with a smooth transform of both ecdfs in order to optimise the inverse cdf move.

Actually, I have trouble with the notion that the ancestors of a pair of particles should matter. Unless one envisions a correlation of the entire path, but I am unsure how one can make paths correlated (besides coupling), and how this impacts likelihood estimation. As shown in the paper, the coupled approximations produce regular versions and, despite the negative bias, fairly accurate evaluations of likelihood ratios, which is all that matters in an MCMC implementation.
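To make the two cheap coupling ideas concrete, here is a minimal sketch, assuming numpy and 1-D particles. It is not the authors' code, and every function name is hypothetical. The first function is a maximal coupling of two multinomial resampling distributions. With probability equal to the overlap Σ min(w1_i, w2_i), both systems receive the same ancestor. Otherwise, the ancestors are drawn from the disjoint residuals. This is the "identical versus independent" dichotomy questioned above. The second function is a simple version of sorted sampling: both systems are sorted by value, and the same uniforms are pushed through both inverse empirical cdfs.

```python
import numpy as np


def _draw(weights, size, rng):
    """Inverse-cdf draws of indices from unnormalised non-negative weights."""
    cdf = np.cumsum(weights)
    u = rng.random(size) * cdf[-1]
    return np.minimum(np.searchsorted(cdf, u, side="right"), len(weights) - 1)


def maximal_coupling_resample(w1, w2, n, rng):
    """Ancestor indices (a1, a2) with a1 ~ Cat(w1) and a2 ~ Cat(w2) marginally,
    and P(a1 == a2) = sum_i min(w1_i, w2_i), the largest achievable value."""
    w1 = np.asarray(w1, float) / np.sum(w1)
    w2 = np.asarray(w2, float) / np.sum(w2)
    overlap = np.minimum(w1, w2)
    alpha = overlap.sum()
    if alpha > 1.0 - 1e-12:               # identical weights up to rounding
        alpha = 1.0
    common = rng.random(n) < alpha        # which pairs share the same ancestor
    a1 = np.empty(n, dtype=int)
    a2 = np.empty(n, dtype=int)
    k = int(common.sum())
    if k > 0:
        a1[common] = a2[common] = _draw(overlap, k, rng)
    if k < n:
        # residual distributions have disjoint supports, so these pairs differ
        a1[~common] = _draw(w1 - overlap, n - k, rng)
        a2[~common] = _draw(w2 - overlap, n - k, rng)
    return a1, a2


def sorted_coupled_resample(x1, w1, x2, w2, n, rng):
    """Resample two 1-D particle systems with common uniforms, after sorting
    each system by particle value (a simple version of sorted sampling)."""
    u = rng.random(n)                     # the same uniforms for both systems
    out = []
    for x, w in ((x1, w1), (x2, w2)):
        order = np.argsort(x)
        cdf = np.cumsum(np.asarray(w, float)[order])
        pos = np.searchsorted(cdf, u * cdf[-1], side="right")
        out.append(order[np.minimum(pos, len(order) - 1)])
    return out[0], out[1]
```

For example, with `w1 = [0.5, 0.5, 0]` and `w2 = [0, 0.5, 0.5]`, the overlap is 0.5. About half of the resampled pairs then share ancestor 1, and the remaining pairs are forced apart.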
The paper also proposes a smoothing algorithm based on the Rhee and Glynn (2012) debiasing technique, which operates on expectations against the smoothing distribution (conditional on a value of the parameter θ). This may connect with the notion of simulating correlated paths. The interesting part is that, due to the coupling, the Rhee and Glynn unbiased estimator has a finite (if random) stopping time.
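The finite random stopping time can be illustrated on a much simpler object than a particle smoother. The sketch below assumes a toy three-state Markov chain, not the conditional particle filters of the paper. Two copies of the chain run with a lag of one step, and each pair of transitions is drawn from a maximal coupling. Once the copies meet, they stay together. The Rhee and Glynn style sum H = h(X_0) + Σ_{t=1}^{τ-1} [h(X_t) − h(Y_{t-1})] therefore stops at the meeting time τ. Its expectation is the stationary expectation of h, even though the chains start away from stationarity. All names are hypothetical.

```python
import numpy as np


def draw(weights, rng):
    """One inverse-cdf draw of an index from unnormalised non-negative weights."""
    cdf = np.cumsum(weights)
    i = int(np.searchsorted(cdf, rng.random() * cdf[-1], side="right"))
    return min(i, len(weights) - 1)


def max_coupling(p, q, rng):
    """Draw (i, j) with i ~ Cat(p), j ~ Cat(q) and P(i == j) maximal."""
    overlap = np.minimum(p, q)
    alpha = overlap.sum()
    if alpha > 1.0 - 1e-12 or rng.random() < alpha:
        i = draw(overlap, rng)
        return i, i
    return draw(p - overlap, rng), draw(q - overlap, rng)


def unbiased_estimate(P, pi0, h, rng, max_iter=100_000):
    """Lag-one coupled chains: H = h(X_0) + sum_{t=1}^{tau-1} [h(X_t) - h(Y_{t-1})],
    where tau is the first t such that X_t == Y_{t-1}. Returns (H, tau)."""
    x = draw(pi0, rng)                # X_0
    y = draw(pi0, rng)                # Y_0
    est = h[x]
    x = draw(P[x], rng)               # X_1
    t = 1
    while x != y:                     # loop invariant: x = X_t, y = Y_{t-1}
        est += h[x] - h[y]
        x, y = max_coupling(P[x], P[y], rng)   # (X_{t+1}, Y_t)
        t += 1
        if t > max_iter:
            raise RuntimeError("chains did not meet")
    return est, t
```

For the birth-death kernel used in the check below, the stationary law is (¼, ½, ¼). Averaging H over independent replicates therefore targets 0·¼ + 1·½ + 4·¼ = 1.5, although every replicate starts from state 0.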

## a simulated annealing approach to Bayesian inference

Posted in Books, pictures, Statistics, University life on October 1, 2015 by xi'an

A misleading title if ever there was one! Carlo Albert arXived a paper with this title this morning and I rushed to read it, because it sounded as if Bayesian analysis could be expressed as a special form of simulated annealing. But it happens to be a rather technical sequel [“that complies with physics standards”] to another paper I had missed, A simulated annealing approach to ABC, by Carlo Albert, Hans Künsch, and Andreas Scheidegger. That paper appeared in Statistics and Computing last year, and it is most interesting!

“These update steps are associated with a flow of entropy from the system (the ensemble of particles in the product space of parameters and outputs) to the environment. Part of this flow is due to the decrease of entropy in the system when it transforms from the prior to the posterior state and constitutes the well-invested part of computation. Since the process happens in finite time, inevitably, additional entropy is produced. This entropy production is used as a measure of the wasted computation and minimized, as previously suggested for adaptive simulated annealing” (p.3)

The notion behind this simulated annealing intrusion into the ABC world is that the choice of the tolerance can be adapted along iterations according to a simulated annealing schedule. Both papers make use of thermodynamic notions that are completely foreign to me, like endoreversibility, but aim at minimising the “entropy production of the system, which is a measure for the waste of computation”. The central innovation is to introduce an augmented target on (θ,x), namely

$$f(x\mid\theta)\,\pi(\theta)\,\exp\{-\rho(x,y)/\varepsilon\},$$

where ε is the tolerance, while ρ(x,y) is a measure of distance to the actual observations, and to treat ε as an annealing temperature. In an ABC-MCMC implementation, the acceptance probability of a random walk proposal (θ’,x’) is then

exp{ρ(x,y)/ε-ρ(x’,y)/ε}∧1.

Under some regularity constraints, the θ-marginal of this sequence of targets converges to the exact posterior

$$\pi(\theta\mid y),$$

if ε decreases slowly enough to zero. While the representation of ABC-MCMC through kernels other than the Heaviside function can be found in the earlier ABC literature, the embedding of tolerance updating within the modern theory of simulated annealing is rather exciting.
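A minimal sketch of ABC-MCMC on the augmented target with a decreasing tolerance follows. It rests on several assumptions for illustration only: a toy normal-mean model in which the summary x is a sample mean with f(x|θ) = N(θ, 1/n), the distance ρ(x,y) = |x − y|, a N(0, 10²) prior, and a geometric cooling of ε floored at `eps_min`. This fixed schedule is not the adaptive, entropy-based schedule of Albert, Künsch and Scheidegger, and all names are hypothetical. The move proposes θ’ by a random walk and x’ from f(·|θ’), so the likelihood terms cancel. The acceptance ratio then keeps the prior ratio and the exponential kernel.

```python
import numpy as np


def annealed_abc_mcmc(y_obs, n, n_iter=20_000, eps0=2.0, eps_min=0.05,
                      cool=0.9995, step=0.3, prior_sd=10.0, seed=1):
    """ABC-MCMC on f(x|theta) pi(theta) exp(-|x - y|/eps) with a cooled eps.
    Toy model (assumed): x ~ N(theta, 1/n), theta ~ N(0, prior_sd^2)."""
    rng = np.random.default_rng(seed)
    sd_sim = 1.0 / np.sqrt(n)

    def log_prior(t):
        return -0.5 * (t / prior_sd) ** 2

    theta = 0.0
    x = rng.normal(theta, sd_sim)
    eps = eps0
    thetas = np.empty(n_iter)
    for it in range(n_iter):
        theta_p = theta + step * rng.normal()   # symmetric random walk on theta
        x_p = rng.normal(theta_p, sd_sim)       # pseudo-data from f(.|theta')
        log_acc = (log_prior(theta_p) - log_prior(theta)
                   + (abs(x - y_obs) - abs(x_p - y_obs)) / eps)
        if np.log(rng.random()) < log_acc:
            theta, x = theta_p, x_p
        thetas[it] = theta
        eps = max(eps_min, eps * cool)          # assumed geometric cooling
    return thetas
```

With the random-walk scale `step` held fixed, moves that raise ρ are penalised more and more heavily as ε shrinks. This gives a direct way to look at how the chain behaves as the tolerance approaches zero.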

“Furthermore, we will present an adaptive schedule that attempts convergence to the correct posterior while minimizing the required simulations from the likelihood. Both the jump distribution in parameter space and the tolerance are adapted using mean fields of the ensemble.” (p.2)

What I cannot infer from a rather quick perusal of the papers is whether or not the implementation gets in the way of the all-inclusive theory. For instance, how can the Markov chain keep moving as the tolerance gets to zero? Even with a particle population and a sequential Monte Carlo implementation, it is unclear why the proposal scale factor [as in equation (34)] does not have to collapse to zero in order to maintain a non-zero acceptance rate. In the published paper, the authors used the same toy mixture example as ours [from Sisson et al., 2007], where we earned the award of the “incredibly ugly squalid picture”, with improvements in the effective sample size, but this remains a toy example. (Hopefully a post to be continued in more depth…)

## ABC for big data

Posted in Books, Statistics, University life on June 23, 2015 by xi'an

“The results in this paper suggest that ABC can scale to large data, at least for models with a fixed number of parameters, under the assumption that the summary statistics obey a central limit theorem.”

In a week rich with arXiv submissions about MCMC and “big data”, like the Variational consensus Monte Carlo of Rabinovich et al., or scalable Bayesian inference via particle mirror descent by Dai et al., Wentao Li and Paul Fearnhead contributed an impressive paper entitled Behaviour of ABC for big data. However, a word of warning: the title is somewhat misleading. The paper does not address the issue of big or tall data per se, e.g., the impossibility of handling the whole data at once and of reproducing it by simulation, but rather the asymptotics of ABC. The setting is not dissimilar to the earlier Fearnhead and Prangle (2012) Read Paper. The central theme of this theoretical paper [with 24 pages of proofs!] is to study the connection between the number N of Monte Carlo simulations and the tolerance value ε when the number of observations n goes to infinity. A main result in the paper is that the ABC posterior mean can have the same asymptotic distribution as the MLE when $\varepsilon=o(n^{-1/4})$. This is however of no direct use in practice, given the second main result, namely that the Monte Carlo variance is well-controlled only when $\varepsilon=O(n^{-1/2})$.

Something I have (slight) trouble with is the construction of an importance sampling function of the form $f_{\text{ABC}}(s\mid\theta)^{\alpha}$ when, obviously, this function cannot be used for simulation purposes. The authors point out this fact, but still build an argument about the optimal choice of α, namely away from 0 and 1, like ½. Actually, any value different from 0 and 1 is sensible, meaning that the range of acceptable importance functions is wide. Most interestingly (!), the paper constructs an iterative importance sampling ABC in a spirit similar to the ABC-PMC of Beaumont et al. (2009). Even more interestingly, the ½ factor amounts to setting the variance of the proposal at twice the variance of the target, just as in PMC.
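The link between the ½ power and a doubled variance can be checked on a Gaussian shape. Assume, as a central-limit heuristic made here for illustration only, that $f_{\text{ABC}}(s\mid\theta)$ behaves in θ like a normal density with mean $m$ and variance $\sigma^2$. Raising it to the power α then gives

$$\{\varphi(\theta;m,\sigma^2)\}^{\alpha}\propto\exp\left\{-\frac{\alpha(\theta-m)^2}{2\sigma^2}\right\}=\exp\left\{-\frac{(\theta-m)^2}{2\,\sigma^2/\alpha}\right\}\propto\varphi(\theta;m,\sigma^2/\alpha),$$

so α = ½ yields a normal importance function with twice the target variance. This is the same inflation factor as in the ABC-PMC proposal of Beaumont et al. (2009).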

Another aspect of the analysis I do not catch is the reason for keeping the Monte Carlo sample size at a fixed value N while setting a sequence of acceptance probabilities (or of tolerances) along iterations. This is a very surprising result, in that the Monte Carlo error does remain under control and does not dominate the overall error!
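For concreteness, here is a minimal ABC-PMC sketch in the spirit of Beaumont et al. (2009), with a fixed number N of accepted particles per round and a shrinking sequence of tolerances. The following choices are assumptions for illustration and do not come from the Li and Fearnhead paper: a toy normal-mean model whose summary is the sample mean, a uniform prior on (−10, 10), and a next tolerance set at a quantile of the previous round's distances (a common heuristic). The proposal perturbs resampled particles with a normal kernel whose variance is twice the weighted empirical variance. The weights correct for this mixture proposal. Names are hypothetical.

```python
import numpy as np


def abc_pmc(y_obs, n, N=1000, n_rounds=6, quantile=0.5, seed=0):
    """ABC-PMC sketch for a toy normal-mean model (summary = sample mean)."""
    rng = np.random.default_rng(seed)
    lo, hi = -10.0, 10.0               # uniform prior bounds (assumed)
    sd_sim = 1.0 / np.sqrt(n)          # sample mean ~ N(theta, 1/n) (assumed)

    # round 0: N draws from the prior, all kept, equal weights
    theta = rng.uniform(lo, hi, size=N)
    dist = np.abs(rng.normal(theta, sd_sim) - y_obs)
    w = np.full(N, 1.0 / N)
    eps = np.inf

    for _ in range(n_rounds):
        eps = np.quantile(dist, quantile)       # assumed tolerance update
        mean = np.sum(w * theta)
        tau = np.sqrt(2.0 * np.sum(w * (theta - mean) ** 2))  # twice the weighted variance
        kept_theta, kept_dist, n_kept = [], [], 0
        while n_kept < N:                       # fixed Monte Carlo size N
            parents = rng.choice(N, size=N, p=w)
            prop = theta[parents] + tau * rng.normal(size=N)
            d = np.abs(rng.normal(prop, sd_sim) - y_obs)
            ok = (prop > lo) & (prop < hi) & (d <= eps)
            kept_theta.append(prop[ok])
            kept_dist.append(d[ok])
            n_kept += int(ok.sum())
        new_theta = np.concatenate(kept_theta)[:N]
        dist = np.concatenate(kept_dist)[:N]
        # weight = flat prior / mixture proposal density (constants cancel)
        kern = np.exp(-0.5 * ((new_theta[:, None] - theta[None, :]) / tau) ** 2)
        w = 1.0 / (kern @ w)
        w /= w.sum()
        theta = new_theta
    return theta, w, eps
```

Here N stays fixed while the tolerance shrinks, so the number of simulations needed to collect N acceptances per round is what grows.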

“Whilst our theoretical results suggest that point estimates based on the ABC posterior have good properties, they do not suggest that the ABC posterior is a good approximation to the true posterior, nor that the ABC posterior will accurately quantify the uncertainty in estimates.”

Overall, this is clearly a paper worth reading for understanding the convergence issues related to ABC, with more theoretical support than the earlier Fearnhead and Prangle (2012). However, it does not provide guidance on the construction of a sequence of Monte Carlo samples, nor does it discuss the selection of the summary statistic, which obviously has a major impact on the efficiency of the estimation. And, to relate to the earlier warning, it does not cope with “big data”, in that it requires reproducing the simulation of the whole n-sized sample.

## interacting particles ABC

Posted in Statistics on August 27, 2012 by xi'an

Carlo Albert and Hans Kuensch recently posted an arXiv paper that provides a new perspective on ABC. It relates to ABC-MCMC and to ABC-SMC in different ways, but the major point is to propose a sequential schedule for decreasing the tolerance that ensures convergence. Although other proofs of convergence exist in the literature, this one is quite novel in that it connects ABC with the cooling schedules of simulated annealing. (The sample size does not appear, as it does in Fearnhead and Prangle, and together with the authors' non-parametric perspective this may be deemed less practical, but I think this is simply another perspective on the problem!) The corresponding ABC algorithm is a mix of MCMC and SMC in that it lets a population of N particles evolve in a quasi-independent manner. The population is only used to update the parameters of the independent (normal) proposal and those of the cooling tolerance. Each particle in the population moves according to a Metropolis-Hastings step. This is not an ABC-MCMC scheme, since the algorithm works with a population at all times. Nor is it an ABC-SMC scheme, since there is no weighting and no resampling.
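The population mechanism just described can be sketched as follows. Each of the N particles makes one Metropolis-Hastings move per sweep on the augmented target π(θ) f(x|θ) exp{−ρ(x,y)/ε}. The move uses an independent normal proposal whose mean and standard deviation are taken from the current population. There is no weighting and no resampling. Several pieces are assumptions made here for illustration: the toy normal-mean model, the N(0, 10²) prior, ρ(x,y) = |x − y|, and above all the tolerance update. That update is a simple hypothetical rule (ε tied to half the mean population distance and never increased), not the entropy-based schedule of the paper. Names are hypothetical.

```python
import numpy as np


def population_abc(y_obs, n, N=500, n_sweeps=200, eps0=2.0, shrink=0.5,
                   eps_min=0.05, prior_sd=10.0, seed=2):
    """Population of independent-proposal ABC-MH chains sharing eps and the proposal.
    Toy model (assumed): x ~ N(theta, 1/n), theta ~ N(0, prior_sd^2)."""
    rng = np.random.default_rng(seed)
    sd_sim = 1.0 / np.sqrt(n)

    def log_target(t, x, eps):
        # log pi(theta) - rho(x, y)/eps; f(x|theta) cancels since x' ~ f(.|theta')
        return -0.5 * (t / prior_sd) ** 2 - np.abs(x - y_obs) / eps

    def log_q(t, m, s):
        # independent normal proposal density, up to a constant
        return -0.5 * ((t - m) / s) ** 2

    theta = rng.normal(0.0, prior_sd, size=N)   # initial population from the prior
    x = rng.normal(theta, sd_sim)
    eps = eps0
    for _ in range(n_sweeps):
        m, s = theta.mean(), theta.std() + 1e-3  # proposal set by population moments
        theta_p = rng.normal(m, s, size=N)
        x_p = rng.normal(theta_p, sd_sim)
        log_acc = (log_target(theta_p, x_p, eps) - log_target(theta, x, eps)
                   + log_q(theta, m, s) - log_q(theta_p, m, s))
        accept = np.log(rng.random(N)) < log_acc  # one MH step per particle
        theta = np.where(accept, theta_p, theta)
        x = np.where(accept, x_p, x)
        # HYPOTHETICAL cooling rule, not the schedule of Albert and Kuensch
        eps = max(eps_min, min(eps, shrink * np.mean(np.abs(x - y_obs))))
    return theta, eps
```

The population only enters through `m`, `s` and `eps`. Each particle otherwise follows its own Metropolis-Hastings chain, which is the sense in which the scheme sits between ABC-MCMC and ABC-SMC.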

Maybe I can add two remarks about the conclusion. First, the authors do not seem aware of other works using penalties other than the 0-1 kernel, but those abound, see e.g. the discussion paper of Fearnhead and Prangle, or Ratmann et al. The other missing connection is with adaptive tolerance construction, which is also found in the literature, see e.g. Doucet et al. or Drovandi and Pettitt.

## Wang, Landau, Markov, and others…

Posted in pictures, Statistics, University life on April 11, 2012 by xi'an

On Thursday, the “Big’MC” seminar welcomes two talks (at 3pm and 4pm, respectively, in IHP, Amphi Darboux):

• Speakers: Pierre Jacob (ENSAE) and Robin Ryder (CEREMADE)
• Title: Some aspects of the Wang-Landau algorithm.
• Abstract: The Wang-Landau algorithm is an adaptive MCMC algorithm which generates a Markov chain designed to move efficiently in the state space, by constantly penalizing already-visited regions. It hence falls into the class of exploratory algorithms, especially when the chosen regions correspond to different levels of density values. We explore two novel aspects of the Wang-Landau algorithm. First, we show that the algorithm reaches the so-called Flat Histogram criterion in finite time, which ensures convergence properties. Second, we examine the effect of using multiple chains, interacting through a common component. That component essentially represents the history of already-visited regions, computed on all the chains. We show numerically the benefit of using parallel chains even if a single processing unit is available, in terms of stabilization of the schedule used in the adaptation process. If time permits, we shall present an ongoing attempt to study theoretically the effect of parallelization using Feynman-Kac semigroups.
• References: http://arxiv.org/abs/1110.4025 and http://arxiv.org/abs/1109.3829
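As background for the first talk, here is a minimal single-chain Wang-Landau sketch. Several choices are assumptions for illustration, not the speakers' implementation. The regions are intervals of the real line given by sorted cut points, rather than density levels. The target is a well-separated two-component normal mixture. The log-penalty of the occupied region grows by γ after every step. The Flat Histogram check runs every `check_every` iterations, and when it passes, γ is halved and the histogram is reset. The interacting parallel chains of the talk are not included, and all names are hypothetical.

```python
import numpy as np


def log_target(x):
    # equal mixture of N(-3, 0.5^2) and N(3, 0.5^2), up to a constant (assumed toy)
    return np.logaddexp(-2.0 * (x + 3.0) ** 2, -2.0 * (x - 3.0) ** 2)


def wang_landau(log_target, cuts, n_iter=100_000, step=1.0, gamma0=1.0,
                flat_tol=0.2, check_every=100, seed=3):
    """Random-walk Wang-Landau on the regions defined by the sorted cut points."""
    rng = np.random.default_rng(seed)
    d = len(cuts) + 1
    theta = np.zeros(d)          # log-penalties, one per region
    hist = np.zeros(d)           # visits since the last flat-histogram event
    visits = np.zeros(d)         # visits over the whole run
    gamma, n_flat = gamma0, 0
    x = 0.0
    j = int(np.searchsorted(cuts, x))
    for it in range(1, n_iter + 1):
        y = x + step * rng.normal()
        k = int(np.searchsorted(cuts, y))
        # MH step for the penalised target pi(x) * exp(-theta[region(x)])
        log_acc = log_target(y) - theta[k] - log_target(x) + theta[j]
        if np.log(rng.random()) < log_acc:
            x, j = y, k
        theta[j] += gamma        # penalise the region currently occupied
        hist[j] += 1
        visits[j] += 1
        if it % check_every == 0:
            freq = hist / hist.sum()
            if np.max(np.abs(freq - 1.0 / d)) < flat_tol / d:
                gamma /= 2.0     # Flat Histogram reached: smaller penalty step
                hist[:] = 0.0
                n_flat += 1
    # penalties estimate log region masses up to an additive constant
    return theta - theta.max(), visits, n_flat
```

The penalty pushes the chain out of regions it has already visited. A plain random walk with this step size would, in practice, stay in the mode where it first lands.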

and

• Speaker: Nick Whiteley (Univ. Bristol, UK)
• Title: A particle method for approximating principal eigen-functions and related quantities
• Abstract: Perron-Frobenius theory treats the existence of a positive eigen-vector associated with the principal eigen-value $\lambda_{\star}$ of a non-negative matrix, say $Q$. A simple method for approximating this eigen-vector involves computing the iterate $\lambda_{\star}^{-n}Q^{(n)}$, for large $n$. In the more general case that $Q$ is a non-negative integral kernel, an extended Perron-Frobenius theory applies, but it is typical that neither the principal eigen-function nor the iterate $\lambda_{\star}^{-n}Q^{(n)}$ can be computed exactly. In this setting we introduce an interacting particle algorithm which yields a numerical approximation of the principal eigen-function and the associated twisted Markov kernel. Some of its theoretical properties will be discussed and applications will be outlined. In particular, the algorithm allows approximation of an optimal importance sampling method for Markov chain rare event estimation.
Joint work with Nikolas Kantas.
• Reference: http://arxiv.org/abs/1202.6678
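The finite-matrix case mentioned at the start of the second abstract can be sketched with a normalised power iteration. Rescaling $Q^{(n)}h_0$ at each step plays the role of the $\lambda_{\star}^{-n}$ factor. The sketch also builds the twisted Markov kernel $P(x,y)=Q(x,y)h(y)/\{\lambda_{\star}h(x)\}$, whose rows sum to one precisely because $Qh=\lambda_{\star}h$. This only covers the exactly computable matrix setting. It is not the interacting particle algorithm of the talk, which targets integral kernels where this computation is unavailable. Function names are hypothetical.

```python
import numpy as np


def principal_eigen(Q, n_iter=10_000, tol=1e-13):
    """Normalised power iteration for a non-negative matrix Q.
    Returns an estimate of lambda_star and a positive eigenvector h with sum(h) == 1."""
    Q = np.asarray(Q, float)
    h = np.full(Q.shape[0], 1.0 / Q.shape[0])
    lam = 0.0
    for _ in range(n_iter):
        g = Q @ h
        lam = g.sum()            # since sum(h) == 1, this tends to lambda_star
        g /= lam
        done = np.max(np.abs(g - h)) < tol
        h = g
        if done:
            break
    return lam, h


def twisted_kernel(Q, lam, h):
    """P(x, y) = Q(x, y) h(y) / (lam h(x)); a Markov kernel when Q h = lam h."""
    Q = np.asarray(Q, float)
    return Q * h[None, :] / (lam * h[:, None])
```

For instance, with `Q = [[1, 2], [3, 4]]` the iteration converges to the larger eigenvalue (5 + √33)/2 and a positive eigenvector, and each row of `twisted_kernel(Q, lam, h)` sums to one.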