## ABC for big data

Posted in Books, Statistics, University life with tags , , , , , , , on June 23, 2015 by xi'an

“The results in this paper suggest that ABC can scale to large data, at least for models with a xed number of parameters, under the assumption that the summary statistics obey a central limit theorem.”

In a week rich with arXiv submissions about MCMC and “big data”, like the Variational consensus Monte Carlo of Rabinovich et al., or scalable Bayesian inference via particle mirror descent by Dai et al., Wentao Li and Paul Fearnhead contributed an impressive paper entitled Behaviour of ABC for big data. However, a word of warning: the title is somewhat misleading in that the paper does not address the issue of big or tall data per se, e.g., the impossibility to handle the whole data at once and to reproduce it by simulation, but rather the asymptotics of ABC. The setting is not dissimilar to the earlier Fearnhead and Prangle (2012) Read Paper. The central theme of this theoretical paper [with 24 pages of proofs!] is to study the connection between the number N of Monte Carlo simulations and the tolerance value ε when the number of observations n goes to infinity. A main result in the paper is that the ABC posterior mean can have the same asymptotic distribution as the MLE when ε=o(n-1/4). This is however in opposition with of no direct use in practice as the second main result that the Monte Carlo variance is well-controlled only when ε=O(n-1/2).

Something I have (slight) trouble with is the construction of an importance sampling function of the fABC(s|θ)α when, obviously, this function cannot be used for simulation purposes. The authors point out this fact, but still build an argument about the optimal choice of α, namely away from 0 and 1, like ½. Actually, any value different from 0,1, is sensible, meaning that the range of acceptable importance functions is wide. Most interestingly (!), the paper constructs an iterative importance sampling ABC in a spirit similar to Beaumont et al. (2009) ABC-PMC. Even more interestingly, the ½ factor amounts to updating the scale of the proposal as twice the scale of the target, just as in PMC.

Another aspect of the analysis I do not catch is the reason for keeping the Monte Carlo sample size to a fixed value N, while setting a sequence of acceptance probabilities (or of tolerances) along iterations. This is a very surprising result in that the Monte Carlo error does remain under control and does not dominate the overall error!

“Whilst our theoretical results suggest that point estimates based on the ABC posterior have good properties, they do not suggest that the ABC posterior is a good approximation to the true posterior, nor that the ABC posterior will accurately quantify the uncertainty in estimates.”

Overall, this is clearly a paper worth reading for understanding the convergence issues related with ABC. With more theoretical support than the earlier Fearnhead and Prangle (2012). However, it does not provide guidance into the construction of a sequence of Monte Carlo samples nor does it discuss the selection of the summary statistic, which has obviously a major impact on the efficiency of the estimation. And to relate to the earlier warning, it does not cope with “big data” in that it reproduces the original simulation of the n sized sample.

## ABC and cosmology

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , on May 4, 2015 by xi'an

Two papers appeared on arXiv in the past two days with the similar theme of applying ABC-PMC [one version of which we developed with Mark Beaumont, Jean-Marie Cornuet, and Jean-Michel Marin in 2009] to cosmological problems. (As a further coincidence, I had just started refereeing yet another paper on ABC-PMC in another astronomy problem!) The first paper cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation by Ishida et al. [“et al” including Ewan Cameron] proposes a Python ABC-PMC sampler with applications to galaxy clusters catalogues. The paper is primarily a description of the cosmoabc package, including code snapshots. Earlier occurrences of ABC in cosmology are found for instance in this earlier workshop, as well as in Cameron and Pettitt earlier paper. The package offers a way to evaluate the impact of a specific distance, with a 2D-graph demonstrating that the minimum [if not the range] of the simulated distances increases with the parameters getting away from the best parameter values.

“We emphasis [sic] that the choice of the distance function is a crucial step in the design of the ABC algorithm and the reader must check its properties carefully before any ABC implementation is attempted.” E.E.O. Ishida et al.

The second [by one day] paper Approximate Bayesian computation for forward modelling in cosmology by Akeret et al. also proposes a Python ABC-PMC sampler, abcpmc. With fairly similar explanations: maybe both samplers should be compared on a reference dataset. While I first thought the description of the algorithm was rather close to our version, including the choice of the empirical covariance matrix with the factor 2, it appears it is adapted from a tutorial in the Journal of Mathematical Psychology by Turner and van Zandt. One out of many tutorials and surveys on the ABC method, of which I was unaware, but which summarises the pre-2012 developments rather nicely. Except for missing Paul Fearnhead’s and Dennis Prangle’s semi-automatic Read Paper. In the abcpmc paper, the update of the covariance matrix is the one proposed by Sarah Filippi and co-authors, which includes an extra bias term for faraway particles.

“For complex data, it can be difficult or computationally expensive to calculate the distance ρ(x; y) using all the information available in x and y.” Akeret et al.

In both papers, the role of the distance is stressed as being quite important. However, the cosmoabc paper uses an L1 distance [see (2) therein] in a toy example without normalising between mean and variance, while the abcpmc paper suggests using a Mahalanobis distance that turns the d-dimensional problem into a comparison of one-dimensional projections.

Posted in Statistics, University life with tags , , , , , , on November 9, 2011 by xi'an

Maxime Lenormand, Franck Jabot and Guillaume Deffuant have just posted on arXiv a paper about a refinement of the ABC-PMC algorithm we developed with Marc Beaumont, Jean-Marie Cornuet, and Jean-Michel Marin. The authors state in their introduction that ABC-PMC

presents two shortcomings which are particularly problematic for costly to simulate complex models. First, the sequence of tolerance levels ε1,…,εT has to be provided to the ABC algorithm. In practice, this implies to do preliminary simulations of the model, a step which is computationally costly for complex models. Furthermore, a badly chosen sequence of tolerance levels may inflate the number of simulations required to reach a given precision as we will see below. A second shortcoming of the PMC-ABC algorithm is that it lacks a criterion to decide whether it has converged. The final tolerance level εT may be too large for the ABC approach to satisfactorily approximate the posterior distribution of the model. Inversely, a larger εT may be sufficient to obtain a good approximation of the posterior distribution, hence sparing a number of model simulations.

shortcomings which I thought were addressed by the ABC-SMC algorithm of Pierre Del Moral, Arnaud Doucet and Ajay Jasra [not referenced in the current paper], the similar algorithm of Chris Drovandi and Tony Pettitt, and our recent paper with  Jean-Michel Marin, Pierre Pudlo and Mohammed Sedki [presented at ABC in London, submitted to Statistics and Computing a few months ago, but alas not available on-line for unmentionable reasons linked to the growing dysfunctionality of one co-author…!]. It is correct that we did not address the choice of the εt‘s in the original paper, even though we already used an on-line selection as a quantile of the current sample of distances. In essence, given the fundamentally non-parametric nature of ABC, the tolerances εt should always be determined from the simulated samples, as regular bandwidths.

The paper essentially proposes the same scheme as in Del Moral et al., before applying it to the toy example of Sisson et al. (PNAS, 2007) and to a more complex job dynamic model in central France. Were I to referee this paper, I would thus suggest that the authors incorporate a comparison with both papers of Del Moral et al. and of Drovandi and Pettitt to highlight the methodological  novelty of their approach.

## likelihood-free parallel tempering

Posted in Statistics, University life with tags , , , , , , , , , on August 20, 2011 by xi'an

Meïli Baragatti, Agnès Grimaud, and Denys Pommeret posted an ABC paper on arXiv entitled Likelihood-free parallel tempering. While most ABC methods essentially are tempering methods, in that the tolerance level acts like an energy level, this paper uses parallel chains at various tolerance levels, with an exchange mechanism derived from Geyer and Thomson (1995, JASA). As with regular ABC-MCMC, the acceptance probability is such that the likelihood needs not be computed. On the mixture example of Sisson et al. (2007, PNAS) and on the tuberculosis example of Tanaka et al. (2006, Genetics), the authors report better performances than ABC-PMC, ABC-MCMC and ABC. (In a bimodal toy example,  ABC-PMC does not identify a second mode, which should not be the case with a large enough initial tolerance and a small enough tempering decrease step.) The paper introduces a sequence of temperatures in addition to a sequence of tolerances and it is only through an example that I understood the (unusual) role of the temperatures as scale factors in the  random walk proposal. It seems to me that an annealing step should be added as the chains with larger tolerances are less interesting as time goes on.

Ps-Scott Sisson just signaled on his twitter account the publication of several papers using ABC in monkey evolution. As well as a fourth paper by Wegman et al.. estimating the size of the initial American settlers to be around 100, about 13,000 years ago, all using standard ABC model choice techniques. Scott also pointed out a conference held in Bristol next April 16-19.