## Archive for Approximate Bayesian computation

## the true meaning of ABC

Posted in pictures, Running with tags ABC, Approximate Bayesian computation, Burgers and fries, France, jatp, Paris suburbs, restaurant, Villejuif on May 14, 2019 by xi'an

## holistic framework for ABC

Posted in Books, Statistics, University life with tags ABC, AISTATS, Approximate Bayesian computation, Japan, likelihood-free methods, Okinawa, reproducing kernel Hilbert space, RKHS on April 19, 2019 by xi'an

**A**n AISTATS 2019 paper was recently arXived by Kelvin Hsu and Fabio Ramos. Proposing an ABC method

“…consisting of (1) a consistent surrogate likelihood model that modularizes queries from simulation calls, (2) a Bayesian learning objective for hyperparameters that improves inference accuracy, and (3) a posterior surrogate density and a super-sampling inference algorithm using its closed-form posterior mean embedding.”

While this sales line sounds rather obscure to me, the authors further defend their approach against ABC-MCMC or synthetic likelihood by the points

“that (1) only one new simulation is required at each new parameter θ and (2) likelihood queries do not need to be at parameters where simulations are available.”

using a RKHS approach to approximate the likelihood or the distribution of the summary (statistic) given the parameter (value) *θ*. Based on the choice of a certain positive definite kernel. (As usual, I do not understand why RKHS would do better than another non-parametric approach, especially since the approach approximates the full likelihood, but I am not a non-parametrician…)

“The main advantage of using an approximate surrogate likelihood surrogate model is that it readily provides a marginal surrogate likelihood quantity that lends itself to a hyper-parameter learning algorithm”

The tolerance ε (and other cyberparameters) are estimated by maximising the approximated marginal likelihood, which happens to be available in the convenient case where the prior is an anisotropic Gaussian distribution. For the simulated data in the reference table? But then missing the need for localising the simulations near the posterior? Inference is then conducted by simulating from this approximation. With the common (to RKHS) drawback that the approximation is “bounded and normalized but potentially non-positive”.
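To make the non-positivity remark concrete, here is a minimal sketch of a generic conditional mean embedding (kernel ridge regression) surrogate likelihood. It is not claimed to be the estimator of Hsu and Ramos: the toy model, the Gaussian kernels, and the names `gauss_kernel`, `eps`, `ell` and `lam` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def gauss_kernel(a, b, scale):
    # Gaussian kernel matrix between the 1-d arrays a and b.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / scale) ** 2)

# Reference table for a toy model (assumption): s | theta ~ N(theta, 0.5^2).
n = 300
theta_ref = rng.uniform(-3.0, 3.0, size=n)
s_ref = theta_ref + 0.5 * rng.normal(size=n)

# Observed summary, tolerance, kernel length-scale, ridge penalty (all hypothetical).
y_obs, eps, ell, lam = 1.0, 0.3, 0.5, 1e-3

K = gauss_kernel(theta_ref, theta_ref, ell)
theta_grid = np.linspace(-3.0, 3.0, 121)

# Embedding weights w(theta) = (K + n * lam * I)^{-1} k(theta), one column per grid point.
weights = np.linalg.solve(K + n * lam * np.eye(n),
                          gauss_kernel(theta_ref, theta_grid, ell))

# Surrogate likelihood: weighted sum of the ABC kernel kappa_eps(y_obs, s_j).
kappa = np.exp(-0.5 * ((y_obs - s_ref) / eps) ** 2)
surrogate = kappa @ weights

print("most negative weight:", weights.min())
print("surrogate maximiser :", theta_grid[surrogate.argmax()])
```

Since the weights solve a linear system rather than being normalised kernel weights, some of them are negative, which is why a surrogate of this type carries no positivity guarantee. Note also that the surrogate can be queried at any θ on the grid, not only at values where simulations are available.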

## prepaid ABC

Posted in Books, pictures, Statistics, University life with tags ABC, Approximate Bayesian computation, KU Leuven, Leuven, likelihood-free methods, machine learning, neural network, reproducible research, support vector machines, synthetic likelihood on January 16, 2019 by xi'an

**M**erijn Mestdagh, Stijn Verdonck, Kristof Meers, Tim Loossens, and Francis Tuerlinckx from the KU Leuven, some of whom I met during a visit to its Walloon counterpart Louvain-la-Neuve, proposed and arXived a new likelihood-free approach based on saving simulations on a large scale for future users. Future users interested in the *same* model. The *very same* model. This makes the proposal quite puzzling as I have no idea as to when situations with exactly the same experimental conditions, up to the sample size, repeat over and over again. Or even just repeat once. (Some particular settings may accommodate different sample sizes and the same prepaid database, but others as in genetics clearly do not.) I am sufficiently puzzled to suspect I have missed the message of the paper.

“In various fields, statistical models of interest are analytically intractable. As a result, statistical inference is greatly hampered by computational constraints. However, given a model, different users with different data are likely to perform similar computations. Computations done by one user are potentially useful for other users with different data sets. We propose a pooling of resources across researchers to capitalize on this. More specifically, we preemptively chart out the entire space of possible model outcomes in a prepaid database. Using advanced interpolation techniques, any individual estimation problem can now be solved on the spot. The prepaid method can easily accommodate different priors as well as constraints on the parameters. We created prepaid databases for three challenging models and demonstrate how they can be distributed through an online parameter estimation service. Our method outperforms state-of-the-art estimation techniques in both speed (with a 23,000 to 100,000-fold speed up) and accuracy, and is able to handle previously quasi inestimable models.”

I foresee potential difficulties with this proposal, like compelling all future users to rely on the same summary statistics, on the same prior distributions (the “representative amount of parameter values”), and requiring a massive storage capacity. Plus relying at its early stage on the most rudimentary form of an ABC algorithm (although not acknowledged as such), namely the rejection one. When reading the description in the paper, the proposed method indeed selects the parameters (simulated from a prior or a grid) that produce the pseudo-observations closest to the actual observations (or their summaries s). The subsample thus constructed is used to derive a (local) non-parametric or machine-learning predictor s=f(θ). From which a point estimator is deduced by minimising in θ a deviance d(s⁰,f(θ)).
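For the record, the three steps as I read them can be sketched as follows, assuming a toy Normal model, a local linear predictor, and a squared Euclidean deviance. The paper relies on more advanced interpolation, and `simulate_summaries` and `prepaid_estimate` are hypothetical names of mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_summaries(theta, n_obs=100):
    # Toy model (assumption): data ~ N(mu, sigma^2) with theta = (mu, log sigma);
    # the summaries are the sample mean and the log sample standard deviation.
    mu, log_sigma = theta
    x = rng.normal(mu, np.exp(log_sigma), size=n_obs)
    return np.array([x.mean(), np.log(x.std(ddof=1))])

# "Prepaid" reference table, simulated once and stored for all later uses.
n_table = 5000
thetas = rng.uniform([-3.0, -1.0], [3.0, 1.0], size=(n_table, 2))
summaries = np.array([simulate_summaries(t) for t in thetas])

def prepaid_estimate(s_obs, k=200):
    # Step 1 (rejection ABC): keep the k parameters whose summaries are closest to s_obs.
    dist = np.linalg.norm(summaries - s_obs, axis=1)
    keep = np.argsort(dist)[:k]
    # Step 2: local linear predictor f(theta) = a + B theta fitted on the subsample.
    design = np.column_stack([np.ones(k), thetas[keep]])
    coef, *_ = np.linalg.lstsq(design, summaries[keep], rcond=None)
    a, B = coef[0], coef[1:].T
    # Step 3: minimise the deviance ||s_obs - f(theta)||^2 in theta (least squares).
    theta_hat, *_ = np.linalg.lstsq(B, s_obs - a, rcond=None)
    return theta_hat

theta_true = np.array([1.0, 0.3])
s_obs = simulate_summaries(theta_true)
theta_hat = prepaid_estimate(s_obs)
print(theta_hat)
```

The sketch makes the dependence explicit: the table is tied to one model, one set of summaries, one sample size and one prior box, and everything downstream inherits these choices.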

The paper does not expand much on the theoretical justifications of the approach (including the appendix that covers a formal situation where the prepaid grid conveniently covers the observed statistics). And thus does not explain on which basis confidence intervals should offer nominal coverage for the prepaid method. Instead, the paper runs comparisons with Simon Wood’s (2010) synthetic likelihood maximisation (Ricker model with three parameters), the rejection ABC algorithm (species dispersion trait model with four parameters), while the Leaky Competing Accumulator (with four parameters as well) seemingly enjoys no alternative. Which is strange since the first step of the prepaid algorithm is an ABC step, but I am unfamiliar with this model. Unsurprisingly, in all these cases, given that the simulation has been done prior to the computing time for the prepaid method and not for either synthetic likelihood or ABC, the former enjoys a massive advantage from the start.

“The prepaid method can be used for a very large number of observations, contrary to the synthetic likelihood or ABC methods. The use of very large simulated data sets allows investigation of large-sample properties of the estimator”

To return to the general proposal and my major reservation or misunderstanding, for different experiments, the (true or pseudo-true) value of the parameter will not be the same, I presume, and hence the region of interest [or grid] will differ. While, again, the computational gain is *de facto* obvious [since the costly production of the reference table is not repeated], and, to repeat myself, makes the comparison with methods that do require a massive number of simulations from scratch massively in favour of the prepaid option, I do not see a convenient way of recycling these prepaid simulations for another setting, that is, when some experimental factors, sample size or collection, or even just the priors, do differ. Again, I may be missing the point, especially in a specific context like repeated psychological experiments.

While this may have some applications in reproducibility (but maybe not, if the goal is in fact to detect cherry-picking), I see very little use in repeating the same statistical model on different datasets. Even repeating observations will require additional nuisance parameters and possibly perturb the likelihood and/or posterior to large extents.

## a book and three chapters on ABC

Posted in Statistics with tags ABC, ABC model choice, ABCel, Approximate Bayesian computation, empirical likelihood, Handbook of Approximate Bayesian computation, handbook of mixture analysis, Handbooks of Modern Statistical Methods, likelihood-free methods, pygmies, Western Africa on January 9, 2019 by xi'an

**I**n connection with our handbook on mixtures being published, here are three chapters I contributed to from the Handbook of ABC, edited by Scott Sisson, Yanan Fan, and Mark Beaumont:

6. Likelihood-free Model Choice, by J.-M. Marin, P. Pudlo, A. Estoup and C.P. Robert

12. Approximating the Likelihood in ABC, by C. C. Drovandi, C. Grazian, K. Mengersen and C.P. Robert

## approximate likelihood perspective on ABC

Posted in Books, Statistics, University life with tags ABC, Approximate Bayesian computation, approximate likelihood, curse of dimensionality, g-and-k distributions, Gibbs sampling, IMS, MCqMC 2018, mixed effect models, Potts model, Statistics Surveys, summary statistics, survey, tolerance, winference on December 20, 2018 by xi'an

**G**eorge Karabatsos and Fabrizio Leisen have recently published in Statistics Surveys a fairly complete survey on ABC methods [which earlier arXival I had missed]. Listing within an extensive bibliography of 20 pages some twenty-plus earlier reviews on ABC (with further ones in applied domains)!

*“(…) any ABC method (algorithm) can be categorized as either (1) rejection-, (2) kernel-, and (3) coupled ABC; and (4) synthetic-, (5) empirical- and (6) bootstrap-likelihood methods; and can be combined with classical MC or VI algorithms [and] all 22 reviews of ABC methods have covered rejection and kernel ABC methods, but only three covered synthetic likelihood, one reviewed the empirical likelihood, and none have reviewed coupled ABC and bootstrap likelihood methods.”*

The motivation for using approximate likelihood methods is provided by the examples of g-and-k distributions, although the likelihood can be efficiently derived by numerical means, as shown by Pierre Jacob’s winference package, of mixed effect linear models, although a completion by the mixed effects themselves is available for Gibbs sampling as in Zeger and Karim (1991), and of the hidden Potts model, which we covered by pre-processing in our 2015 paper with Matt Moores, Chris Drovandi, and Kerrie Mengersen. The paper produces a general representation of the approximate likelihood that covers the algorithms listed above, as shown in the table below (where t(.) denotes the summary statistic):

The table looks a wee bit challenging simply because the review includes the synthetic likelihood approach of Wood (2010), which figured prominently in the 2012 Read Paper discussion but opens the door to all kinds of approximations of the likelihood function, including variational Bayes and non-parametric versions. After a description of the above versions (including a rather ignored coupled version) and the special issue of ABC model choice, the authors expand on the difficulties with running ABC, from multiple tuning issues, to the genuine curse of dimensionality in the parameter (with unnecessary remarks on low-dimension sufficient statistics since they are almost surely nonexistent in most realistic settings), to the mis-specified case (on which we are currently working with David Frazier and Judith Rousseau). To conclude, a worthwhile update on ABC and, on the side, a funny typo from the reference list!

Li, W. and Fearnhead, P. (2018, in press). On the asymptotic efficiency of approximate Bayesian computation estimators. Biometrikanana-na.

## ABC intro for Astrophysics

Posted in Books, Kids, Mountains, R, Running, Statistics, University life with tags ABC, Approximate Bayesian computation, Autrans, Bayesian foundations, Bayesian methodology, Book, computational astrophysics, review, Statistics for Astrophysics, summer course, survey, Vercors on October 15, 2018 by xi'an

**T**oday I received in the mail a copy of the short book published by EDP Sciences after the courses we gave last year at the astrophysics summer school, in Autrans. Which contains a quick introduction to ABC extracted from my notes (which I still hope to turn into a book!). As well as a longer coverage of Bayesian foundations and computations by David Stenning and David van Dyk.

## Implicit maximum likelihood estimates

Posted in Statistics with tags ABC, Approximate Bayesian computation, GANs, Hyvärinen score, Kullback-Leibler divergence, likelihood-free methods, maximum likelihood estimation, NIPS 2018, Peter Diggle, untractable normalizing constant, Wasserstein distance on October 9, 2018 by xi'an

**A**n ‘Og’s reader pointed me to this paper by Li and Malik, which made it to arXiv after not making it to NIPS. While the NIPS reviews were not particularly informative and strongly discordant, the authors point out in the comments that they are available for the sake of promoting discussion. (As made clear in earlier posts, I am quite supportive of this attitude! *Disclaimer: I was not involved in an evaluation of this paper, neither for NIPS nor for another conference or journal!!*) Although the paper does not seem to mention ABC in the setting of implicit likelihoods and generative models, there is a reference to the early (1984) paper by Peter Diggle and Richard Gratton that is often seen as the ancestor of ABC methods. The authors point out numerous issues with solutions proposed for parameter estimation in such implicit models. For instance, for GANs, they signal that “minimizing the Jensen-Shannon divergence or the Wasserstein distance between the empirical data distribution and the model distribution does not necessarily minimize the same between the true data distribution and the model distribution.” (Not mentioning the particular difficulty with Bayesian GANs.) Their own solution is the implicit maximum likelihood estimator, which picks the value of the parameter θ bringing a simulated sample the closest to the observed sample. Closest in the sense of the Euclidean distance between both samples. Or between the minimum of several simulated samples and the observed sample. (The modelling seems to imply the availability of n>1 observed samples.)
They advocate using a stochastic gradient descent approach for finding the optimal parameter θ which presupposes that the dependence between θ and the simulated samples is somewhat differentiable. (And this does not account for using a min, which would make differentiation close to impossible.) The paper then meanders in a lengthy discussion as to whether maximising the likelihood makes sense, with a rather naïve view on why using the empirical distribution in a Kullback-Leibler divergence does not make sense! What does not make sense is considering the finite sample approximation to the Kullback-Leibler divergence with the true distribution in my opinion.
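As a rough illustration of the estimator described above, here is a minimal sketch on a toy location model, assuming a reparameterised simulator (θ plus standard Gaussian noise) and holding the nearest-neighbour assignment fixed within each gradient step to sidestep the min. The step size, the number of simulations and the names `simulate` and `imle_fit` are my own choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, noise):
    # Toy implicit model (assumption): location shift of standard Gaussian noise.
    return theta + noise

def imle_fit(x_obs, n_sim=50, n_steps=500, lr=0.05):
    dim = x_obs.shape[1]
    theta = np.zeros(dim)
    for _ in range(n_steps):
        noise = rng.normal(size=(n_sim, dim))
        x_sim = simulate(theta, noise)
        # Squared Euclidean distances between each observation and each simulation.
        d2 = ((x_obs[:, None, :] - x_sim[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        # Gradient in theta of the average squared distance to the nearest simulation,
        # with the argmin held fixed; d x_sim / d theta is the identity in this toy model.
        grad = 2.0 * (x_sim[nearest] - x_obs).mean(axis=0)
        theta = theta - lr * grad
    return theta

x_obs = rng.normal(loc=[2.0, -1.0], size=(200, 2))
theta_hat = imle_fit(x_obs)
print(theta_hat)
```

The differentiability assumption is visible in the gradient line, which only makes sense because the simulated points move smoothly (here, rigidly) with θ once the noise is fixed.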