Archive for ABC

holistic framework for ABC

Posted in Books, Statistics, University life on April 19, 2019 by xi'an

An AISTATS 2019 paper was recently arXived by Kelvin Hsu and Fabio Ramos, proposing an ABC method

“…consisting of (1) a consistent surrogate likelihood model that modularizes queries from simulation calls, (2) a Bayesian learning objective for hyperparameters that improves inference accuracy, and (3) a posterior surrogate density and a super-sampling inference algorithm using its closed-form posterior mean embedding.”

While this sales line sounds rather obscure to me, the authors further defend their approach against ABC-MCMC or synthetic likelihood by the points

“that (1) only one new simulation is required at each new parameter θ and (2) likelihood queries do not need to be at parameters where simulations are available.”

using an RKHS approach to approximate the likelihood or the distribution of the summary (statistic) given the parameter (value) θ, based on the choice of a certain positive definite kernel. (As usual, I do not understand why an RKHS would do better than another non-parametric approach, especially since the approach approximates the full likelihood, but I am not a non-parametrician…)

“The main advantage of using an approximate surrogate likelihood surrogate model is that it readily provides a marginal surrogate likelihood quantity that lends itself to a hyper-parameter learning algorithm”

The tolerance ε (and other hyperparameters) are estimated by maximising the approximated marginal likelihood, which happens to be available in the convenient case where the prior is an anisotropic Gaussian distribution. For the simulated data in the reference table? But does this not miss the need for localising the simulations near the posterior? Inference is then conducted by simulating from this approximation, with the common (to RKHS) drawback that the approximation is “bounded and normalized but potentially non-positive”.
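To make the modularisation point concrete, here is a minimal sketch of a kernel surrogate of the summary-statistic likelihood, built once from a reference table of simulated pairs and then queried at arbitrary parameter values without further simulator calls. This is a generic Nadaraya–Watson style construction, not the authors' RKHS posterior mean embedding, and the function names, bandwidths, and toy simulator are made up for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth):
    """Gram matrix of an isotropic Gaussian kernel between rows of x and y."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / bandwidth ** 2)

def surrogate_log_likelihood(theta_query, theta_sim, s_sim, s_obs,
                             h_theta=0.5, eps=0.5):
    """Unnormalised kernel-regression surrogate of log p(s_obs | theta),
    built from the reference table (theta_sim, s_sim); queries at new theta
    values do not require any additional simulation."""
    w = gaussian_kernel(theta_query, theta_sim, h_theta)      # (Q, N) parameter weights
    k = gaussian_kernel(s_sim, s_obs[None, :], eps)[:, 0]     # (N,) kernel evaluated at s_obs
    density = (w @ k) / (w.sum(axis=1) + 1e-12)
    return np.log(np.maximum(density, 1e-300))

# toy usage: one pseudo-sample per prior draw, as in a plain reference table
rng = np.random.default_rng(0)
theta_sim = rng.normal(size=(1000, 1))                        # prior draws
s_sim = theta_sim + 0.1 * rng.normal(size=(1000, 1))          # pseudo-summaries
s_obs = np.array([0.3])
grid = np.linspace(-2.0, 2.0, 50)[:, None]
log_lik = surrogate_log_likelihood(grid, theta_sim, s_sim, s_obs)
```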

adaptive copulas for ABC

Posted in Statistics on March 20, 2019 by xi'an

A paper on ABC I read on my way back from Cambodia: Yanzhi Chen and Michael Gutmann arXived an ABC [in Edinburgh] paper on learning the target via Gaussian copulas, to be presented at AISTATS this year (in Okinawa!), linking post-processing (regression) ABC and sequential ABC. The drawback in the regression approach is that the correction often relies on a homogeneity assumption on the distribution of the noise or residual, since this approach only applies a drift to the original simulated sample. Their method is based on two stages: a coarse-grained one, where the posterior is approximated by ordinary linear regression ABC, and a fine-grained one, which uses the above coarse Gaussian version as a proposal and returns a Gaussian copula estimate of the posterior. This proposal is somewhat similar to the neural network approach of Papamakarios and Murray (2016) and to the Gaussian copula version of Li et al. (2017), the major difference being the presence of two stages. The new method is compared with other ABC proposals at a fixed simulation cost, which does not account for the construction costs, although these should be relatively negligible. To compare these ABC avatars, the authors use a symmetrised Kullback-Leibler divergence I had not met previously, requiring a massive numerical integration (although this is not an issue for the practical implementation of the method, which only calls for the construction of the neural network(s)). Note also that sequential ABC is only run for two iterations, and that none of the importance sampling ABC versions of Fearnhead and Prangle (2012) and of Li and Fearnhead (2018) are considered, all versions relying on the same vector of summary statistics, with a dimension much larger than the dimension of the parameter. Except in our MA(2) example, where regression does as well. I wonder at the impact of the dimension of the summary statistic on the performances of the neural network, i.e., whether or not it is able to manage the curse of dimensionality by ignoring all but essentially the data statistics in the optimisation.
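For readers unfamiliar with the fine-grained step, the generic Gaussian-copula-with-empirical-marginals construction applied to a set of (coarse) posterior draws looks roughly like the sketch below. This is only the standard copula estimation step, not the two-stage adaptive scheme of the paper, and the function names are mine.

```python
import numpy as np
from scipy import stats

def fit_gaussian_copula(draws):
    """Estimate the Gaussian copula correlation of posterior draws (n x d):
    rank-transform each marginal to (0, 1), map to normal scores, correlate."""
    n, _ = draws.shape
    u = stats.rankdata(draws, axis=0) / (n + 1.0)
    return np.corrcoef(stats.norm.ppf(u), rowvar=False)

def sample_gaussian_copula(draws, corr, size, rng=None):
    """Sample from the copula posterior estimate, inverting each marginal
    through the empirical quantiles of the original draws."""
    rng = np.random.default_rng() if rng is None else rng
    d = draws.shape[1]
    z = rng.multivariate_normal(np.zeros(d), corr, size=size)
    u = stats.norm.cdf(z)
    return np.column_stack([np.quantile(draws[:, j], u[:, j]) for j in range(d)])

# toy usage with correlated pseudo-posterior draws
rng = np.random.default_rng(1)
draws = rng.multivariate_normal([0.0, 1.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
corr = fit_gaussian_copula(draws)
new_draws = sample_gaussian_copula(draws, corr, size=2000, rng=rng)
```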

asymptotics of synthetic likelihood [a reply from the authors]

Posted in Books, Statistics, University life on March 19, 2019 by xi'an

[Here is a reply from David, Chris, and Robert on my earlier comments, highlighting some points I had missed or misunderstood.]

Dear Christian

Thanks for your interest in our synthetic likelihood paper and the thoughtful comments you wrote about it on your blog.  We’d like to respond to the comments to avoid some misconceptions.

Your first claim is that we don’t account for the differing number of simulation draws required for each parameter proposal in ABC and synthetic likelihood.  This doesn’t seem correct; see the discussion below Lemma 4 at the bottom of page 12.  The comparison between methods is on the basis of effective sample size per model simulation.

As you say, in the comparison of ABC and synthetic likelihood, we consider the ABC tolerance ε and the number of simulations per likelihood estimate M in synthetic likelihood as functions of n.  Then for tuning parameter choices that result in the same uncertainty quantification asymptotically (and the same asymptotically as the true posterior given the summary statistic) we can look at the effective sample size per model simulation.  Your objection here seems to be that even though uncertainty quantification is similar for large n, for a finite n the uncertainty quantification may differ.  This is true, but similar arguments can be directed at almost any asymptotic analysis, so this doesn’t seem a serious objection to us at least.  We don’t find it surprising that the strong synthetic likelihood assumptions, when accurate, give you something extra in terms of computational efficiency.

We think mixing up the synthetic likelihood/ABC comparison with the comparison between correctly specified and misspecified covariance in Bayesian synthetic likelihood is a bit unfortunate, since these situations are quite different.  The first involves correct uncertainty quantification asymptotically for both methods.  Only a very committed reader who looked at our paper in detail would understand what you say here.  The question we are asking with the misspecified covariance is the following.  If the usual Bayesian synthetic likelihood analysis is too much for our computational budget, can something still be done to quantify uncertainty?  We think the answer is yes, and with the misspecified covariance we can reduce the computational requirements by an order of magnitude, but with an appropriate cost statistically speaking.  The analyses with misspecified covariance give valid frequentist confidence regions asymptotically, so this may still be useful if it is all that can be done.  The examples as you say show something of the nature of the trade-off involved.

We aren’t quite sure what you mean when you are puzzled about why we can avoid having M be O(√n).  Note that because of the way the summary statistics satisfy a central limit theorem, elements of the covariance matrix of S are already O(1/n), and so, for example, in estimating μ(θ) as an average of M simulations for S, the elements of the covariance matrix of the estimator of μ(θ) are O(1/(Mn)).  Similar remarks apply to estimation of Σ(θ).  I’m not sure whether that gets to the heart of what you are asking here or not.

In our email discussion you mention the fact that if M increases with n, then the computational burden of a single likelihood approximation and hence generating a single parameter sample also increases with n.  This is true, but unavoidable if you want exact uncertainty quantification asymptotically, and M can be allowed to increase with n at any rate.  With a fixed M there will be some approximation error, which is often small in practice.  The situation with vanilla ABC methods will be even worse, in terms of the number of proposals required to generate a single accepted sample, in the case where exact uncertainty quantification is desired asymptotically.  As shown in Li and Fearnhead (2018), if regression adjustment is used with ABC and you can find a good proposal in their sense, one can avoid this.  For vanilla ABC, if the focus is on point estimation and exact uncertainty quantification is not required, the situation is better.  Of course, as you show in your recent nice ABC paper for misspecified models jointly with David Frazier and Judith Rousseau, the choice of whether to use regression adjustment can be subtle in the case of misspecification.

In our previous paper Price, Drovandi, Lee and Nott (2018) (which you also reviewed on this blog) we observed that if the summary statistics are exactly normal, then you can sample from the summary statistic posterior exactly with finite M in the synthetic likelihood by using pseudo-marginal ideas together with an unbiased estimate of a normal density due to Ghurye and Olkin (1962).  When S satisfies a central limit theorem so that S is increasingly close to normal as n gets large, we conjecture that it is possible to get exact uncertainty quantification asymptotically with fixed M if we use the Ghurye and Olkin estimator, but we have no proof of that yet (if it is true at all).

Thanks again for being interested enough in the paper to comment, much appreciated.

David, Chris, Robert.

absint[he] post-doc on approximate Bayesian inference in Paris, Montpellier and Oxford

Posted in Statistics on March 18, 2019 by xi'an

As a consequence of its funding by the Agence Nationale de la Recherche (ANR) in 2018, the ABSint research conglomerate is now actively recruiting a post-doctoral collaborator for up to 24 months. The acronym ABSint stands for Approximate Bayesian solutions for inference on large datasets and complex models. The ABSint conglomerate involves researchers located in Paris, Saclay, and Montpellier, as well as in Lyon, Marseille, and Nice. This call seeks candidates with an excellent research record who are interested in collaborating with local researchers on approximate Bayesian techniques like ABC, variational Bayes, PAC-Bayes, Bayesian non-parametrics, scalable MCMC, and related topics. A potential direction of research would be the derivation of new Bayesian tools for model checking in such complex environments. The post-doctoral collaborator will be primarily located at Université Paris-Dauphine, with supported periods in Oxford and visits to Montpellier. No teaching duty is attached to this research position.

Applications can be submitted in either English or French. Sufficient working fluency in English is required. While mastering some French does help with daily life in France (!), it is not a prerequisite. The candidate must hold a PhD degree by the date of application (not the date of employment). Position opens on July 01, with possible accommodation for a later start in September or October.

Deadline for applications is April 30, or until the position is filled. Estimated gross salary is around 2500 EUR, depending on experience (years since PhD). Candidates should contact Christian Robert (gmail address: bayesianstatistics) with a detailed vita (CV) and a motivation letter including a research plan. Letters of recommendation may also be emailed to the same address.

asymptotics of synthetic likelihood

Posted in pictures, Statistics, Travel on March 11, 2019 by xi'an

David Nott, Chris Drovandi and Robert Kohn just arXived a paper on a comparison between ABC and synthetic likelihood, which is both interesting and timely given that synthetic likelihood seems to be lagging behind in terms of theoretical evaluation. I am however as puzzled by the results therein as I was by the earlier paper by Price et al. on the same topic. Maybe due to the Cambodia jetlag, as this is where and when I read the paper.

My puzzlement, thus, comes from the difficulty in comparing both approaches on a strictly common ground. The paper first establishes convergence and asymptotic normality for synthetic likelihood, based on the 2003 MCMC paper of Chernozhukov and Hong [which I never studied in detail but which appears to be the MCMC reference in the econometrics literature]. The results are similar to recent ABC convergence results, unsurprisingly when assuming a CLT on the summary statistic vector. One additional dimension of the paper is to consider convergence for a misspecified covariance matrix in the synthetic likelihood [and it will come back with a vengeance]. And asymptotic normality of the synthetic score function. Which is obviously unavailable in intractable models.

The first point I have difficulty with is how the computing time required for approximating the mean and variance in the synthetic likelihood, by Monte Carlo means, is not accounted for in the comparison between ABC and synthetic likelihood versions. Remember that ABC only requires one (or at most two) pseudo-samples per parameter simulation, while the latter requires M of them, with M later constrained to increase to infinity with the sample size. And simulations are usually the costliest part of these algorithms. If ABC were to use M simulated samples as well, since it already relies on a kernel, it could just as well construct [at least in principle] a similar estimator of the [summary statistic] density, or else produce M times more (parameter, pseudo-sample) pairs. The authors pointed out (once this post came out) that they do account for the factor M when computing the effective sample size (before Lemma 4, page 12), but I still miss why the ESS converging to N=MN/M when M goes to infinity is such a positive feature.
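As a reminder of where the factor M enters, a bare-bones synthetic likelihood evaluation (in the generic Wood, 2010, Gaussian form, not the specific estimator analysed in the paper; names and defaults are mine) costs M simulator calls per proposed θ:

```python
import numpy as np

def synthetic_log_likelihood(theta, simulate, s_obs, M=100, rng=None):
    """Estimate the synthetic log-likelihood at theta: draw M pseudo-summaries,
    fit a Gaussian to them, and evaluate its log-density at the observed summary."""
    rng = np.random.default_rng() if rng is None else rng
    sims = np.array([simulate(theta, rng) for _ in range(M)])  # M simulator calls
    mu = sims.mean(axis=0)
    sigma = np.atleast_2d(np.cov(sims, rowvar=False))          # full covariance estimate
    diff = np.atleast_1d(s_obs - mu)
    _, logdet = np.linalg.slogdet(sigma)
    quad = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (diff.size * np.log(2 * np.pi) + logdet + quad)
```

Each MCMC proposal then triggers a fresh call to this routine, which is the M-fold cost contrasted above with the single pseudo-sample of vanilla ABC.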

Another point deals with the use of multiple approximate posteriors in the comparison. Since the approximations differ, it is unclear that convergence to a given approximation is all that should matter, if the approximation is less efficient [when compared with the original and out-of-reach posterior distribution], especially for a finite sample size n. This chasm in the targets becomes more evident when the authors discuss the use of a constrained synthetic likelihood covariance matrix towards requiring fewer pseudo-samples, i.e., lower values of M, because of the smaller number of parameters to estimate. This should be balanced against the loss in concentration of the synthetic approximation, as exemplified by the realistic examples in the paper. (It is also hard to see why M could not be of order √n for Monte Carlo reasons.)
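One natural instance of such a constrained covariance (a diagonal matrix, not necessarily the parameterisation studied in the paper) trades the d(d+1)/2 entries of the full estimate for d variances, so a much smaller M suffices to stabilise it, at the price of ignoring correlations between summaries; a minimal variant of the sketch above:

```python
import numpy as np

def synthetic_log_likelihood_diag(theta, simulate, s_obs, M=20, rng=None):
    """Synthetic log-likelihood with a diagonal covariance estimate: fewer
    parameters to estimate, hence a smaller M, but summary correlations are lost."""
    rng = np.random.default_rng() if rng is None else rng
    sims = np.array([simulate(theta, rng) for _ in range(M)])
    mu, var = sims.mean(axis=0), sims.var(axis=0, ddof=1)
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (s_obs - mu) ** 2 / var)
```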

The last section in the paper revolves around diverse issues for misspecified models, from a wrong covariance matrix to a wrong generating model. As we just submitted a paper on ABC for misspecified models, I will not engage in a debate on this point, but I find the proposed strategy, which goes through an approximation of the log-likelihood surface by a Gaussian process and a derivation of the covariance matrix of the score function, apparently greedy in both calibration and computing. And not so clearly validated when the generating model is misspecified.

auxiliary likelihood ABC in print

Posted in Statistics on March 1, 2019 by xi'an

Our paper with Gael Martin, Brendan McCabe, David Frazier and Worapree Maneesoonthorn, with full title Auxiliary Likelihood-Based Approximate Bayesian Computation in State Space Models, has now appeared in JCGS. To think that it started in Rimini in 2009, when I met Gael for the first time at the Rimini Bayesian Econometrics conference, although we really started working on the paper in 2012 when I visited Monash, makes me realise the enormous investment we made in this paper, especially by Gael, whose stamina and enthusiasm never cease to amaze me!

particular degeneracy in ABC model choice

Posted in Statistics on February 22, 2019 by xi'an

In one of the presentations by the last cohort of OxWaSP students, the group decided to implement an ABC model choice strategy based on sequential ABC, inspired by Toni et al. (2008), and this made me reconsider the approach (disclaimer: no criticism of the students implied in the following!). Indeed, the outcome of the simulation led to the ultimate selection of a single model, exclusive of all other models, corresponding to a posterior probability of one in favour of this model. Which sounds like a drawback of the ABC-SMC model choice approach in this setting, namely that it is quite prone to degeneracy, much more so than standard SMC, since once a model vanishes from the list, it can never reappear in the following iterations, if I am reading the algorithm correctly. To avoid this degeneracy, one would need to keep a population of particles of a given size for each model, towards using it as a pool for moves at the following iterations… Which also means that running in parallel as many ABC-SMC filters as there are models would be equally or more efficient, a wee bit like parallel MCMC chains may prove more efficient than reversible jump for model comparison. (On the trivial side, the OxWaSP seminar on the same day was briefly interrupted by water leakage caused by Storm Eric and poor workmanship on the new building!)
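To see the degeneracy mechanism in isolation, here is a toy stand-in (made-up acceptance weights and function names, not the Toni et al. algorithm itself) in which model indicators are only resampled from the previous population, so a model that loses all its particles can never regain any:

```python
import numpy as np

def smc_model_counts(n_particles=100, n_models=3, n_iter=15, rng=None):
    """Track particle counts per model across iterations of a caricatured
    ABC-SMC model choice run: labels are resampled from the current population
    only, so a count that reaches zero stays at zero forever."""
    rng = np.random.default_rng(2) if rng is None else rng
    models = rng.integers(n_models, size=n_particles)        # prior draws of the model index
    history = []
    for _ in range(n_iter):
        # stand-in for ABC acceptance weights, harsher on the last model
        weights = np.where(models == n_models - 1, 0.2, 1.0) * rng.random(n_particles)
        weights /= weights.sum()
        # resample model indicators (a real run would also move the parameters)
        models = rng.choice(models, size=n_particles, p=weights)
        history.append(np.bincount(models, minlength=n_models))
    return np.array(history)

print(smc_model_counts())   # any column that hits 0 never recovers
```

Keeping a minimal per-model pool, as suggested above, amounts to protecting each column from ever reaching zero.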