## accelerated ABC

**O**n the flight back from Warwick, I read a fairly recently arXived paper by Umberto Picchini and Julie Forman entitled “Accelerating inference for diffusions observed with measurement error and large sample sizes using Approximate Bayesian Computation: A case study” that relates to earlier ABC works (and the MATLAB abc-sde package) by the first author (earlier works I missed). Among other things, the authors propose an acceleration device for ABC-MCMC: when simulating from the proposal, the Metropolis-Hastings acceptance probability can be computed and compared with a uniform rv *prior to* simulating pseudo-data. In case of rejection, the pseudo-data does not need to be simulated. In case of acceptance, it is compared with the observed data as usual. This is interesting for two reasons: first it always speeds up the algorithm. Second, it shows the strict limitations of ABC-MCMC, since the rejection takes place without incorporating the information contained in the data. (Even when the proposal incorporates this information, the comparison with the prior does not go this way.) This also relates to one of my open problems, namely how to simulate directly summary statistics without simulating the whole pseudo-dataset.

**A**nother thing (related with acceleration) is that the authors use a simulated subsample rather than the simulated sample in order to gain time: this worries me somehow as the statistics corresponding to the observed data is based on the *whole* observed data. I thus wonder how both statistics could be compared, since they have different distributions and variabilities, even when using *the same* parameter value. Or is this a sort of pluggin/bootstrap principle, the true parameter being replaced with its estimator based on *the whole* data? Maybe this does not matter in the end (when compared with the several levels of approximation)…

October 29, 2013 at 1:21 pm

Hi Christian,

With respect to your second point that “the rejection takes place without incorporating the information contained in the data”, this is a fundamental part of why the 1-hit algorithm analyzed with Krzysztof Łatuszyński has conditions on the “interplay between the likelihood and the prior-proposal pair”. This was mentioned in ABC in Rome (talk is available below) and we are more explicit in Remark 4 of http://arxiv.org/abs/1210.6703 where we point out that the Metropolis-Hastings counterpart would not require such conditions as it always evaluates the product of the prior and the likelihood. In particular, for a random walk proposal, ABC-MCMC can be in trouble when the prior and the likelihood are in extreme disagreement but their product has exponential or lighter tails.

https://sites.google.com/site/approxbayescompinrome/program/speakers

Of course, while this is an identifiable issue, it is not clear (at least to me) if people will often have priors and likelihoods that conflict enough for this to be important in practical applications.

I remember briefly discussing with Umberto in Rome about “early rejection”, as during my talk I had (perhaps wrongly) just assumed that people implementing Marjoram et al.’s ABC-MCMC kernel would use early rejection, at least in models where simulating the data is expensive.

October 19, 2013 at 1:48 am

Which is an expected effect given the Bernstein-von-Mises theorem. Similarly, in a Bootstrap context, a Bootstrap sample has to be the same size as the original sample.

October 18, 2013 at 5:13 pm

Yes using a subsampled summary statistic can make ABC approximations even less precise. A simple example (devised by Tony Pettitt and Rob Reeves) in my thesis pg 28-29 available from (http://eprints.qut.edu.au/57317/) gives some insight into the approximation. Conversely, simulating more data than what is observed can make ABC over-precise.

October 18, 2013 at 10:49 pm

Thanks, Chris. This subsampling idea sounded a wee bit dangerous to me…

October 21, 2013 at 1:55 am

At the risk of making a “dangerous” suggestion, if one knew how imprecise the ABC posterior was in the first place, then simulating “enough” extra data could balance the precision of the ABC posterior to be close to that of the true posterior.

No I have no idea how to do this. A project for someone :-)

October 17, 2013 at 11:51 am

Thanks to all who commented and especially to Christian for writing the post (and for reading the paper!).

There are 2 accelerations involved in the first linked paper (with J. Forman): the first one is due to an “early rejection” approach and the second one due to a “subsampling” procedure. I want to emphasize (although Christian has been already explicit in this sense) that early-rejection is due to a previous paper (denoted as “earlier ABC work” in the post), and early-rejection is completely independent of “subsampling”, i.e. it can be applied regardless of subsampling. The subsampling acceleration is not considered/suggested as a general procedure (of course) but it makes sense in the specific context of the proposed large-data application, where observations follow a fairly regular pattern.

As pointed out by Scott (thanks!), I do agree that subsampling must introduce a loss of precision as “subsampled statistic is able to reproduce the observed statistic (itself subsampled or otherwise) from a wider range of parameter values.” but also that, following Christian, in the end there are so many levels of approximation in ABC that from a practical point of view this might be a minor issue for the considered application (although relevant for the theory).

I would also like to comment on Christian’s remark that early-rejection ” always speeds up the algorithm”: as pointed-out in the earlier paper (the one without subsampling) this is not necessarily the case, under specific circumstances, for example when a uniform kernel is taken and when uniform priors are placed on ALL unknowns. In such case the acceptance-ratio for the early-rejection step would equal 1 and therefore the simulated uniform r.v. cannot exceed such ratio (thus early-rejection never takes place). In my experiments (and in my abc-sde package — btw the correct link is http://sourceforge.net/projects/abc-sde/) the ABC-MCMC targets the augmented posterior for (\theta,\varepsilon) and an exponential prior is considered for the ABC tolerance \varepsilon, so this extreme situation does not occur even when a uniform prior is set on \theta.

October 17, 2013 at 3:15 am

“On the flight back from Warwick” amused me, since Warwick (well, Coventry really) doesn’t have an airport!

October 17, 2013 at 10:05 pm

you never know, maybe I booked an helicopter…?!

October 17, 2013 at 1:14 am

Addendum to my first point: Actually, this can be made to work (sort of) with a general (non-uniform) kernel by initially computing the acceptance probability evaluated at the maximum part of the kernel (which might as well take the value 1). If the proposal is rejected, then the data does not need generation. If the proposal is “accepted”, the pseudo-data is generated and a second acceptance probability (essentially taking the value of the kernel, normalised with maximum value of 1) evaluated to decide whether to finally accept the proposal.

This has the same efficiency as the uniform kernel in the first stage, but potentially better mixing properties as a result of the second stage.

October 17, 2013 at 10:07 pm

Thanks, Scott: given that the ratio of the kernels appears in the acceptance probability, you would need an upper bound on the ratio of the kernels, don’t you? Which would be either unrealistic or highly conservative….

October 17, 2013 at 1:06 am

Regarding accepting/rejecting an MCMC proposal before (potentially) generating the pseudo-dataset, presumably this only works if one uses a uniform kernel function for comparing true and pseudo-datasets. This MCMC sampler setup doesn’t perform well (in terms of mixing), although the gain in speed may be enough to offset this. So this is basically the question that the user needs to resolve. (And presumably the same idea works for all ABC algorithms – rejection sampling, SMC etc.) In the more general case, with an arbitrary (non-uniform) kernel, or where multiple pseudo-datasets are generated (a la pseudo-marginal ABC), this trick obviously won’t work.

Regarding the subsampled summary statistics, intuition would suggest that this would produce a loss of precision in the posterior, as the more variable subsampled statistic is able to reproduce the observed statistic (itself subsampled or otherwise) from a wider range of parameter values.

October 17, 2013 at 12:53 am

Regarding your open question, I have a vague memory of Chris Holmes presenting something like that at a workshop in Bristol last year (that I think you were at). It was for discrete hidden Markov models and he managed to compute something like the posterior expectation of an arbitrary loss function in linear (or log-linear) time. But then again, I may be mis-remembering…

October 17, 2013 at 10:09 pm

Dan, do you mean this paper? This may prove too much of an approximation for mainstream statisticians. (Or not, if proven to be the only manageable solution!) Btw, it was great seeing you last week in Warwick!

October 22, 2013 at 2:46 am

Actually no (I saw Stephen Walker talk about that one in Duke, where it was not as well received as I would’ve expected [although there are big cultural differences between countries...]). This one was much more in the spirit of estimating discrete HMMs, but with the focus on estimating loss functions rather than parameters or smoothing (it was a fowards-backwards algorithm, although I can’t seem to find a paper).

Incidentally, Chris H gave a great talk here last weekend at our annual workshop on new work that really extended the paper you linked. A pretty facile summary would be “All models are wrong, so what are we going to do about it?”!!!