## lazy ABC

“A more automated approach would be useful for lazy versions of ABC SMC algorithms.”

**D**ennis Prangle just arXived the work on lazy ABC he had presented in Oxford at the i-like workshop a few weeks ago. The idea behind the paper is to cut down massively on the generation of pseudo-samples that are “too far” from the observed sample. This is formalised through a stopping rule that sets the estimated likelihood to zero with probability 1-α(θ,x) and otherwise divides the original ABC estimate by α(θ,x), which makes the modification unbiased when compared with basic ABC. The efficiency gain appears when α(θ,x) can be computed much faster than producing the entire pseudo-sample and its distance to the observed sample. When considering an approximation to the asymptotic variance of this modification, Dennis derives an optimal (in the sense of the effective sample size), if formal, version of the acceptance probability α(θ,x), conditional on the choice of a “decision statistic” φ(θ,x) and of an importance function g(θ). (I do not get his Remark 1 about the case when π(θ)/g(θ) only depends on φ(θ,x), since the latter also depends on x. Unless one considers a multivariate φ which contains π(θ)/g(θ) itself as a component.) This approach requires estimating

as a function of φ: I would have thought (non-parametric) logistic regression a good candidate towards this estimation, but Dennis is rather critical of this solution.
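To make the stopping rule concrete, here is a minimal Python sketch on a toy problem. Everything in it is an assumption made for illustration and none of it comes from the paper: the Normal model, the prior doubling as importance function (so that π(θ)/g(θ)=1), the decision statistic φ taken as the mean of the first few observations, and the ad hoc `continuation_prob`, which is not the optimal α(θ,x) discussed above.

```python
import math
import random


def continuation_prob(phi, s_obs, scale=1.0, alpha_min=0.05):
    """Ad hoc alpha(theta, x): close to 1 when the early summary phi is near s_obs.

    Illustrative choice only, NOT the optimal alpha of the paper.
    The floor alpha_min keeps alpha > 0, which the 1/alpha correction requires.
    """
    return max(alpha_min, math.exp(-0.5 * ((phi - s_obs) / scale) ** 2))


def lazy_abc(n_iter, s_obs, eps, n_obs=100, n_early=10, seed=0):
    """Toy lazy ABC; returns (thetas, weights, n_full).

    n_full counts how many pseudo-samples were simulated in full.
    """
    rng = random.Random(seed)
    thetas, weights = [], []
    n_full = 0
    for _ in range(n_iter):
        theta = rng.gauss(0.0, 3.0)  # proposal g = prior, hence pi/g = 1
        # Cheap first stage: simulate only a subsample and compute phi.
        early = [rng.gauss(theta, 1.0) for _ in range(n_early)]
        phi = sum(early) / n_early
        alpha = continuation_prob(phi, s_obs)
        weight = 0.0  # early stopping sets the likelihood estimate to zero
        if rng.random() < alpha:
            # Expensive second stage: complete the pseudo-sample.
            n_full += 1
            rest = [rng.gauss(theta, 1.0) for _ in range(n_obs - n_early)]
            s = (sum(early) + sum(rest)) / n_obs
            if abs(s - s_obs) < eps:
                weight = 1.0 / alpha  # basic ABC estimate divided by alpha
        thetas.append(theta)
        weights.append(weight)
    return thetas, weights, n_full


thetas, weights, n_full = lazy_abc(20000, s_obs=0.0, eps=0.2)
post_mean = sum(t * w for t, w in zip(thetas, weights)) / sum(weights)
print(n_full, post_mean)
```

In this toy setting, averaging the weights should recover the plain ABC acceptance rate (up to Monte Carlo error) while only a fraction of the pseudo-samples gets simulated in full, which is the whole point of being lazy.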

**I** added the quote above as I find it somewhat ironic: at this stage, to enjoy laziness, the algorithm first has to go through a massive calibration stage, from the selection of the subsample [to be simulated before computing the acceptance probability α(θ,x)] to the construction of the (somewhat mysterious) decision statistic φ(θ,x) to the estimation of the terms composing the optimal α(θ,x). The most natural choice of φ(θ,x) seems to involve subsampling, still with a wide range of possibilities and ensuing efficiencies. (The choice found in the application is somewhat anticlimactic in this respect.) In most ABC applications, I would suggest using a quick & dirty approximation of the distribution of the summary statistic.

**A** slight point of perplexity about this “lazy” proposal is the static role of ε, which is impractical because ε is not set in stone… As discussed several times here, the tolerance is a function of many factors, including all the calibration parameters of the lazy ABC, rather than an absolute quantity. The paper is rather terse on this issue (see Section 4.2.2). It seems to me that playing with a large collection of tolerances may be too costly in this setting.

June 9, 2014 at 11:17 am

Thanks very much for your comments on this paper Christian. Here are a couple of thoughts:

Remark 1: Yes, I’m thinking of the decision statistic being a vector which may include π(θ)/g(θ) as a component.

Non-parametric logistic regression: I agree this is a promising approach. Perhaps this doesn’t come over enough in the paper where I criticise both the approaches I suggest!

Calibration stage: In the paper’s application I try to get as large an increase in efficiency as possible. The cost is that the calibration stage is quite complex. In practice a “quick and dirty approximation of the summary statistic”, or choosing between a few of these, may often result in a good speed-up for less work. Perhaps I should add an example which focuses on ease of use rather than maximising efficiency.

Choice of epsilon: Section 4.2.5 discusses how to make a choice of epsilon after the algorithm has been run. The idea is to tune the algorithm based on some rough guess epsilon1 and then reduce epsilon afterwards. This means epsilon is not completely static, but you do need some rough idea of its value in advance. One way of getting this is to base it on the calibration stage simulations.
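One possible reading of this post hoc choice, as a rough Python sketch rather than the procedure of Section 4.2.5: assume the run stored, for each simulation, its distance to the observed summary and its importance weight before thresholding (zero if stopped early). The helper names `reweight` and `ess` and the five stored values are made up for illustration.

```python
import math


def reweight(base_weights, distances, eps):
    """ABC weights at tolerance eps from stored lazy ABC output.

    base_weights: (pi/g)/alpha for completed simulations, 0.0 if stopped early.
    distances: distance to the observed summary, math.inf if stopped early.
    """
    return [w if d < eps else 0.0 for w, d in zip(base_weights, distances)]


def ess(weights):
    """Effective sample size (sum w)^2 / sum w^2, or 0.0 if all weights vanish."""
    total = sum(weights)
    squares = sum(w * w for w in weights)
    return total * total / squares if squares > 0 else 0.0


# Made-up stored output of a run tuned with a rough guess eps1 = 0.4.
base_weights = [1.0, 2.0, 0.0, 1.25, 4.0]
distances = [0.05, 0.3, math.inf, 0.15, 0.02]

# Reduce epsilon afterwards and watch how the effective sample size shrinks.
for eps in (0.4, 0.2, 0.1):
    print(eps, ess(reweight(base_weights, distances, eps)))
```

Since α was tuned for epsilon1, a much smaller epsilon keeps the weights valid but may waste part of the efficiency gain.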

Anticlimactic results: An interesting question is in what sort of situation (if any) a lazy ABC-like approach can produce a really big speed-up. This requires a lot of simulations which can be cheaply detected to be “bad”. In the paper’s application this is the case (to some extent) when the parameter proposal distribution is poor, but the improvements are smaller for a reasonable proposal. I wonder if there are many models where a lot of simulations “go bad fast” even for parameters with large posterior densities.