## Semi-automatic ABC

Posted in Statistics on April 14, 2010 by xi'an

Last Thursday, Paul Fearnhead and Dennis Prangle posted on arXiv a paper proposing an original approach to ABC. I read it rather quickly, so I may have missed some points in the paper, but my overall feeling is that it is close to Richard Wilkinson’s exact ABC on an approximate target. The interesting contribution of the paper is that ABC is considered from a purely inferential viewpoint and calibrated for estimation purposes.

Indeed, Fearnhead and Prangle do not follow the “traditional” perspective of looking at ABC as a converging approximation to the true posterior density. Like Richard Wilkinson, they instead take a randomised/noisy version of the summary statistics and derive a calibrated version of ABC, i.e. an algorithm that gives proper predictions, the catch being that it does so for the posterior given this randomised version of the summary statistics. This is therefore a tautological argument of sorts, which I will call tautology #1. The interesting aspect of this switch of perspective is that the kernel K used in the acceptance probability

$\displaystyle{ K((s-s_\text{obs})/h)}$

does not have to be regarded as an estimate of the true sampling density, since it is part of the (randomised) pseudo-model. (Everything collapses to the true model when the bandwidth h goes to zero.) The Monte Carlo error is taken into account through the average acceptance probability, which collapses to zero when h goes to zero, making h = 0 a suboptimal choice!
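To make the role of the kernel concrete, here is a minimal Python sketch of rejection ABC with acceptance probability K((s − s_obs)/h). It is not the authors' code: the toy normal model, the uniform kernel, the observed value 1.5 and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def uniform_kernel(u):
    """K(u) = 1 if |u| <= 1, else 0, so K is a valid acceptance probability."""
    return (np.abs(u) <= 1.0).astype(float)

def abc_kernel_sampler(s_obs, h, n_sims, rng, kernel=uniform_kernel):
    """Rejection ABC: keep theta with probability K((s - s_obs) / h)."""
    # Toy model (assumption): theta ~ N(0, 10^2) and s | theta ~ N(theta, 1).
    theta = rng.normal(0.0, 10.0, size=n_sims)
    s = rng.normal(theta, 1.0)
    accept_prob = kernel((s - s_obs) / h)
    keep = rng.uniform(size=n_sims) < accept_prob
    # Return the accepted parameters and the average acceptance rate.
    return theta[keep], keep.mean()

rng = np.random.default_rng(0)
for h in (2.0, 0.5, 0.1):
    sample, rate = abc_kernel_sampler(s_obs=1.5, h=h, n_sims=200_000, rng=rng)
    print(f"h={h}: acceptance rate={rate:.4f}, ABC posterior mean={sample.mean():.3f}")
```

Shrinking h brings the accepted sample closer to the true posterior in this toy model, but the acceptance rate shrinks with it, which is the Monte Carlo cost mentioned above.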

What I would call tautology #2 stems from the comparison of ABC posteriors via a loss function

$(\theta_0-\hat\theta)^\text{T} A (\theta_0-\hat\theta)$

that ends up with the “best” asymptotic summary statistic being

$\mathbb{E}[\theta|y_\text{obs}].$

This follows from the choice of the loss function rather than from an intrinsic criterion… Now, using the posterior expectation as the summary statistic does make sense! Especially when the calibration constraint implies that the ABC approximation has the same posterior mean as the true (randomised) posterior. Unfortunately, it is parameterisation dependent and unlikely to be available in settings where ABC is necessary. In the semi-automatic implementation, the authors suggest using a pilot run of ABC to approximate the above statistic. I wonder about the cost, since a simulation experiment must be repeated for each simulated dataset (or sufficient statistic). The simplification in the paper follows from a linear regression on the parameters, thus linking the approach with Beaumont, Zhang and Balding (2002, Genetics).
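The regression shortcut can be sketched as follows, again in Python and under loud assumptions: a toy model with five i.i.d. N(θ, 1) observations, a N(0, 3²) prior, raw observations as regression features, pilot simulations drawn straight from the prior, and a uniform kernel with h = 0.05. None of these choices comes from the paper; the point is only that one regression fit replaces a separate simulation experiment per dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, prior_sd = 5, 3.0  # toy settings (assumptions)

def simulate(theta, rng):
    """Toy model: n_obs i.i.d. N(theta, 1) observations per parameter value."""
    return rng.normal(theta[:, None], 1.0, size=(theta.size, n_obs))

def features(y):
    """Regression features: an intercept plus the raw observations."""
    return np.column_stack([np.ones(len(y)), y])

# Pilot stage: regress simulated parameters on features of simulated data.
theta_pilot = rng.normal(0.0, prior_sd, size=20_000)
y_pilot = simulate(theta_pilot, rng)
beta, *_ = np.linalg.lstsq(features(y_pilot), theta_pilot, rcond=None)

def summary(y):
    """Fitted linear predictor, used as an estimate of E[theta | y]."""
    return features(y) @ beta

y_obs = np.array([[1.2, 0.7, 1.9, 1.1, 0.6]])
s_obs = summary(y_obs)[0]

# ABC stage: rejection sampling on the one-dimensional fitted summary.
theta = rng.normal(0.0, prior_sd, size=200_000)
s = summary(simulate(theta, rng))
h = 0.05
accepted = theta[np.abs(s - s_obs) <= h]

exact_mean = n_obs * y_obs.mean() / (n_obs + 1.0 / prior_sd**2)
print(f"fitted summary at y_obs: {s_obs:.3f}")
print(f"ABC posterior mean: {accepted.mean():.3f} from {accepted.size} draws")
print(f"exact posterior mean: {exact_mean:.3f}")
```

In this conjugate toy case the posterior mean is exactly linear in the data, so the regression recovers it almost perfectly; in realistic ABC settings the fitted statistic is only an approximation whose quality depends on the chosen features.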

Using the same evaluation via a posterior loss, the authors show that the “optimal” kernel is uniform over a region

$x^\text{T} A x < c$

where c is chosen so that this region has volume 1. A significant remark is that the error evaluated by Fearnhead and Prangle is

$\text{tr}(A\Sigma) + h^2 \mathbb{E}_K[x^\text{T}Ax] + \dfrac{C_0}{h^d}$

which means that, due to the Monte Carlo error, the “optimal” value of h is not zero, and the resulting convergence speed is akin to a non-parametric optimal rate in 2/(2+d). There should thus be a way to link this decision-theoretic approach with that of Ratmann et al., since the latter take h to be part of the parameter vector.
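To see where the 2/(2+d) exponent comes from, one can minimise the error expression above in h. Writing B for $\mathbb{E}_K[x^\text{T}Ax]$, setting the derivative to zero gives $h^{d+2} = dC_0/(2B)$. The Python sketch below checks this numerically; the numerical values are arbitrary, and the scaling $C_0 = c/N$ with N the number of simulations is my assumption, not something stated above.

```python
import numpy as np

def abc_error(h, tr_a_sigma, b, c0, d):
    """Error expression: tr(A Sigma) + h^2 * B + C_0 / h^d."""
    return tr_a_sigma + b * h**2 + c0 / h**d

def optimal_h(b, c0, d):
    """Zero of the derivative: 2 B h = d C_0 h^(-(d+1))."""
    return (d * c0 / (2.0 * b)) ** (1.0 / (d + 2))

def excess_error(n, c, b, d):
    """Error above tr(A Sigma) at the optimal h, assuming C_0 = c / n."""
    c0 = c / n
    h = optimal_h(b, c0, d)
    return b * h**2 + c0 / h**d

# Arbitrary illustrative values (assumptions).
tr_a_sigma, b, c0, d = 1.0, 1.0, 0.01, 2

grid = np.linspace(0.01, 2.0, 20_000)
h_grid = grid[np.argmin(abc_error(grid, tr_a_sigma, b, c0, d))]
print(f"closed-form h*: {optimal_h(b, c0, d):.4f}, grid search: {h_grid:.4f}")

# Log-log slope of the excess error in N, to compare with -2 / (d + 2).
n_values = np.array([1e2, 1e3, 1e4, 1e5])
excess = np.array([excess_error(n, 1.0, b, d) for n in n_values])
slope = np.polyfit(np.log(n_values), np.log(excess), 1)[0]
print(f"fitted slope: {slope:.3f}, -2/(d+2): {-2.0 / (d + 2):.3f}")
```

Under that assumed scaling, the optimal bandwidth decreases like $N^{-1/(d+2)}$ and the excess error like $N^{-2/(d+2)}$, which matches the non-parametric flavour of the rate.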