Locally linear approximations for ABC

A new paper on ABC appeared yesterday on arXiv: Bayesian Computation and Model Selection in Population Genetics, by Christoph Leuenberger, Daniel Wegmann, and Laurent Excoffier, which relates to the local regression ideas in Beaumont et al. (2002).

First and foremost, I quite like the paper, in particular the realisation that sampling from the exact posterior is equivalent to sampling from the “truncated prior” mixed with the truncated model. This is very clever. The approximation to the distribution of the parameters given the observed summary statistics is central to the paper, and I also find the mixture representation of this approximation quite interesting.
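To make this equivalence concrete, here is a minimal sketch of plain ABC rejection sampling, whose accepted pairs are exactly draws from the truncated prior mixed with the truncated model. The Gaussian prior and model, the observed statistic and the tolerance below are all toy assumptions of mine, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
s_obs = 0.5   # observed summary statistic (toy value)
eps = 0.1     # tolerance defining the truncation region

def sample_truncated(n):
    """Plain ABC rejection: draw theta from the prior, s from the model,
    and keep the pair when s falls in the eps-ball around s_obs; the
    accepted pairs are draws from the truncated prior mixed with the
    truncated model."""
    thetas, stats = [], []
    while len(thetas) < n:
        theta = rng.normal(0.0, 1.0)   # prior: N(0, 1)
        s = rng.normal(theta, 0.2)     # model: s | theta ~ N(theta, 0.2^2)
        if abs(s - s_obs) <= eps:
            thetas.append(theta)
            stats.append(s)
    return np.array(thetas), np.array(stats)

theta_acc, s_acc = sample_truncated(500)
print(theta_acc.mean())
```

As the tolerance eps shrinks to zero, these accepted draws converge to draws from the exact posterior, which is the starting point of the paper's argument.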

The criticism of Beaumont et al. (2002) seems a bit unfair, though, because their approach is non-parametric rather than locally linear and because they endeavour to include all simulated summary statistics, even those far away from the observed summary statistic, by shrinking the corresponding parameters. In the current paper, there is no clear shrinkage for those summary statistics that are far away from the observed summary statistic. Indeed, all accepted parameters are weighted similarly in the Gaussian linear approximation to the truncated “prior”. A more general perspective accounting for the unknown distribution of θ given s would certainly be an interesting avenue for exploration, even though the empirical estimate (5) already looks like a non-parametric kernel density estimate, minus the bandwidth!
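For comparison, the adjustment of Beaumont et al. (2002) can be sketched as a weighted local-linear regression of θ on s, with Epanechnikov kernel weights shrinking the contribution of simulations whose summary statistic lies far from the observed one. The toy Gaussian model and the bandwidth below are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
s_obs = 0.5

# Reference table: theta from the prior, s from a toy Gaussian model
theta = rng.normal(0.0, 1.0, 5000)
s = rng.normal(theta, 0.3)

# Epanechnikov kernel weights with an assumed bandwidth delta:
# simulations far from s_obs get weight zero, nearby ones count more
delta = 0.5
u = (s - s_obs) / delta
w = np.where(np.abs(u) <= 1.0, 1.0 - u**2, 0.0)
keep = w > 0

# Weighted least squares of theta on (s - s_obs)
X = np.column_stack([np.ones(keep.sum()), s[keep] - s_obs])
Xw = X * w[keep][:, None]
beta = np.linalg.solve(Xw.T @ X, Xw.T @ theta[keep])

# Regression adjustment: project each retained theta to s = s_obs
theta_adj = theta[keep] - beta[1] * (s[keep] - s_obs)
post_mean = np.average(theta_adj, weights=w[keep])
print(post_mean)
```

The weights are what distinguishes this from the Gaussian linear approximation discussed above: every retained simulation contributes, but with influence decaying to zero at the edge of the acceptance window.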

Similarly, the argument that one should model s given θ rather than θ given s [page 4] may not be completely convincing. We are not exactly in a statistical framework but, rather, in a simulation framework, and conditioning can be done one way or the other. Both s and θ should be considered as random variables in this setting, and the debate between regular regression and inverse regression could perhaps be better resolved through an errors-in-variables perspective. I do not understand why Beaumont et al. (2002) would fail to account for the prior, since the parameters are primarily simulated from the prior.

The GLM2 step in the paper sounds at best like a Laplace approximation to the distribution of the sufficient statistics, which may then relate to the recent discussion paper by Rue, Martino and Chopin (2008) in JRSS Series B. Mixing ABC with Laplace approximations sounds like an interesting avenue of research. But using a Gaussian approximation for the sufficient statistic s is not necessarily realistic if, for instance, s is supported on a finite set.
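As a very rough sketch of the flavour of such a Gaussian approximation, and not of the paper's actual GLM machinery, one can fit a Gaussian linear model for s given θ on the retained simulations and reweight the simulated parameters by the fitted likelihood of the observed statistic. The toy model below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
s_obs = 0.5

# Retained simulations from a toy Gaussian model (my own assumption)
theta = rng.normal(0.0, 1.0, 2000)
s = rng.normal(theta, 0.3)

# Gaussian linear model for s given theta: s ~ N(a + b*theta, sigma^2),
# fitted by least squares on the simulations
X = np.column_stack([np.ones_like(theta), theta])
coef, res, _, _ = np.linalg.lstsq(X, s, rcond=None)
sigma = np.sqrt(res[0] / len(s))

# Reweight each simulated theta by the fitted Gaussian likelihood of
# s_obs; the normalising constant cancels in the weighted average
w = np.exp(-0.5 * ((s_obs - X @ coef) / sigma) ** 2)
post_mean = np.sum(w * theta) / np.sum(w)
print(coef, sigma, post_mean)
```

The difficulty raised above is visible here: the Gaussian weight w is always well-defined, even when the true distribution of s given θ is discrete and a continuous density makes little sense.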

At several levels of the paper, I have a vague feeling of a dual use of the data, as in the controversial discussion paper of Aitkin (1991) in JRSS Series B. Indeed, while the truncated prior is called a “prior”, it does depend on the observed summary statistic through the truncation domain. Using the sampling distribution at this observed summary statistic once more means that the data is used twice. (Obviously, this does not matter from a computational viewpoint!)

An earlier paper by Wilkinson, posted on arXiv and discussed in this post, also examines the impact of the approximation on the posterior, apparently from a different perspective, even though I suspect both resolutions can be merged.

Lastly, I also appreciated the point made in the paper about model choice and the use of the approximation of the normalising constant resulting from the modelling to get to the marginal likelihood and the computation of the Bayes factor. This relates to earlier comments in the literature about the ABC acceptance rate approximating the marginal likelihood, and maybe to a recent paper by Bartolucci, Scaccia and Mira (2006) in Biometrika studying ways of computing marginal probabilities by Rao-Blackwellising reversible jump acceptance probabilities. In Grelaud et al. (2008), we also exploit this ABC feature for Ising models, since an “exact ABC” algorithm is then available for model selection.
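The acceptance-rate approximation to the marginal likelihood can be illustrated on a toy example, with two hypothetical Gaussian models differing only in their noise level (nothing below comes from the cited papers): running the same ABC rejection step under each model, the ratio of acceptance rates estimates the Bayes factor, since the common volume of the eps-ball cancels:

```python
import numpy as np

rng = np.random.default_rng(2)
s_obs, eps, n = 0.5, 0.1, 20000

def acceptance_rate(model_sd):
    """ABC acceptance rate under a given model: with a common prior,
    the rate approximates the marginal density of s_obs up to the
    (shared) volume of the eps-ball."""
    theta = rng.normal(0.0, 1.0, n)    # prior: N(0, 1)
    s = rng.normal(theta, model_sd)    # model: s | theta ~ N(theta, model_sd^2)
    return np.mean(np.abs(s - s_obs) <= eps)

# Two hypothetical models differing only in their noise level
r1 = acceptance_rate(0.2)
r2 = acceptance_rate(1.0)
bayes_factor = r1 / r2
print(r1, r2, bayes_factor)
```

When the summary statistic is sufficient for model choice, as in the Ising case of Grelaud et al. (2008), this ratio targets the exact Bayes factor rather than an approximation of it.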
