summary statistics for ABC model choice

countryside near Kenilworth, England, March 5, 2013

A few days ago, Dennis Prangle, Paul Fearnhead, and their co-authors from New Zealand posted on arXiv their (long-awaited) study of the selection of summary statistics for ABC model choice. I read it during my trip to England, in trains and planes, if not while strolling in the beautiful English countryside as above.

As posted several times on this ‘Og, the crux of the analysis is that the Bayes factor is a good type of summary statistic when comparing two models, a result that extends to more than two models by considering instead the vector of evidences. As in the initial Read Paper by Fearnhead and Prangle, there is, strictly speaking, no true optimality in using the Bayes factor or the vector of evidences, besides the fact that the vector of evidences is minimal sufficient for the marginal models (integrating out the parameters). (This was a point made in my discussion.) The implementation of the principle is similar to the Read Paper setting as well: run a pilot ABC simulation, estimate the vector of evidences, and re-run the main ABC simulation using this estimate as the summary statistic. The paper contains a simulation study using some of our examples (in Marin et al., 2012), as well as an application to genetic bacterial data.
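
To fix ideas, here is a minimal sketch of that two-stage scheme, certainly not the authors' implementation: the pilot stage fits a multinomial logistic regression of the model index on a vector of raw summaries, and the main stage uses the fitted model probabilities as a low-dimensional summary statistic. The samplers prior_draw and simulate and the transform raw_summaries are hypothetical placeholders.

```python
# Minimal sketch of the two-stage scheme (not the paper's code); prior_draw(m),
# simulate(m, theta) and raw_summaries(y) are hypothetical user-supplied helpers.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pilot_stage(prior_draw, simulate, raw_summaries, n_models, n_pilot=10_000):
    """Learn a map from raw summaries to estimated model probabilities."""
    X, z = [], []
    for m in range(n_models):
        for _ in range(n_pilot):
            theta = prior_draw(m)
            X.append(raw_summaries(simulate(m, theta)))
            z.append(m)
    # multinomial logistic regression of the model index on the raw summaries
    return LogisticRegression(max_iter=1000).fit(np.array(X), np.array(z))

def main_abc(prior_draw, simulate, raw_summaries, reg, y_obs,
             n_models, n_main=100_000, eps=0.1):
    """Rejection ABC with the fitted model probabilities as summary statistic."""
    s_obs = reg.predict_proba([raw_summaries(y_obs)])[0]
    kept = []
    for _ in range(n_main):
        m = np.random.randint(n_models)          # uniform prior model weights
        theta = prior_draw(m)
        s = reg.predict_proba([raw_summaries(simulate(m, theta))])[0]
        if np.linalg.norm(s - s_obs) < eps:      # accept when summaries are close
            kept.append(m)
    # estimated posterior model probabilities from the accepted model indices
    return np.bincount(kept, minlength=n_models) / max(len(kept), 1)
```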

As indicated above, I was definitely looking forward to the publication of this paper, both because of the importance of the problem and because I wanted to see how it performed against other ABC solutions. That the Bayes factor is acceptable as a summary statistic is quite natural in terms of our consistency result, as it converges to 0 or to ∞ depending on which model generated the data. The paper is well-written and clear enough to understand how the method is implemented. It also provides a very fair coverage of our own paper. However, I do not understand several points. For one thing, given that the vector of evidences is the target, I do not see why the vector of Bayes factors for all pairs of models is used instead, leading to a rather useless inflation in the dimension of the summary statistic. Using a single model in the denominator would be enough, sufficient I mean! (What am I missing?!)
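
In symbols, the behaviour behind that consistency result is, under the usual regularity conditions,

\[
B_{12}(y_{1:n}) \;=\; \frac{m_1(y_{1:n})}{m_2(y_{1:n})} \;\longrightarrow\;
\begin{cases} +\infty & \text{if } y_{1:n}\sim\mathfrak{M}_1,\\ 0 & \text{if } y_{1:n}\sim\mathfrak{M}_2,\end{cases}
\qquad n\to\infty,
\]

where \(m_i\) denotes the evidence (marginal likelihood) under model \(\mathfrak{M}_i\).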

Somehow in connection with the above, the use of the logistic regularisation for computing the posterior probability (following an idea of Mark Beaumont in the mid 2000’s) is interesting but difficult to quantify. I mean, using a logistic regression based on the training sample sounds like a natural solution for computing the sufficient statistic; however, the construction of the logistic regression by regular variable-selection techniques means that different transforms of the data are used to compare different models, an issue that worries me (see again below). Obviously, the overall criticism of the Read Paper, namely that the quality of the outcome ultimately depends on the choice of the first batch of statistics, still applies: with too many statistics there is no reason to believe in the quality of the ABC approximation, with too few statistics there is no reason to trust the predictive power of the logistic regression.
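
To make the worry concrete, here is a toy sketch (not the paper's code) of how separately regularised logistic regressions for different pairs of models can end up retaining different raw statistics for different pairs; the dictionary pilot_summaries of pilot simulations is a hypothetical placeholder.

```python
# Illustration of the concern above: an L1-penalised logistic regression per
# pair of models acts as a variable selector, so each pair may be compared
# through a different subset of the raw statistics. (pilot_summaries is a
# hypothetical dict {model index: (n_sims, n_stats) array} of pilot output.)
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def pairwise_selected_statistics(pilot_summaries, C=0.1):
    selected = {}
    for m1, m2 in combinations(sorted(pilot_summaries), 2):
        X = np.vstack([pilot_summaries[m1], pilot_summaries[m2]])
        z = np.r_[np.zeros(len(pilot_summaries[m1])),
                  np.ones(len(pilot_summaries[m2]))]
        # the L1 penalty zeroes out coefficients; an L2 penalty would instead
        # shrink them all without discarding any statistic
        reg = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, z)
        selected[(m1, m2)] = np.flatnonzero(reg.coef_[0])  # retained statistics
    return selected
```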

The authors also introduce a different version of the algorithm where they select a subregion of the parameter space(s) during the pilot run and replace the prior with the prior restricted to that region during the main run. The paper claims significant improvements brought by this additional stage, but it makes me a wee bit uneasy: For one thing, it uses the data twice, with a risk of over-concentration. For another, I do not see how the restricted region could be constructed, especially in large dimensions (an issue I had when using HPD regions for harmonic mean estimators), apart from a possibly inefficient hypercube. For yet another (maybe connected with the first thing!), a difference between models is induced by this pilot-run restriction, which amounts to changing the prior weights of the models under comparison.
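
For concreteness, a bare-bones version of the hypercube restriction could look like the following sketch (illustrative only, with a hypothetical prior_draw sampler): the box is the componentwise quantile range of the parameters accepted in the pilot run, and the main-run prior is restricted to it by rejection.

```python
# Sketch of the restricted-region idea with the plain (possibly inefficient)
# axis-aligned hypercube; not the paper's construction.
import numpy as np

def pilot_hypercube(accepted_thetas, lo=0.005, hi=0.995):
    """Componentwise quantile box of the pilot-accepted parameters ((n, d) array)."""
    return (np.quantile(accepted_thetas, lo, axis=0),
            np.quantile(accepted_thetas, hi, axis=0))

def draw_from_restricted_prior(prior_draw, box, max_tries=10_000):
    """Sample the prior restricted to `box` by rejection; prior_draw() is a
    hypothetical sampler returning a d-vector."""
    lower, upper = box
    for _ in range(max_tries):
        theta = prior_draw()
        if np.all(theta >= lower) and np.all(theta <= upper):
            return theta
    raise RuntimeError("restricted region has negligible prior mass")
```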

A side remark in the conclusion suggests using different vectors of statistics in a pairwise comparison of models. While I have also been tempted by this solution, because it produces a huge reduction in dimension, I wonder about its validity, as it amounts to comparing models based on different (transforms of the) observations, so that the evidences are not commensurable. I however agree with the authors that using one set of summary statistics to run ABC model comparisons and another one to run ABC estimation for a given model sounds like a natural approach, as it fights the curse of dimensionality.

One Response to “summary statistics for ABC model choice”

  1. Hi Christian, thanks for the thoughtful comments! Some replies are below.

    On using regressions for each pair of models, the argument is that this will increase robustness. If the Bayes factors between a few pairs of models are captured poorly, then the others may often still be enough to approximate sufficient statistics. For problems with many (>3?) models, this would certainly produce too many summaries, and a multinomial-regression-like approach would be preferable.

    You mention the restricted regions causing a change to the prior model weights. We attempt to make a correction for this in the “Truncation Correction” section on page 10.
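
    [For readers without the paper at hand, one natural form such a correction can take, not necessarily the paper's exact one: if the prior \(\pi_m\) of model \(m\) is truncated to the pilot region \(R_m\) and renormalised, then

    \[
    p(y\mid m) \;=\; \Pi_m(R_m)\,p_{R_m}(y\mid m) \;+\; \int_{R_m^c} p(y\mid\theta,m)\,\pi_m(\theta)\,\mathrm{d}\theta,
    \qquad \Pi_m(R_m)=\int_{R_m}\pi_m(\theta)\,\mathrm{d}\theta,
    \]

    so that, when the likelihood puts negligible mass outside \(R_m\), multiplying the restricted evidence \(p_{R_m}(y\mid m)\) by the prior mass \(\Pi_m(R_m)\) restores comparable prior weights across models.]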

    On alternatives to hypercubes for restricted regions, I am toying with using an HPD region for a normal or mixture-of-normals approximation to the posterior. You raise a good point about the difficulties of extending these to high-dimensional problems.
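
    [A minimal sketch of what that alternative could look like, purely illustrative and not the approach of the paper: approximate the pilot-accepted parameters by a mixture of normals and keep the density-threshold (HPD-like) region containing most of the fitted mass.]

    ```python
    # Illustrative HPD-like restricted region from a mixture-of-normals fit to
    # the pilot-accepted parameters (an (n, d) array `accepted`).
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def hpd_region(accepted, n_components=3, level=0.99):
        gm = GaussianMixture(n_components=n_components).fit(accepted)
        log_dens = gm.score_samples(accepted)
        threshold = np.quantile(log_dens, 1.0 - level)  # keep ~`level` of pilot mass
        # returned test: is a parameter value theta inside the approximate region?
        return lambda theta: gm.score_samples(np.asarray(theta).reshape(1, -1))[0] >= threshold
    ```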

    You mention a couple of criticisms that are common to the RSS B paper: dependence of the method on how the pilot stage is performed and the problem of (weakly) using the data twice. These are still very much valid criticisms, and the same heuristic justifications as before are used in this paper.

    I’m completely in agreement with your closing comments on the potential problems of comparing models based on different vectors of statistics. I didn’t consider that the same issue could arise from using regularised regression. Perhaps L2 regularisation would be preferable here (some regularisation is needed in the main application to avoid ill-conditioning).
