## ABC model choice not to be trusted

This may sound like a paradoxical title given my recent production in this area of ABC approximations, especially after the disputes with Alan Templeton, but I have come to the conclusion that ABC approximations to the Bayes factor are not to be trusted. When working one afternoon in Park City with Jean-Michel and Natesh Pillai (drinking tea in front of a fake log-fire!), we looked at the limiting behaviour of the Bayes factor constructed by an ABC algorithm, i.e. by approximating the posterior probabilities of the models by the frequencies of acceptance of simulations from those models (assuming a common summary statistic is used to define the distance to the observations). Rather obviously (a posteriori!), we ended up with the true Bayes factor based on the distributions of the summary statistics under both models!
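To make the acceptance-frequency construction concrete, here is a minimal sketch of ABC model choice by rejection. The toy setting is an assumption made for illustration only: a Poisson model with an Exp(1) prior on the rate against a geometric model (support 0, 1, 2, …) with a uniform prior on the success probability, equal prior model weights, and the sum of the observations as the common summary statistic. The function names are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates drawn here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_geometric(p, rng):
    """Number of failures before the first success, by inversion."""
    if p >= 1.0:
        return 0
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log1p(-p))


def abc_bayes_factor(x_obs, n_sims=200_000, tol=0, seed=0):
    """Estimate the Bayes factor of model 1 against model 2 by the ratio of
    acceptance counts, using the sum of the sample as summary statistic."""
    rng = random.Random(seed)
    n, s_obs = len(x_obs), sum(x_obs)
    accepted = {1: 0, 2: 0}
    for _ in range(n_sims):
        model = rng.choice((1, 2))  # assumed equal prior model probabilities
        if model == 1:
            lam = rng.expovariate(1.0)  # assumed Exp(1) prior on the rate
            s_sim = sum(sample_poisson(lam, rng) for _ in range(n))
        else:
            p = 1.0 - rng.random()  # assumed uniform prior on (0, 1]
            s_sim = sum(sample_geometric(p, rng) for _ in range(n))
        if abs(s_sim - s_obs) <= tol:
            accepted[model] += 1
    ratio = accepted[1] / accepted[2] if accepted[2] else float("nan")
    return ratio, accepted


if __name__ == "__main__":
    bf, counts = abc_bayes_factor([1, 1])
    print("ABC Bayes factor estimate:", bf, "acceptance counts:", counts)
```

With a zero tolerance on a discrete statistic, the ratio of acceptance counts estimates the Bayes factor of the summary statistic, which is the limiting quantity described above, not the Bayes factor of the full sample.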

At first, this does not sound a particularly novel and fundamental result, since all ABC approximations rely on the posterior distributions of those summary statistics, rather than on the whole dataset. However, while this approximation only has consequences in terms of the precision of the inference for most inferential purposes, it induces a dramatic arbitrariness in the Bayes factor. To illustrate this arbitrariness, consider the case of using a sufficient statistic S(x) for both models. Then, by the factorisation theorem, the true likelihoods factorise as

$\ell_1(\theta_1|x) = g_1(x) p_1(\theta_1| S(x)) \quad\text{and}\quad \ell_2(\theta_2|x) = g_2(x) p_2(\theta_2| S(x))$

resulting in a true Bayes factor equal to

$B_{12}(x) = \dfrac{g_1(x)}{g_2(x)}\,B^S_{12}(x)$

where the last term is the limiting ABC Bayes factor. Therefore, in the favourable case of the existence of a sufficient statistic, using only the sufficient statistic induces a discrepancy, the ratio $g_1(x)/g_2(x)$, that does not vanish as the number of observations or simulations grows. On the contrary, it may diverge one way or another as the number of observations increases… (This is the point of the illustration above, taken from the arXived paper: the true Bayes factor is on the first axis and the ABC approximation on the second, based on 50 observations from either a Poisson (left) or a geometric (right) distribution.) Again, this is in the favourable case of sufficiency. In the realistic setting of using summary statistics, things deteriorate further! This practical situation indeed implies a wider loss of information compared with the exact inferential approach, hence a wider discrepancy between the exact Bayes factor and the quantity produced by an ABC approximation.

It thus appears to us an urgent duty to warn the community about the dangers of this approximation, especially when considering the rapidly increasing number of applications using ABC for conducting model choice and hypothesis testing. Furthermore, we unfortunately do not see an immediate and generic alternative for the approximation of the Bayes factor. The only solution seems to be using discrepancy measures as in Ratmann et al. (2009), i.e. (empirical) model criticism rather than (decision-theoretic) model choice.
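The size of the $g_1(x)/g_2(x)$ term can be checked in closed form on a toy case. The sketch below is an illustration under assumed priors, not a reproduction of the paper's computations: a Poisson model with an Exp(1) prior on the rate, a geometric model (support 0, 1, 2, …) with a uniform prior on the success probability, and the sum $S(x)$ as the statistic sufficient within each model. All function names are hypothetical.

```python
from math import lgamma, log


def log_factorial(k):
    return lgamma(k + 1)


def log_binom(a, b):
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def log_bf_full(x):
    """Log Bayes factor (Poisson against geometric) from the whole sample."""
    n, s = len(x), sum(x)
    # Poisson marginal under the assumed Exp(1) prior
    log_m1 = (log_factorial(s) - (s + 1) * log(n + 1)
              - sum(log_factorial(xi) for xi in x))
    # Geometric marginal under the assumed uniform prior
    log_m2 = log_factorial(n) + log_factorial(s) - log_factorial(n + s + 1)
    return log_m1 - log_m2


def log_bf_summary(x):
    """Log Bayes factor based only on the distribution of S(x) = sum(x)."""
    n, s = len(x), sum(x)
    # S is Poisson(n * rate) under model 1
    log_m1 = s * log(n) - (s + 1) * log(n + 1)
    # S is negative binomial under model 2
    log_m2 = (log_binom(s + n - 1, s) + log_factorial(n)
              + log_factorial(s) - log_factorial(n + s + 1))
    return log_m1 - log_m2


def log_g_ratio(x):
    """log(g_1(x) / g_2(x)): the term lost when only S(x) is used."""
    n, s = len(x), sum(x)
    log_g1 = (log_factorial(s) - s * log(n)
              - sum(log_factorial(xi) for xi in x))
    log_g2 = -log_binom(s + n - 1, s)
    return log_g1 - log_g2


if __name__ == "__main__":
    x = [3, 0, 2, 1, 4, 2]
    print(log_bf_full(x), log_bf_summary(x), log_g_ratio(x))
```

In this setting the first printed value is the sum of the other two. Two samples with the same total but different individual counts share the same summary-based Bayes factor while their full-data Bayes factors differ.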

### 18 Responses to “ABC model choice not to be trusted”

1. […] the very exciting and I think quite successful ABC in Paris meeting two years ago, Michael Stumpf from Imperial College London suggested a second edition in London along the same lines. Michael […]

2. […] École Polytechnique (X) on random models for ecology, genetics and evolution. The first one is on ABC, Approximate Bayesian Computations Done Exactly, by Raazesh Sainudiin and I plan to attend. […]

3. […] Gaussian Models in Zurich next Saturday, in obvious connection with the recent arXiv posting and the three posts about ABC model choice. Although there is nothing really Gaussian in the talk, I hope I will get […]

4. Scott Sisson Says:

I’d have to agree with some of the above comments – very evidently sufficient statistics for individual models are unlikely to be very informative for the model probability. This is already well known and understood by the ABC-user community. This is also why focus on model discrimination typically either proceeds by evaluation of the fit of a model on its own individual merits, or by accepting that the Bayes Factor that one obtains is only derived from the summary statistics and may in no way correspond to that of the full model.

So I’d agree with Christian that “this does not sound a particularly novel and fundamental result,” [paraphrasing…] and accordingly the statistical content of this article as it stands is quite weak. I’d suggest that this paper be revised into a more substantial data-analysis article (that is, where primary focus is on the analysis, not the Bayes Factor “story”), and then interweave the above content into it. That is, the current content could be a pertinent footnote to a rather more interesting article than is presently the case.

• I am very glad the ABC-user community is aware of this fact, making our “story” a footnote. However, I then wonder why Bayes factors relying on ABC output keep being used in published papers.

5. […] This is a point where I cannot avoid some level of disagreement because I feel there is no equivalence between ABC estimation and ABC testing. As in many other instances, the two inferential problems differ immensely!!! When estimating the parameter of a statistical model based on a summary statistic rather than on the whole data, we lose in efficiency, maybe very much, but we very rarely miss consistency. When testing, instead, we may completely miss the right answer and be truly inconsistent. This possibility for inconsistency is the very reason why we posted this warning on arXiv and the ‘Og. […]

6. I just stumbled across your site. Do you have a link to your paper available on ABC model selection?

• Basil: The current paper is arXiv:1101.5091 and the earlier paper on Gibbs random fields is discussed in this post and is arXiv:0807.2767. Links to other papers (Toni and Stumpf) can be found in older posts…

7. TheoSysBio Says:

Dear Xi’an,
this is a vexing problem that you and your colleagues have found.

However, I think that the situation is not always as bad: when you do ABC without summary statistics (which lead to loss of information), model selection should still be fine. In particular, when time-series data are used to discriminate between candidate models, as in the work of Toni et al. and Toni and Stumpf, it is possible to use the data directly in an ABC framework (see also Sousa et al. for a similar approach in population genetics).

But interestingly, there should also be the possibility that for the same model, but different (non-minimal) sufficient statistics (so different $\eta$’s: $\eta_1$ and $\eta_1^*$), the ratio of evidences may no longer be equal to one, although we have only tested one model against itself (but using different sufficient statistics).

• Those are both interesting points:
1. Using the whole data, whenever possible as in Sousa et al., is bypassing the difficulty,
2. Using two different collections of sufficient statistics may provide an indicator about the trust one can place in a Bayes factor evaluated by ABC. If the range of values is small, they are all providing the same level of information; if not some are more reliable than others. This should be explored further.

• I thought further about your second point: I do not think comparing Bayes Factors based on two different statistics can operate because in this case there is necessarily a different term in front of the ABC likelihood because of the factorisation theorem… Even in cases when those two statistics are both perfectly sufficient for model comparison.

8. Hi Christian,
Interesting although I am not sure how relevant it is. Let us denote by M an indicator variable, M=1,2, where 1 corresponds to model 1 and 2 corresponds to model 2. Let us denote by theta_1 the parameter(s) of model 1 and theta_2 the parameter(s) of model 2.

If you care about model comparison, you are looking for a statistic eta that is sufficient w.r.t. the parameter M. You are not looking for a statistic that is sufficient w.r.t. theta_1 or theta_2. If you have found a statistic eta that is sufficient w.r.t. M then
f(y given M)= g(y) h(eta(y) given M),
and the renormalization term cancels when computing the Bayes factor.

Sincerely
Michael

• This is exactly the point we are making in both of our papers. There is a different sufficiency for model choice (inter-model) and for point estimation (intra-model). We are therefore not disputing the mathematical validity of the cancellation, obtained by an encompassing argument in Didelot et al., rather pointing out that selecting a summary or even a sufficient statistic that is the same across models [the customary approach in ABC model choice] is not providing a trustworthy figure. This is clearly relevant.

9. […] Xi'an's Og an attempt at bloggin, from scratch… « ABC model choice not to be trusted […]

10. Hi Xian, I can’t understand what the graphs at the top represent. What are the x- and y-axes and what model are you fitting here? Thanks

• The graph is taken from the arXived paper. We compare the (ln-)Bayes factor based on the sufficient statistics (second axis) with the (ln-)Bayes factor based on the whole data (first axis). The data is either generated from a Poisson (left) or from a geometric (right) distribution.
