ABC model choice not to be trusted 
On Friday, I received a nice but embarrassing email from Xavier Didelot. He reminded me that I had attended the talk he gave at the model choice workshop in Warwick last May since, unfortunately but rather unsurprisingly given my short memory span, I had forgotten about it! Looking at the slides he attached to his email, I do remember attending the talk and expecting to get back to the results after the meeting. As I went from Warwick to Paris only to leave a day later for Benidorm and the Valencia 9 meeting, in such a hurry that I even forgot my current black notebook, the plans of getting back to the talk were forgotten so completely that even reading the tech report (which has now appeared in Bayesian Analysis) could not revive them!
Here are some of Xavier’s comments, followed by my answers:
After reading your comments, I still remain hopeful that ABC can be useful for model choice, even when sufficient statistics are not available. We have both found that when the wrong summary statistics are used, the estimated BF can be arbitrarily wrong. But this does not seem so different to me from the situation in ABC for parameter estimation: if the statistics carry no information about the parameters (e.g. a sample variance to estimate a mean) then the same situation arises.
This is a point where I cannot avoid some level of disagreement because I feel there is no equivalence between ABC estimation and ABC testing. As in many other instances, the two inferential problems differ immensely! When estimating the parameter of a statistical model based on a summary statistic rather than on the whole data, we lose efficiency, maybe a lot of it, but we very rarely lose consistency. When testing, instead, we may completely miss the right answer and be truly inconsistent. This possibility of inconsistency is the very reason why we posted this warning on arXiv and the ‘Og.
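To make the testing side of this contrast concrete, here is a minimal sketch of ABC model choice by rejection. Everything in it is an illustrative assumption and not something taken from the exchange above: a toy comparison of a Poisson model (Exp(1) prior on the rate) against a geometric model on {0, 1, ...} (uniform prior on the success probability), equal prior weights on the two models, the sum of the observations as the only summary statistic, and hypothetical helper names such as abc_model_choice. Since the summary is discrete, only exact matches are accepted, so the quantity being estimated is the posterior probability of the Poisson model given the sum alone, which need not agree with the one given the full data.

```python
import math
import random


def sample_poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_geometric(p, rng):
    """Number of failures before the first success, drawn by inversion."""
    if p >= 1.0:
        return 0
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is finite
    return int(math.log(u) / math.log1p(-p))


def abc_model_choice(s_obs, n, n_sims, seed=0):
    """Estimate P(Poisson | sum = s_obs) by ABC rejection on the sum."""
    rng = random.Random(seed)
    accepted = [0, 0]  # acceptance counts for [Poisson, geometric]
    for _ in range(n_sims):
        model = rng.randrange(2)  # equal prior weights (assumption)
        if model == 0:
            lam = rng.expovariate(1.0)  # Exp(1) prior (assumption)
            data = [sample_poisson(lam, rng) for _ in range(n)]
        else:
            p = 1.0 - rng.random()  # uniform prior on (0, 1] (assumption)
            data = [sample_geometric(p, rng) for _ in range(n)]
        if sum(data) == s_obs:  # keep only exact matches of the summary
            accepted[model] += 1
    total = sum(accepted)
    return accepted[0] / total if total else float("nan")


def exact_prob_poisson_given_sum(s_obs, n):
    """Closed form for the same quantity, valid only under the priors above."""
    log_m1 = s_obs * math.log(n) - (s_obs + 1) * math.log(n + 1)
    log_m2 = math.log(n) - math.log(n + s_obs) - math.log(n + s_obs + 1)
    return 1.0 / (1.0 + math.exp(log_m2 - log_m1))


estimate = abc_model_choice(s_obs=8, n=10, n_sims=100_000)
print(estimate, exact_prob_poisson_given_sum(8, 10))
```

The Monte Carlo estimate should settle near the closed-form value, which confirms that the algorithm targets the model posterior given the summary. Nothing in the output says how far that is from the model posterior given the whole sample.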
Perhaps our difference of viewpoint comes from our difference of background. As an applied statistician, I often see people following this ad-hoc procedure: (1) choosing some observed summary statistic, (2) trying to find the best fit under a given model M_0, (3) finding that the best fit is not great, (4) considering an improved model M_1, (5) finding a better fit, and (6) concluding that M_1 is superior to M_0. An example of this off the top of my head is in PNAS 2005 102:1968-1973 (but there are literally thousands of examples). Using ABC for model choice seems to me to be just a slightly more formal way of applying the same idea. If the summary statistic used is informative about which model is correct (as is clearly the case for example in the second application of our paper on choice of population size dynamics model) then it can be used to discriminate between models. The calculated quantity is not the true Bayes Factor of course, but only the Bayes Factor assuming that only the summary statistics are observed. In the case of the second application of our paper on choice of population size dynamics model, it would actually be possible to calculate the Bayes Factor based on full data (using rjMCMC techniques) and I would be surprised if the BF were not very similar because I believe as a statistical geneticist that the summary statistics I used are informative about the population size dynamics.
I am 100% in agreement with the main point, namely that most people use ABC in a more empirical and ad hoc way than by following the formal steps of a testing decision, i.e. that his sequence of six steps concluding with a preference for M_1 over M_0 avoids the problem. This is why ABC solutions like Ollie Ratmann’s or others based on predictive performances would be of huge interest to bypass the delicate use of Bayes factors. However, there are settings where the choice between two models has to be made, in which case I think it should be based on a Bayes factor (the corn pest setting comes to mind), and where the decision is truly 0-1 with clearly expressed consequences (of course, some Bayesians will disagree). In such cases, we feel it is necessary to warn users that there is no guaranteed connection between the correct statistical procedure and the procedure approximated by ABC. Obviously, there may/must be examples where the agreement is high, but in a generic situation I do not see a way to assess the agreement or lack thereof.
It therefore seems to me that this is not really an issue about ABC, but with the question of sufficiency of statistics. ABC is not necessarily ABC with summary statistics; there are cases where summary statistics are not required yet ABC is still used (see e.g. Toni et al). Conversely, likelihood-based inference is sometimes done based on summary statistics. For example, inference of properties of the recombination process in genetics is almost always based on patterns of linkage disequilibrium, but this is only a summary of the data.
Again, something I cannot but agree with: this problem remains outside ABC and is not an ABC issue. Model-based sufficient statistics may have no ability to weigh one model against another, while an ABC algorithm using the whole data for defining proximity would avoid the discrepancy and inconsistency. (This is also a remark made by Michael Stumpf in his comments.)
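A small worked computation may help show how a statistic that is sufficient within each model can still fail to weigh one model against the other. The sketch below rests on assumptions that are not part of the exchange above: a toy Poisson model with an Exp(1) prior on the rate, a geometric model on {0, 1, ...} with a uniform prior on the success probability, and the sum of the sample as the summary. The function names log_bf_full and log_bf_sum are hypothetical. Under these assumptions both marginal likelihoods are available in closed form, so the Bayes factor based on the full sample can be compared directly with the one based on the sum.

```python
import math


def log_bf_full(data):
    """Log Bayes factor (Poisson vs geometric) based on the whole sample."""
    n, s = len(data), sum(data)
    # Poisson with Exp(1) prior: s! / ((n+1)^(s+1) * prod(x_i!))
    log_m_poisson = (math.lgamma(s + 1) - (s + 1) * math.log(n + 1)
                     - sum(math.lgamma(x + 1) for x in data))
    # Geometric with uniform prior: Beta(n+1, s+1) = n! s! / (n+s+1)!
    log_m_geometric = (math.lgamma(n + 1) + math.lgamma(s + 1)
                       - math.lgamma(n + s + 2))
    return log_m_poisson - log_m_geometric


def log_bf_sum(data):
    """Log Bayes factor when only the sum of the sample is observed."""
    n, s = len(data), sum(data)
    # Marginal of the sum under the Poisson model: n^s / (n+1)^(s+1)
    log_m_poisson = s * math.log(n) - (s + 1) * math.log(n + 1)
    # Marginal of the sum under the geometric model: n / ((n+s)(n+s+1))
    log_m_geometric = math.log(n) - math.log(n + s) - math.log(n + s + 1)
    return log_m_poisson - log_m_geometric


# Two hypothetical samples with the same size and the same sum.
spread = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
clumped = [8, 0, 0, 0, 0, 0, 0, 0, 0, 0]

for name, data in (("spread", spread), ("clumped", clumped)):
    print(name, log_bf_full(data), log_bf_sum(data))
```

The two samples share the same sum, so the summary-based Bayes factor cannot tell them apart, while the full-data Bayes factor favours the Poisson model for the first sample and the geometric model for the second. This illustrates the discrepancy on one toy case; it is not a proof of inconsistency.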