It took a while, a trip to Warwick, and a call to Fubini to understand, but I now see why there is no specific issue with the insufficiency of the summary statistics!

Thank you for the detailed comments, Dennis! My worry about the summary statistics is that, while the overall simulation is from the joint distribution of the parameters and the (full) observations, the ABC is built upon summary statistics, so there seems to be a discrepancy there.

“Given that ABC is always wrong, however, this may fail to be a powerful diagnostic.” This is an interesting point. Our method can often detect ABC approximation error, which is why we promote it as a useful technique. However, it doesn’t estimate how much error remains if our diagnostics do not reject.

We only look at approximation error due to the ABC threshold, rather than that due to the summary statistics. In other words, we compare the ABC posterior approximation with pi(theta | S(yobs)) rather than with pi(theta | yobs). Possible extensions to consider the effect of summary statistics are mentioned in the discussion, but there would clearly be a lot of extra challenges.

The coverage property and calibration are closely linked. We didn’t explore this in the literature review, to keep it brief. Calibration as in Fearnhead and Prangle (2012) effectively uses pi(theta0,y0)=prior*likelihood, so tolerances of zero and infinity both give calibration, but intermediate values usually do not (except in the case of noisy ABC, which we don’t consider in the current paper). One contribution of this paper is to eliminate the “false negative” of the prior.
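To make the remark about tolerances concrete, here is a minimal sketch on a toy model, not the procedure of the paper. Everything in it is an assumption chosen for illustration: the model (theta ~ N(0,1), y | theta ~ N(theta,1)), the function name `coverage_p_values`, the uniform acceptance kernel, and the reuse of a single reference table across replicates. Pairs (theta0, y0) are drawn from prior*likelihood, rejection ABC is run for each y0, and the recorded p-value is the fraction of accepted parameters below theta0. If the ABC output is calibrated in this sense, the p-values should look roughly uniform, with mean near 1/2 and variance near 1/12.

```python
import numpy as np

def coverage_p_values(eps, n_rep=500, n_ref=20000, seed=0):
    """Toy coverage check for rejection ABC (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    # Reference table simulated from prior * likelihood (assumed toy model).
    theta_ref = rng.normal(size=n_ref)
    y_ref = theta_ref + rng.normal(size=n_ref)
    # Pseudo-true parameters and pseudo-observed data, also from prior * likelihood.
    theta0 = rng.normal(size=n_rep)
    y0 = theta0 + rng.normal(size=n_rep)
    p = np.full(n_rep, np.nan)
    for k in range(n_rep):
        # Rejection ABC with a uniform kernel of half-width eps.
        accepted = theta_ref[np.abs(y_ref - y0[k]) <= eps]
        if accepted.size > 0:
            # p-value: estimated ABC posterior mass below the pseudo-true value.
            p[k] = np.mean(accepted < theta0[k])
    # Drop replicates where nothing was accepted.
    return p[~np.isnan(p)]

if __name__ == "__main__":
    for eps in (0.1, 2.0, np.inf):
        p = coverage_p_values(eps)
        print(f"eps={eps}: mean={p.mean():.3f}, var={p.var():.4f} (uniform: 0.5, 0.0833)")
```

With eps set to infinity every simulation is accepted, so the "ABC posterior" is just the prior, yet the p-values are still uniform because theta0 is itself a prior draw. That is the “false negative” of the prior mentioned above.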

It’s correct that our definition of coverage does not involve the posterior. The link to the posterior only comes out through Result 1.

The comment about a “70% credible interval that m=1” does refer to a degenerate interval. The statement is trying to be an informal (technically incorrect!) link to the earlier definition of coverage.

In going from equation (5) to the test on page 13, the idea is that we have estimated model probabilities from the various data sets, and now investigate if the true models are consistent with this distribution. Statistic W does this through the log likelihood whereas statistics U and V simplify to a Bernoulli setting, by choosing one model i and comparing only the cases “i” and “not i”.
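As a rough illustration of the two routes described above, the sketch below takes a matrix of estimated model probabilities (one row per data set) and the vector of true model indices. The function names and the standardized count are hypothetical choices of mine for illustration; none of them is claimed to be the paper's exact U, V or W. The first function sums the log probabilities assigned to the true models, and the other two reduce to the Bernoulli setting of “i” versus “not i” and compare the observed count of model i with its expected count.

```python
import numpy as np

def log_likelihood_stat(probs, true_models):
    """Sum of log estimated probabilities assigned to the true models."""
    idx = np.arange(len(true_models))
    return float(np.sum(np.log(probs[idx, true_models])))

def bernoulli_reduction(probs, true_models, i):
    """Keep only 'model i' versus 'not model i'."""
    q = probs[:, i]                       # estimated probability of model i
    z = (true_models == i).astype(int)    # 1 if the true model is i, else 0
    return q, z

def standardized_count(q, z):
    """Generic calibration check: (observed - expected) / standard deviation."""
    return float((z.sum() - q.sum()) / np.sqrt(np.sum(q * (1 - q))))

if __name__ == "__main__":
    probs = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
    true_models = np.array([0, 1, 0, 1])
    q, z = bernoulli_reduction(probs, true_models, i=0)
    print(log_likelihood_stat(probs, true_models), standardized_count(q, z))
```

If the estimated probabilities are consistent with the true models, the standardized count should behave roughly like a standard normal draw when the number of data sets is large.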
