During a (great) visit to London over the weekend, I was shown a fairly curious paper written by Alan Templeton that basically rejects the use of ABC for hypothesis testing, in favour of Templeton’s own approach, called “nested clade phylogeographical analysis” (NCPA). Since I read the paper in the Eurostar this afternoon, I could not get further information about this approach but most of the criticisms against ABC contained in the paper can be understood on their own.
First, despite its title, this paper is mostly oriented against Bayesian hypothesis testing, confusing the simulation method (ABC) with the underlying Bayesian inferential method of using Bayes factors. Sentences like “the ABC method posits two or more alternative hypotheses and tests their relative fits to some observed statistics”, “no novelty or discovery is allowed in ABC” and “in ABC there is no null hypothesis” illustrate this confusion. From this perspective, the criticisms in the paper are those usually raised against the standard Bayesian (i.e. Jeffreys’) approach to testing hypotheses: the impossibility of testing the null hypothesis per se, the relativity of Bayes factors, the dependence on prior scenarios, the neglect of the sampling distribution (or, more exactly, the lack of integration over the distribution of the sample), the bias towards over-parameterised models, and the impossibility of determining the false-positive rate, all items that have been addressed many times in the literature and will not be repeated here. (There is also the invocation of Karl Popper’s falsifiability, which always makes me wary of statistics papers or books appealing to it.) The paper concludes by stating that “the ‘posterior probabilities’ that emerge from ABC are not co-measurable”, meaning that they are not measured on the same scale, but this bypasses the very nature of Bayesian model choice, which compares the posterior probabilities of models by turning the marginal likelihoods into probabilities (see again Jeffreys).
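To spell out what “turning the marginal likelihoods into probabilities” means (this is the standard construction, nothing specific to Templeton’s paper or to ABC): for models $M_1,\dots,M_k$ with prior weights $\pi(M_i)$ and marginal likelihoods

$$ m_i(x) = \int f_i(x\mid\theta_i)\,\pi_i(\theta_i)\,\mathrm{d}\theta_i, $$

the posterior probabilities

$$ \Pr(M_i\mid x) = \frac{\pi(M_i)\,m_i(x)}{\sum_{j=1}^{k} \pi(M_j)\,m_j(x)} $$

all live on the same $[0,1]$ scale, and the Bayes factor $B_{12}=m_1(x)/m_2(x)$ is precisely the quantity that makes two marginal likelihoods directly comparable.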
Some criticisms are however correctly directed at ABC as an approximation method, but I also find difficulties with most of them. First, Templeton considers ABC (but in fact the Bayes factor) to be a goodness-of-fit criterion because it depends on a distance $\rho(s',s)$ between a simulated statistic $s'$ and the observed statistic $s$. Again, this is confusing Bayesian inference (which relies on a marginal likelihood) and simulation technology (which approximates the event $s'=s$ based on this distance). In connection, a second criticism is that the missing “dimensionality of the models” invalidates the inference based on the ABC method, again missing the point that ratios of marginal likelihoods are directly comparable and that they are not chi-squared goodness-of-fit statistics. (Templeton introduces the notion of co-measurability to make the criticism sound more rigorous, but this is a concept I have never heard used in Statistics and it does not apply here anyway.) A third attack is more puzzling in that it mixes simulation with inference and observables with parameters: Fig. 3 in the paper plots three “posterior distributions” (densities) corresponding to three models under comparison but uses a sufficient statistic $s$ to index the first axis. The argument then goes as follows: since ABC only considers statistics $s'$ such that $\rho(s',s)$ is small, it is missing the big picture (and is not Bayesian inference either)! This does not make sense, especially when considering that ABC is no longer A(pproximate) when this distance is equal to zero. It repeatedly confuses the simulation of the auxiliary sufficient statistics (in the space of the observables) with the Bayesian inference (which is in principle unrelated to the simulation method!). The fourth argument against ABC is that there is no convergence result in Beaumont et al. (2002), especially about the choice of the tolerance $\epsilon$, and the paper appeals to Billingsley (1986) for support. This is [again] rather off the point since the convergence of the method is a Monte Carlo type of convergence that has nothing to do with “the impact of the sample size”. When $\epsilon$ goes to zero, the method always converges. To consider things a bit deeper, for a given Monte Carlo sample size, Beaumont et al. (2002) use a non-parametric conditional expectation which also converges as the Monte Carlo sample size goes to infinity. Convergence is thus not addressed in the original papers because it is rather obvious.
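To make the separation between the two levels concrete, here is a minimal sketch of an ABC rejection sampler in Python, on a toy normal-mean example of my own choosing (nothing to do with the population-genetics models discussed in the paper, and the function name and settings are purely illustrative): the tolerance $\epsilon$ only enters the Monte Carlo approximation, and letting it go to zero (with enough simulations) returns exact draws from the posterior given the observed statistic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (illustration only): x_1,...,x_n ~ N(theta, 1),
# prior theta ~ N(0, 10), sufficient statistic s = sample mean.
n = 50
theta_true = 1.0
x_obs = rng.normal(theta_true, 1.0, size=n)
s_obs = x_obs.mean()

def abc_rejection(n_sim, eps):
    """Keep prior draws whose simulated summary statistic is within eps of s_obs."""
    theta = rng.normal(0.0, np.sqrt(10.0), size=n_sim)   # draws from the prior
    s_sim = rng.normal(theta, 1.0 / np.sqrt(n))          # simulated sample means
    return theta[np.abs(s_sim - s_obs) <= eps]           # the rho(s', s) <= eps event

# The tolerance eps only affects the Monte Carlo approximation: as eps -> 0
# (and the number of simulations grows), the accepted draws are exact
# simulations from the posterior pi(theta | s_obs).
for eps in (1.0, 0.1, 0.01):
    sample = abc_rejection(200_000, eps)
    print(f"eps={eps:>5}: {len(sample):6d} accepted, posterior mean ~ {sample.mean():.3f}")
```

The non-parametric conditional expectation mentioned above refers to Beaumont et al.’s (2002) regression adjustment of the accepted parameters on the simulated statistics, which is again a Monte Carlo device whose accuracy improves with the number of simulations, not a statement about “the impact of the sample size”.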
I thus found this reading quite entertaining, both because of the severity of the repeated criticisms of Bayesian hypothesis testing and because they arose within a specific field, phylogeny, as has happened before in other fields like astronomy. The additional appeal of the paper was the confusion between inference and simulation-based approximations, also found in earlier criticisms of MCMC, for instance. (Note that there is a strong debate in this community about NCPA itself, as in Petit (2007) and Beaumont and Panchal (2008), but that is not the purpose of this post!) This tells me that (a) we need to do a better job of explaining the fundamentals of Bayesian hypothesis testing and (b) users of Statistics rarely have the luxury of going back to the original concepts but most often derive them from previous applications in their own field, with the inherent dangers…