## Defence of model-based inference

Posted in Statistics, University life on January 13, 2010 by xi'an

An opinion piece, to which I contributed, about the virtues of statistical inference in phylogeography just appeared in Molecular Ecology. (The whole paper seems to be available online, as I can access it.) It was written by 22 (!) contributors in response to Templeton’s recent criticism of ABC (and his defence of Nested Clade Analysis) in the same journal. My contribution to the paper is mostly based on the arguments posted here last March, namely that Templeton’s paper was confusing ABC (which is a computational method) with Bayesian statistics. The paper as a whole goes beyond a “Bayesian defence” since not all authors are Bayesian. It supports a statistics-based approach to phylogeography, as reported in the abstract:

Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, whether it is used with single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics.

This will most probably fail to end the debate between the proponents and the opponents of model-based inference in phylogenetics and elsewhere, but the point was worth making…

## Not so Fooled by Randomness

Posted in Books, Statistics on March 11, 2009 by xi'an

“Why do I want everybody to learn some statistics?” (p.215)

After reading and commenting on The Black Swan, I decided to spend the lavish stipend provided by my Amazon Associates gains on Fooled by Randomness by Nassim Nicholas Taleb, in connection with another positive review by Andrew Gelman. Obviously, after being put off by The Black Swan, I started Fooled by Randomness with a strong bias, not helped by the fact that the book is written in almost exactly the same infuriating style, with endless repetitions and ceaseless digressions. From the prologue, “this book has two purposes: to defend science (…) and to attack the scientist when he strays from his course”, which sounds fairly ambitious a priori and is not achieved a posteriori. The style is however both less egocentric than The Black Swan and more scientific, in that those “fooled by randomness” are mostly those not accounting for randomness, rather than those using (inadequate) random models as in The Black Swan. I thus had less of a hard time reading it in the metro (it took me less than a week!), despite the author’s aggravating style of using small facts, key figures and personal introspections to advance his theory. As in The Black Swan, Karl Popper once again pops in even before page one! And the baseball coach Yogi Berra, familiar to readers of The Black Swan, is not far behind. The overlap with The Black Swan is far from negligible: the notion of black swans already appears many times in Fooled by Randomness, with the initial metaphor about the unpredictable black swan attributed to Hume, as do trader stories and barbs at econometricians, Nobel Prize winners, financial experts and most economists (but much less at the bell curve, Frenchmen, textbooks and statisticians).

“These “thinkers” should be given an undergraduate class on statistical sampling…” (p. 75)

As written above, the main difference with The Black Swan is however that the tone is not antagonistic towards probability theory, quite the opposite, and there is even a chapter that is an ode to Monte Carlo mathematics (!). I am not convinced this Monte Carlo chapter makes much sense to anyone who has never heard of Monte Carlo computer simulation, though, because there is hardly any mention of an underlying model driving the simulation. (Simulating Russian roulette on a computer does not sound very appealing to the average reader either.) Part I also contains sensible warnings against the survivor bias and similar issues. The warnings about apparent coincidences and other apparently unlikely events are correctly argued (including a reference to Persi Diaconis) and are the central theme of the book: finding patterns where there are none. Another improvement [when compared with The Black Swan] is that non-stationarity and regime switches are explicitly recognised as a cause for poor prediction. There is nonetheless an underlying argument that statistical inference (learning from experience) is in essence impossible for most real phenomena, because there is basically no way to check (or “to falsify” in Popper’s lingo) that your model is completely correct, especially in the tails. While this is done in a most obscure way, there are even favourable references to subjective probabilities, to priors and, not so obscurely, to Keynes’ A Treatise On Probability. (It made me think of using this book for an historical reading course next year!)

In conclusion, and in my opinion, the author should not have written The Black Swan after this mostly reasonable (if highly repetitive) book! The excessively aggressive tone adopted against modelling in The Black Swan makes Fooled by Randomness appear almost like its opposite at times. (I will discuss in a later post some minor points of contention I have with the book.)

Some criticisms are however correctly directed at ABC as an approximation method, but I also find difficulties with most of them. First, Templeton considers ABC (but in fact the Bayes factor) to be a goodness-of-fit criterion because it depends on a distance $\|s'-s\|$ between a simulated statistic and the observed statistic. Again, this confuses Bayesian inference (which relies on a marginal likelihood) and simulation technology (which approximates the event $s'=s$ based on this distance). In connection, a second criticism is that the missing “dimensionality of the models” invalidates inference based on the ABC method, again missing the point that ratios of marginal likelihoods are directly comparable and that they are not chi-squared goodness-of-fit statistics. (Templeton introduces the notion of co-measurability to make the criticism sound more rigorous, but this is a concept I have never heard used in Statistics and it does not apply here anyway.) A third attack is more puzzling in that it mixes both simulation and inference, and observables and parameters: Fig. 3 in the paper plots three “posterior distributions” (densities) corresponding to three models under comparison but uses a sufficient statistic s to index the first axis. The argument then goes as follows: since ABC only considers statistics s’ such that $\|s'-s\|$ is small, it misses the big picture (and is not Bayesian inference either)! This does not make sense, especially when considering that ABC is no longer A(pproximate) when this distance is equal to zero. It repeatedly confuses the simulation of the auxiliary sufficient statistics (in the space of the observables) with the Bayesian inference (which is in principle unrelated to the simulation method!). The fourth argument against ABC is that there is no convergence result in Beaumont et al. (2002), especially about the choice of $\delta$, and the paper calls on Billingsley (1986) himself for support.
This is [again] rather off the point, since the convergence of the method is a Monte Carlo type of convergence that has nothing to do with “the impact of the sample size”. When $\delta$ goes to zero, the method always converges. If one wants to consider things a bit more deeply, for a given tolerance $\delta$, Beaumont et al. (2002) use a non-parametric conditional expectation which also converges as the Monte Carlo sample size goes to infinity. Convergence is thus not addressed in the original papers because it is rather obvious.
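To make the simulation-versus-inference distinction concrete, here is a minimal rejection ABC sampler, a toy sketch with made-up names and settings (estimating the mean of a Normal(θ, 1) model from its sample mean), not the regression-adjusted scheme of Beaumont et al. (2002): a prior draw θ is kept exactly when the statistic of data simulated under θ falls within δ of the observed statistic.

```python
import random
import statistics

def abc_rejection(s_obs, n_obs, prior_draw, n_sims=20_000, delta=0.1):
    """Plain rejection ABC: keep a prior draw theta whenever the statistic
    of data simulated under theta lands within delta of s_obs."""
    accepted = []
    for _ in range(n_sims):
        theta = prior_draw()                                   # theta ~ prior
        sims = [random.gauss(theta, 1.0) for _ in range(n_obs)]
        s_sim = statistics.fmean(sims)                         # simulated statistic s'
        if abs(s_sim - s_obs) <= delta:                        # accept iff ||s'-s|| <= delta
            accepted.append(theta)
    return accepted                                            # approximate posterior sample

random.seed(42)
# Toy "observed" summary: sample mean 2.0 from 50 points, flat Uniform(-5, 5) prior on theta
posterior = abc_rejection(s_obs=2.0, n_obs=50,
                          prior_draw=lambda: random.uniform(-5.0, 5.0))
```

The distance $\|s'-s\|$ appears only in the acceptance test, i.e. on the simulation side; as $\delta$ shrinks, the accepted θ’s are draws from the exact posterior given s, which is why treating the distance as a goodness-of-fit measure misreads the method.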