## About Fig. 4 of Fagundes et al. (2007)

Posted in R, Statistics, University life on July 13, 2011 by xi'an

Yesterday, we had a meeting of our EMILE network on statistics for population genetics (in Montpellier) and we were discussing our respective recent advances in ABC model choice. One of our colleagues mentioned the constant request (from referees) to include the post-ABC processing devised by Fagundes et al. in their 2007 ABC paper. (This paper contains a wealth of statistical innovations, but I only focus here on this post-checking device.)

The method centres around their Fig. 4, with the attached caption

Fig. 4. Empirical distributions of the estimated relative probabilities of the AFREG model when the AFREG (solid line), MREBIG (dashed line), and ASEG (dotted line) models are the true models. Here, we simulated 1,000 data sets under the AFREG, MREBIG, and ASEG models by drawing random parameter values from the priors. The density estimates of the three models at the AFREG posterior probability = 0.781 (vertical line) were used to compute the probability that AFREG is the correct model given our observation that PAFREG = 0.781. This probability is equal to 0.817.

which aims at computing a p-value based on the ABC estimate of the posterior probability of a model.

I am somewhat uncertain about the added value of this computation and about the paradox of the sentence “the probability that AFREG is the correct model [given] the AFREG posterior probability (..) is equal to 0.817”… If I understand correctly the approach followed by Fagundes et al., they simulate samples from the joint distribution over parameter and (pseudo-)data conditional on each model, then approximate the density of the [ABC estimated] posterior probabilities of the AFREG model by a non-parametric density estimate, presumably density(), which gives in Bayesian terms the marginal likelihoods (or evidences) of the posterior probability of the AFREG model under each of the models under comparison. The “probability that AFREG is the correct model given our observation that PAFREG = 0.781” is then completely correct in the sense that it is truly a posterior probability for this model based on the sole observation of the transform (or statistic) of the data x equal to PAFREG(x). However, if we only look at the Bayesian perspective and do not consider the computational aspects, there is no rationale in moving from the data (or from the summary statistics) to a single statistic equal to PAFREG(x), as this induces a loss of information. (Furthermore, it seems to me that the answer is not invariant to the choice of the model whose posterior probability is computed, if more than two models are compared. In other words, the posterior probability of the AFREG model given the sole observation of PAFREG(x) is not necessarily the same as the posterior probability of the AFREG model given the sole observation of PASEG(x)…)

Although this is not at all advised by the paper, it seems to me that some users of this processing opt instead for simulations of the parameter taken from the ABC posterior, which amounts to using the “data twice”, i.e. the squared likelihood instead of the likelihood… So, while the procedure is formally correct (despite Templeton’s arguments against it), it has no added value. Obviously, one could alternatively argue that the computational precision in approximating the marginal likelihoods is higher with the (non-parametric) solution based on PAFREG(x) than the (ABC) solution based on x, but this is yet to be demonstrated (and weighed against the information loss).
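To make the computation behind the caption concrete, here is a minimal Python sketch of my reading of it: estimate the density of the statistic PAFREG under each model by a kernel estimate, evaluate each estimate at the observed value 0.781, and normalise with the prior model weights. Everything specific in it is an assumption made for illustration: the function names `kde_at` and `posterior_given_statistic`, the equal prior weights, the bandwidth rule, and above all the Beta draws that stand in for the simulated values of PAFREG (they are not the simulations of Fagundes et al.).

```python
import math
import random
import statistics


def kde_at(sample, x):
    """Gaussian kernel density estimate of `sample`, evaluated at the point x."""
    n = len(sample)
    # Normal-reference rule of thumb for the bandwidth (an assumption; not
    # necessarily the default bandwidth of R's density()).
    h = 1.06 * statistics.stdev(sample) * n ** (-0.2)
    kernel_sum = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)
    return kernel_sum / (n * h * math.sqrt(2.0 * math.pi))


def posterior_given_statistic(sims_by_model, observed, prior=None):
    """P(model | statistic = observed), from per-model simulations of the statistic."""
    models = list(sims_by_model)
    if prior is None:
        prior = {m: 1.0 / len(models) for m in models}  # equal prior weights
    # prior weight times estimated density of the statistic under each model
    weighted = {m: prior[m] * kde_at(sims_by_model[m], observed) for m in models}
    total = sum(weighted.values())
    return {m: w / total for m, w in weighted.items()}


# Stand-in draws for the ABC estimate of P_AFREG under each model
# (made-up Beta distributions, NOT the simulations of Fagundes et al.).
rng = random.Random(0)
sims = {
    "AFREG": [rng.betavariate(8, 2) for _ in range(1000)],
    "MREBIG": [rng.betavariate(2, 4) for _ in range(1000)],
    "ASEG": [rng.betavariate(3, 3) for _ in range(1000)],
}
post = posterior_given_statistic(sims, observed=0.781)
print(post)
```

Running the same function on simulated values of PASEG instead of PAFREG would in general return a different probability for AFREG, which is the lack of invariance mentioned above.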

Just as a side remark on the polychotomous logistic regression approximation to the posterior probabilities introduced in Fagundes et al.: the idea is quite enticing, as a statistical regularisation of ABC simulations. It could be exploited further by using a standard model selection strategy in order to pick the summary statistics that truly contribute to explaining the model index.
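As a rough illustration of the regression idea, the following Python sketch fits a multinomial (polychotomous) logistic regression of the model index on the summary statistics of simulated data sets, then evaluates the fitted probabilities at an observed summary. It is a global, unweighted fit by plain gradient descent on a toy example, not necessarily the variant used in the paper; the three toy models, the two summary statistics, the names `fit` and `linear_scores`, and the tuning constants are all assumptions of mine.

```python
import math
import random


def softmax(values):
    """Turn a list of linear scores into probabilities summing to one."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def linear_scores(weights, stat):
    """Linear predictor of each model for one vector of summary statistics."""
    x = list(stat) + [1.0]  # last slot is the intercept
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def fit(stats, labels, n_models, steps=300, lr=0.2):
    """Batch gradient descent on the multinomial logistic log-loss."""
    dim = len(stats[0]) + 1
    n = len(stats)
    weights = [[0.0] * dim for _ in range(n_models)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n_models)]
        for stat, label in zip(stats, labels):
            x = list(stat) + [1.0]
            probs = softmax(linear_scores(weights, stat))
            for k in range(n_models):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(dim):
                    grad[k][j] += err * x[j] / n
        for k in range(n_models):
            for j in range(dim):
                weights[k][j] -= lr * grad[k][j]
    return weights


# Toy reference table: the first statistic depends on the model index,
# the second is pure noise (both are hypothetical).
rng = random.Random(1)
centres = [-2.0, 0.0, 2.0]
labels = [rng.randrange(3) for _ in range(300)]
stats = [(rng.gauss(centres[m], 1.0), rng.gauss(0.0, 1.0)) for m in labels]

weights = fit(stats, labels, n_models=3)
probs = softmax(linear_scores(weights, (2.0, 0.0)))  # "observed" summaries
print(probs)
```

In this toy setting one would expect the coefficients attached to the second, pure-noise statistic to stay close to zero, which is the kind of signal a standard variable selection step could use to discard uninformative summary statistics.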

## ABC model choice not to be trusted

Posted in Mountains, Statistics, University life, R on January 27, 2011 by xi'an

This may sound like a paradoxical title given my recent production in this area of ABC approximations, especially after the disputes with Alan Templeton, but I have come to the conclusion that ABC approximations to the Bayes factor are not to be trusted. When working one afternoon in Park City with Jean-Michel and Natesh Pillai (drinking tea in front of a fake log-fire!), we looked at the limiting behaviour of the Bayes factor constructed by an ABC algorithm, i.e. by approximating posterior probabilities for the models from the frequencies of acceptances of simulations from those models (assuming the use of a common summary statistic to define the distance to the observations). Rather obviously (a posteriori!), we ended up with the true Bayes factor based on the distributions of the summary statistics under both models!
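For readers unfamiliar with the algorithm, here is a minimal Python sketch of ABC model choice by acceptance frequencies, on a toy example of my own: two location models (normal and uniform noise), a common normal prior on the location, the sample mean as the common summary statistic, and an arbitrary tolerance `eps`. None of these choices comes from the work discussed above.

```python
import random
import statistics


def simulate(model, theta, n, rng):
    """Draw n pseudo-observations under a toy model with location theta."""
    if model == 0:
        return [rng.gauss(theta, 1.0) for _ in range(n)]  # normal noise
    return [rng.uniform(theta - 2.0, theta + 2.0) for _ in range(n)]  # uniform noise


def abc_model_probabilities(observed, n_sims=20000, eps=0.1, seed=0):
    """Estimate posterior model probabilities by ABC acceptance frequencies."""
    rng = random.Random(seed)
    s_obs = statistics.fmean(observed)  # common summary statistic
    accepted = [0, 0]
    for _ in range(n_sims):
        model = rng.randrange(2)     # uniform prior on the model index
        theta = rng.gauss(0.0, 2.0)  # same prior on theta under both models
        pseudo = simulate(model, theta, len(observed), rng)
        if abs(statistics.fmean(pseudo) - s_obs) < eps:
            accepted[model] += 1
    total = sum(accepted)
    if total == 0:
        raise ValueError("no simulation accepted; increase eps or n_sims")
    return [count / total for count in accepted]


# Toy "observed" data, generated here from the normal model.
data_rng = random.Random(42)
observed = [data_rng.gauss(0.5, 1.0) for _ in range(20)]
probs = abc_model_probabilities(observed)
print(probs)
```

In this toy setup the estimated probabilities should come out close to 1/2 each, because the sample mean has nearly the same distribution under both models: the acceptance frequencies only see the summary statistic, whatever the full data may say about the shape of the noise.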

## Another ABC rebuttal

Posted in Statistics, University life on October 31, 2010 by xi'an

“Given that some logical overlap is common when dealing with complex models, this means that much of the literature using ABC is invalid.” Alan Templeton, July 2010.

I had not noticed another reply to Templeton’s PNAS diatribe against ABC that was published by Csilléry, Blum, Gaggiotti and François in Trends in Ecology and Evolution. This reply follows a letter written by Templeton to this journal and published last July. Alan Templeton takes issue with the inclusion of a box entitled Controversy surrounding ABC in the nice survey of Csilléry et al. The letter reproduces earlier arguments I already discussed, in particular the “logical impossibility” of having larger models enjoy smaller posterior probabilities than smaller models [that are special cases]. The conclusion that

“1) ABC can and does produce results that are mathematically impossible; 2) the ‘posterior probabilities’ of ABC cannot possibly be true probability measures; and 3) ABC is statistically incoherent (incoherent methods can violate the constraints of formal logic)” Alan Templeton, July 2010.

thus brings no novelty to the debate. It is nonetheless mildly irritating to see that Alan Templeton is still advancing “mathematical errors” as his main argument, despite detailed rebuttals published by mathematicians and mathematical statisticians. As demonstrated by the repeated argument that BIC should replace ABC (!), or the decomposition of $P(A\cup B\cup C)$ in the PNAS reply, he is out of his depth on mathematical grounds. However, that he manages to publish a paper like the PNAS diatribe without the journal having a mathematician check the “mathematical flaws” is more of an issue.

Posted in Statistics, University life on September 29, 2010 by xi'an

“Logical overlap is the norm for the complex models analyzed with ABC, so many ABC posterior model probabilities published to date are wrong.” Alan R. Templeton, PNAS, doi:10.1073/pnas.1009012107

Our letter in PNAS about Templeton’s surprising diatribe on Bayesian inference has now appeared in the early edition, along with Templeton’s reply. This reply unfortunately lacks any novel element compared with the original paper. First, he maintains that the criticism is about ABC (which is, in case you do not know, a computational technique and not a specific statistical methodology!). Second, he insists on the inappropriate Venn diagram analogy by reproducing the basic identity

$P(A\cup B\cup C) = P(A)+P(B)+P(C)-P(A\cap B)-P(B\cap C)-P(C\cap A)+P(A\cap B\cap C)$

(presumably in case we had lost sight of it!) to argue that using instead

$P(A)+P(B)+P(C)$

is incoherent (hence rejecting Bayes factors, Bayesian model averaging and so on). I am not particularly surprised by this immutable stance, but it means that there is little point in debate when starting from such positions… Our main goal in publishing this letter was actually to stress that the earlier tribune had no statistical ground and I think we achieved this goal.

## Incoherent phylogeographic inference [accepted]

Posted in Statistics, University life on August 30, 2010 by xi'an

The letter we submitted to PNAS about Templeton’s surprising diatribe on Bayesian inference has now been accepted:

Title: “Incoherent Phylogeographic Inference”
Tracking #: 2010-08762
Authors: Berger et al.

Dear Prof. Robert,
We are pleased to inform you that the PNAS Editorial Board has given final approval of your letter to the Editor for online publication. The author(s) of the published manuscript have been invited to respond to your feedback. If they provide a response, it may appear online concurrently with your letter.

Now we are looking forward (?) to Alan Templeton’s answer, even though I suspect this short letter is not going to have any impact on his views!

## Evidence and evolution (2)

Posted in Statistics, Books on April 9, 2010 by xi'an

“When dealing with natural things we will, then, never derive any explanations from the purpose which God or nature may have had in view when creating them and we shall entirely banish from our philosophy the search for final causes. For we should not be so arrogant as to suppose that we can share in God’s plans.” René Descartes, Les Principes de la Philosophie, Livre I, 28

I have now read the second chapter of the book Evidence and Evolution: The Logic Behind the Science by Elliott Sober. The very chapter whose title is “Intelligent design”… As posted earlier, I was loath to get into this chapter for fear of being dragged into a nonsensical debate. In fact, the chapter is written from a purely philosophical/logical perspective, while I was looking for statistical arguments given the tenor of the first chapter (reviewing the differences between Bayesians, likelihoodists (sic!), and frequentists). There is therefore very little I can contribute to the debate, being no philosopher of science. I find the introduction of the chapter interesting in that it relates the creationism/“intelligent design” thesis to a long philosophical tradition (witness the above quote from Descartes) rather than to the current political debate about “teaching” creationism in US and UK schools. The disputation of older theses like Paley’s watch however takes up most of the chapter, which is disappointing in my humble opinion. In a sense, Sober mostly states the obvious when arguing that when gods or other supernatural beings enter the picture, they can account for any observed fact with the highest likelihood while being unable to predict any fact not yet observed. I would have preferred to see hard scientific facts and the use of statistical evidence, even of the AIC sort!

The call to Popper’s testability does not bring further arguments because Sober also defends the thesis that even the theory of “intelligent” design is falsifiable… In Section 2.19 about model selection, the comparison between a single parameter model and a one million parameter model hints at Ockham’s razor, but Sober misses a major aspect of Bayesian analysis, which is that, by virtue of hyperpriors and hyperparameters, observations about one group of parameters also bring information about another group of parameters when those are related via a hyperprior (as in small area estimation). Given that the author never discusses the use of priors over the model parameters and uses instead plug-in estimates, he does not take advantage of the marginal posterior dependence between the different groups of parameters.

## Evidence and evolution

Posted in Statistics, Books on April 1, 2010 by xi'an

I have received the book Evidence and Evolution: The Logic Behind the Science by Elliott Sober to review. The book is written by a philosopher of science who has worked on the notion of evidence, in the statistical meaning of the word. I am currently reading the first chapter, which is fairly well written and presents a reasonable picture of the different perspectives (Bayesian, likelihood, frequentist) used for hypothesis testing and model choice. Akaike’s information criterion is promoted a wee bit too much, but that’s the author’s choice after all. However, I just came upon a section yesterday where Sober reproduces the error central to Templeton’s thesis and discussed on the Og a few days ago. He indeed states that “the simpler model cannot have the higher prior probability—a point that Popper (1959) emphasized.” And he insists further that there is no reason for thinking that

$P(\theta=0) > P(\theta>0)$

is true (page 84). (The measure-theoretic objections raised earlier obviously apply there as well.) It must thus be more of a common misconception among philosophers of science than I previously thought….

As described on the back cover, the purpose of the book is

“How should the concept of evidence be understood? And how does the concept of evidence apply to the controversy about creationism as well as to work in evolutionary biology about natural selection and common ancestry? In this rich and wide-ranging book, Elliott Sober investigates general questions about probability and evidence and shows how the answers he develops to those questions apply to the specifics of evolutionary biology. Drawing on a set of fascinating examples, he analyzes whether claims about intelligent design are untestable; whether they are discredited by the fact that many adaptations are imperfect; how evidence bears on whether present species trace back to common ancestors; how hypotheses about natural selection can be tested, and many other issues. His book will interest all readers who want to understand philosophical questions about evidence and evolution, as they arise both in Darwin’s work and in contemporary biological research.”

Sober applies these concepts of evidence to some versions of creationism… I am obviously reluctant to go through this second chapter about creationism as there is no use in arguing about the existence of gods in a book about science, but I am still curious to see how Sober analyses this issue.