## About Fig. 4 of Fagundes et al. (2007)

Posted in R, Statistics, University life on July 13, 2011 by xi'an

Yesterday, we had a meeting of our EMILE network on statistics for population genetics (in Montpellier) and we were discussing our respective recent advances in ABC model choice. One of our colleagues mentioned the constant request (from referees) to include the post-ABC processing devised by Fagundes et al. in their 2007 ABC paper. (This paper contains a wealth of statistical innovations, but I only focus here on this post-checking device.)

The method centres around the above figure, with the attached caption

Fig. 4. Empirical distributions of the estimated relative probabilities of the AFREG model when the AFREG (solid line), MREBIG (dashed line), and ASEG (dotted line) models are the true models. Here, we simulated 1,000 data sets under the AFREG, MREBIG, and ASEG models by drawing random parameter values from the priors. The density estimates of the three models at the AFREG posterior probability = 0.781 (vertical line) were used to compute the probability that AFREG is the correct model given our observation that PAFREG = 0.781. This probability is equal to 0.817.

which aims at computing a p-value based on the ABC estimate of the posterior probability of a model.

I am somewhat uncertain about the added value of this computation and about the paradox of the sentence “the probability that AFREG is the correct model [given] the AFREG posterior probability (..) is equal to 0.817”… If I understand correctly the approach followed by Fagundes et al., they simulate samples from the joint distribution over parameter and (pseudo-)data conditional on each model, then approximate the density of the [ABC estimated] posterior probabilities of the AFREG model by a non-parametric density estimate, presumably density(). In Bayesian terms, this produces the marginal likelihoods (or evidences) of the statistic PAFREG(x), the posterior probability of the AFREG model, under each of the models under comparison. The “probability that AFREG is the correct model given our observation that PAFREG = 0.781” is then completely correct in the sense that it is truly a posterior probability for this model based on the sole observation of the transform (or statistic) of the data x equal to PAFREG(x).

However, if we only look at the Bayesian perspective and do not consider the computational aspects, there is no rationale in moving from the data (or from the summary statistics) to a single statistic equal to PAFREG(x), as this induces a loss of information. (Furthermore, it seems to me that the answer is not invariant to the choice of the model whose posterior probability is computed, if more than two models are compared. In other words, the posterior probability of the AFREG model given the sole observation of PAFREG(x) is not necessarily the same as the posterior probability of the AFREG model given the sole observation of PASEG(x)…)

Although this is not at all advised by the paper, it seems to me that some users of this processing opt instead for simulations of the parameter taken from the ABC posterior, which amounts to using the “data twice”, i.e. the squared likelihood instead of the likelihood… So, while the procedure is formally correct (despite Templeton’s arguments against it), it has no added value. Obviously, one could alternatively argue that the computational precision in approximating the marginal likelihoods is higher with the (non-parametric) solution based on PAFREG(x) than with the (ABC) solution based on x, but this is yet to be demonstrated (and weighed against the information loss).
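The post-processing discussed above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the implementation of Fagundes et al.: the three models are replaced by hypothetical unit-variance Gaussians with fixed means (`TOY_MEANS`), the ABC estimate of the AFREG posterior probability is replaced by its exact value in that toy setting (`prob_afreg`), the kernel bandwidth is arbitrary, and the three models are given equal prior weights. Only the model labels and the value 0.781 come from the caption.

```python
import math
import random

MODELS = ["AFREG", "MREBIG", "ASEG"]  # labels taken from the caption
# Hypothetical toy models: x ~ N(mean, 1), one fixed mean per model.
TOY_MEANS = {"AFREG": 0.0, "MREBIG": 1.0, "ASEG": 2.0}


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_afreg(x):
    """Stand-in for the ABC estimate of P(AFREG | x); exact in this toy setting."""
    weights = {m: normal_pdf(x, TOY_MEANS[m]) for m in MODELS}
    return weights["AFREG"] / sum(weights.values())


def kde(sample, point, bandwidth):
    """Gaussian kernel density estimate of `sample`, evaluated at `point`."""
    return sum(normal_pdf(point, s, bandwidth) for s in sample) / len(sample)


def recalibrated_probs(p_obs, n_sim=1000, bandwidth=0.05, seed=0):
    """Posterior model probabilities given only the statistic P_AFREG(x) = p_obs."""
    rng = random.Random(seed)
    densities = {}
    for m in MODELS:
        # Simulate pseudo-data under model m and reduce each to P_AFREG(x).
        stats = [prob_afreg(rng.gauss(TOY_MEANS[m], 1.0)) for _ in range(n_sim)]
        # Estimated density of the statistic under model m, at the observed value.
        densities[m] = kde(stats, p_obs, bandwidth)
    total = sum(densities.values())
    # Equal prior weights on the three models are assumed here.
    return {m: d / total for m, d in densities.items()}


if __name__ == "__main__":
    print(recalibrated_probs(0.781))
```

In this toy setting the statistic PAFREG(x) happens to be a one-to-one function of x, so conditioning on it loses nothing; with genuine summary statistics and more than one dimension that is no longer the case, which is where the information loss mentioned above comes in.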

Just as a side remark on the polychotomous logistic regression approximation to the posterior probabilities introduced in Fagundes et al.: the idea is quite enticing, as a statistical regularisation of ABC simulations. It could be exploited further by using a standard model selection strategy in order to pick the summary statistics that truly contribute to explaining the model index.

## ABC model choice not to be trusted

Posted in Mountains, R, Statistics, University life on January 27, 2011 by xi'an

This may sound like a paradoxical title given my recent production in this area of ABC approximations, especially after the disputes with Alan Templeton, but I have come to the conclusion that ABC approximations to the Bayes factor are not to be trusted. When working one afternoon in Park City with Jean-Michel and Natesh Pillai (drinking tea in front of a fake log-fire!), we looked at the limiting behaviour of the Bayes factor constructed by an ABC algorithm, i.e. by approximating posterior probabilities for the models from the frequencies of acceptances of simulations from those models (assuming the use of a common summary statistic to define the distance to the observations). Rather obviously (a posteriori!), we ended up with the true Bayes factor based on the distributions of the summary statistics under both models!
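In symbols, and with notation that is assumed here rather than taken from the post: write $\eta$ for the common summary statistic, $\pi_i$ for the prior on the parameter $\theta_i$ of model $i$, and $f_i^{\eta}(\cdot\mid\theta_i)$ for the density of $\eta(y)$ under model $i$. As the ABC tolerance goes to zero, the ratio of acceptance frequencies (with equal prior weights on both models) targets

$B_{12}^{\eta}(y) = \dfrac{\int \pi_1(\theta_1)\, f_1^{\eta}(\eta(y)\mid\theta_1)\,\mathrm{d}\theta_1}{\int \pi_2(\theta_2)\, f_2^{\eta}(\eta(y)\mid\theta_2)\,\mathrm{d}\theta_2}$

which is the Bayes factor for the observation $\eta(y)$ and need not coincide with the Bayes factor based on the full data $y$.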

## Another ABC rebuttal

Posted in Statistics, University life on October 31, 2010 by xi'an

“Given that some logical overlap is common when dealing with complex models, this means that much of the literature using ABC is invalid.” Alan Templeton, July 2010.

I had not noticed another reply to Templeton’s PNAS diatribe against ABC that was published by Csilléry, Blum, Gaggiotti and François in Trends in Ecology and Evolution. This reply follows a letter written by Templeton to this journal and published last July. Alan Templeton takes issue with the inclusion of a box entitled Controversy surrounding ABC in the nice survey of Csilléry et al. The letter reproduces earlier arguments I already discussed, in particular the “logical impossibility” of having larger models enjoy smaller posterior probabilities than smaller models [that are special cases]. The conclusion that

“1) ABC can and does produce results that are mathematically impossible; 2) the ‘posterior probabilities’ of ABC cannot possibly be true probability measures; and 3) ABC is statistically incoherent (incoherent methods can violate the constraints of formal logic)” Alan Templeton, July 2010.

thus brings no novelty to the debate. It is nonetheless mildly irritating to see that Alan Templeton is still advancing “mathematical errors” as his main argument, despite detailed rebuttals published by mathematicians and mathematical statisticians. As demonstrated by the repeated argument that BIC should replace ABC (!), or the decomposition of $P(A\cup B\cup C)$ in the PNAS reply, he is out of his depth on mathematical grounds. However, that he manages to publish a paper like the PNAS diatribe without the journal having a mathematician check the “mathematical flaws” is more of an issue.

Posted in Statistics, University life on September 29, 2010 by xi'an

“Logical overlap is the norm for the complex models analyzed with ABC, so many ABC posterior model probabilities published to date are wrong.” Alan R. Templeton, PNAS, doi:10.1073/pnas.1009012107

Our letter in PNAS about Templeton’s surprising diatribe on Bayesian inference has now appeared in the early edition, along with Templeton’s reply. This reply unfortunately brings nothing new compared with the original paper. First, he maintains that the criticism is about ABC (which is, in case you do not know, a computational technique and not a specific statistical methodology!). Second, he insists on the inappropriate Venn diagram analogy by reproducing the basic identity $P(A\cup B\cup C) = P(A)+P(B)+P(C)-P(A\cap B)-P(B\cap C)-P(C\cap A)+P(A\cap B\cap C)$

(presumably in case we had lost sight of it!) to argue that using instead $P(A)+P(B)+P(C)$

is incoherent (hence rejecting Bayes factors, Bayesian model averaging and so on). I am not particularly surprised by this immutable stance, but it means that there is little point in debate when starting from such positions… Our main goal in publishing this letter was actually to stress that the earlier tribune had no statistical ground and I think we achieved this goal.

## Incoherent phylogeographic inference [accepted]

Posted in Statistics, University life on August 30, 2010 by xi'an

The letter we submitted to PNAS about Templeton’s surprising diatribe on Bayesian inference has now been accepted:

Title: “Incoherent Phylogeographic Inference”
Tracking #: 2010-08762
Authors: Berger et al.

Dear Prof. Robert,
We are pleased to inform you that the PNAS Editorial Board has given final approval of your letter to the Editor for online publication. The author(s) of the published manuscript have been invited to respond to your feedback. If they provide a response, it may appear online concurrently with your letter.

Now we are looking forward (?) to Alan Templeton’s answer, even though I suspect this short letter is not going to have any impact on his views!