## Approximate Bayesian model choice

Posted in Books, R, Statistics, Travel, University life on March 17, 2014 by xi'an

The above is the running head of the arXived paper with full title “Implications of uniformly distributed, empirically informed priors for phylogeographical model selection: A reply to Hickerson et al.” by Oaks, Linkem and Sukumaran, which I (again) read on the plane to Montréal (third one in this series, and the last because I also watched the Japanese psycho-thriller Midsummer’s Equation, featuring a physicist turned detective in one of many TV episodes. I found some common features with The Devotion of Suspect X, only to discover now that the book has been turned into another episode in the series.)

“Here we demonstrate that the approach of Hickerson et al. (2014) is dangerous in the sense that the empirically-derived priors often exclude from consideration the true values of the models’ parameters. On a more fundamental level, we question the value of adopting an empirical Bayesian stance for this model-choice problem, because it can mislead model posterior probabilities, which are inherently measures of belief in the models after prior knowledge is updated by the data.”

This paper is actually a reply to Hickerson et al. (2014, Evolution), which is itself a reply to an earlier paper by Oaks et al. (2013, Evolution). [Warning: I did not check those earlier references!] The authors object to the use of “narrow, empirically informed uniform priors”, for the reason reproduced in the above quote, in connection with the msBayes of Huang et al. (2011, BMC Bioinformatics). The discussion is less about ABC used for model choice and posterior probabilities of models than about the impact of vague priors: Oaks et al. (2013) argue that this leads to a bias towards models with fewer parameters, a “statistical issue” in their words, while Hickerson et al. (2014) think this is due to msBayes’ way of selecting models and their parameters at random.

“…it is difficult to choose a uniformly distributed prior on divergence times that is broad enough to confidently contain the true values of parameters while being narrow enough to avoid spurious support of models with less parameter space.”

So quite an interesting debate that takes us in fine far away from the usual worries about ABC model choice! We are more at the level of empirical versus natural Bayes, as seen in the literature of the 1980s. (The meaning of empirical Bayes is not that clear in the early pages, as the authors seem to include any method using the data “twice”.) I actually do not remember reading papers about the formal properties of model choice done through classical empirical Bayes techniques, except for the special case of Aitkin’s (1991, 2009) integrated likelihood, which is essentially the analysis performed on the coin toy example (p. 7).

“…models with more divergence parameters will be forced to integrate over much greater parameter space, all with equal prior density, and much of it with low likelihood.”

The above argument is an interesting rephrasing of Lindley’s paradox, which I cannot dispute, but of course it does not solve the fundamental issue of how to choose the prior away from vague uniform priors… I also like the quote “the estimated posterior probability of a model is a single value (rather than a distribution) lacking a measure of posterior uncertainty” as this is an issue on which we are currently working. I fully agree with the statement and we think an alternative assessment to posterior probabilities could be more appropriate for model selection in ABC settings (paper soon to come, hopefully!).
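To make the Lindley-type effect concrete, here is a minimal sketch, not taken from the paper: a normal mean tested against a point null, with a uniform prior on the alternative. The model, the function names, and the numbers (`xbar`, `se`, the half-widths) are all illustrative assumptions of mine. The Bayes factor in favour of the null grows roughly linearly with the width of the uniform prior, while a prior narrower than the data support would instead cut off plausible parameter values.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def normal_cdf(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bayes_factor_01(xbar, se, half_width):
    """Bayes factor of H0: theta = 0 against H1: theta ~ U(-half_width, half_width),
    when the sample mean xbar is N(theta, se^2). Hypothetical toy setting."""
    like_null = normal_pdf(xbar, 0.0, se)
    # Marginal likelihood under H1: the likelihood averaged over the uniform prior
    mass = normal_cdf((half_width - xbar) / se) - normal_cdf((-half_width - xbar) / se)
    marginal_alt = mass / (2.0 * half_width)
    return like_null / marginal_alt

if __name__ == "__main__":
    xbar, se = 0.2, 0.1  # made-up data, two standard errors away from the null
    for half_width in (0.5, 1.0, 10.0, 100.0):
        b01 = bayes_factor_01(xbar, se, half_width)
        print(f"half-width {half_width:6.1f}: B01 = {b01:8.3f}, "
              f"P(H0 | x) = {b01 / (1.0 + b01):.3f}")
```

The posterior probability printed in the last column assumes equal prior weights on both hypotheses, which is again my choice for the illustration.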

## on alternative perspectives and solutions on Bayesian tests

Posted in Statistics, Travel, University life on December 16, 2013 by xi'an

Here are the slides of my tutorial at O’Bayes 2013 today, a pot-pourri of various, recent and less recent, criticisms (with, albeit less than usual, a certain proportion of recycled slides):

## “an outstanding paper that covers the Jeffreys-Lindley paradox”…

Posted in Statistics, University life on December 4, 2013 by xi'an

“This is, in this revised version, an outstanding paper that covers the Jeffreys-Lindley paradox (JLP) in exceptional depth and that unravels the philosophical differences between different schools of inference with the help of the JLP. From the analysis of this paradox, the author convincingly elaborates the principles of Bayesian and severity-based inferences, and engages in a thorough review of the latter’s account of the JLP in Spanos (2013).” Anonymous

I have now received a second round of reviews of my paper, “On the Jeffreys-Lindley’s paradox” (submitted to Philosophy of Science), and the reports are quite positive (or even extremely positive, as in the above quote!). The requests for changes aim at clarifying points, improving the background coverage, and simplifying my heavy style (e.g., cutting Proustian sentences). These requests were easily addressed (hopefully to the satisfaction of the reviewers) and, thanks to the week in Warwick, I have already sent the paper back to the journal, with high hopes for acceptance. The new version has also been arXived. I must add that some parts of the reviews sounded much better than my original prose and I was almost tempted to include them in the final version. Take for instance:

“As a result, the reader obtains not only a better insight into what is at stake in the JLP, going beyond the results of Spanos (2013) and Sprenger (2013), but also a much better understanding of the epistemic function and mechanics of statistical tests. This is a major achievement given the philosophical controversies that have haunted the topic for decades. Recent insights from Bayesian statistics are integrated into the article and make sure that it is mathematically up to date, but the technical and foundational aspects of the paper are well-balanced.” Anonymous

## on the Jeffreys-Lindley’s paradox (revision)

Posted in Statistics, University life on September 17, 2013 by xi'an

As mentioned here a few days ago, I have been revising my paper on the Jeffreys-Lindley’s paradox for Philosophy of Science. It came as a bit of a (very pleasant) surprise that this journal was ready to consider a revised version of the paper, given that I have no formal training in philosophy and that the (first version of the) paper was rather hurriedly made of a short text written for the 95th birthday of Dennis Lindley and of my blog post on Aris Spanos’ “Who should be afraid of the Jeffreys-Lindley paradox?”, recently published in Philosophy of Science. So I found both reviewers very supportive and I am grateful for their suggestions to improve both the scope and the presentation of the paper. It has been resubmitted and rearXived, and I am now waiting for the decision of the editorial team with the appropriate philosophical sense of detachment…

Posted in Books, Statistics, University life on September 13, 2013 by xi'an

“In the asymptotic limit, the Bayesian cannot justify the strictly positive probability of H0 as an approximation to testing the hypothesis that the parameter value is close to θ0 — which is the hypothesis of real scientific interest”

While revising my Jeffreys-Lindley’s paradox paper for Philosophy of Science, it was suggested (to me) that I read the forthcoming paper by Jan Sprenger on this paradox. The paper is entitled Testing a Precise Null Hypothesis: The Case of Lindley’s Paradox and it defends the thesis that the regular Bayesian approach (hence the Bayes factor used in the Jeffreys-Lindley’s paradox) is forced to put a prior on the (point) null hypothesis when all that really matters is the vicinity of the null. (I think Andrew would agree there as he positively hates point null hypotheses. See also Rissanen’s perspective about the maximal precision allowed by a given sample.) Sprenger then advocates the use of the log score for comparing the full model with the point-null sub-model, i.e. the posterior expectation of the Kullback-Leibler distance between both models:

$\mathbb{E}^\pi\left[\mathbb{E}_\theta\{\log f(X|\theta)/ f(X|\theta_0)\}|x\right],$

thus joining José Bernardo and Phil Dawid on this ground.
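As a small worked illustration of this score (my own toy setting, not Sprenger's), take a single future observation X from N(θ, 1), so that the inner expectation is the Kullback-Leibler divergence (θ − θ0)²/2, and assume a flat prior on θ, which gives the posterior N(x̄, 1/n) after n observations. The posterior expectation is then ((x̄ − θ0)² + 1/n)/2. The sketch below checks this closed form against a plain numerical integration; all function names and numbers are hypothetical.

```python
import math

def kl_normal_means(theta, theta0):
    """KL divergence from N(theta, 1) to N(theta0, 1), for one observation."""
    return 0.5 * (theta - theta0) ** 2

def expected_kl_closed_form(xbar, n, theta0):
    """Posterior mean of the KL divergence when theta | x ~ N(xbar, 1/n)
    (flat prior assumed for this illustration)."""
    return 0.5 * ((xbar - theta0) ** 2 + 1.0 / n)

def expected_kl_numeric(xbar, n, theta0, grid_size=20001, span=10.0):
    """Same quantity by a Riemann sum over xbar +/- span posterior standard deviations."""
    sd = 1.0 / math.sqrt(n)
    lower = xbar - span * sd
    step = 2.0 * span * sd / (grid_size - 1)
    total = 0.0
    for i in range(grid_size):
        theta = lower + i * step
        density = math.exp(-0.5 * ((theta - xbar) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += kl_normal_means(theta, theta0) * density * step
    return total

if __name__ == "__main__":
    xbar, n, theta0 = 0.2, 100, 0.0  # made-up summary statistics
    print("closed form:", expected_kl_closed_form(xbar, n, theta0))
    print("numerical  :", expected_kl_numeric(xbar, n, theta0))
```

Note that the score remains well defined under the improper flat prior, since only the (proper) posterior enters the computation.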

While I agree with the notion that it is impossible to distinguish a small enough departure from the null from the null (no typo!), and I also support the argument that “all models are wrong”, hence that a point null should eventually (meaning with enough data) be rejected, I find the Bayesian solution through the Bayes factor rather appealing because it uses the prior distribution to weight the alternative values of θ in order to oppose their averaged likelihood to the likelihood at θ0. (Note I did not mention Occam!) Further, while the notion of opposing a point null to the rest of the Universe may sound silly, what truly matters is the decisional setting, namely that we want to select a simpler model and use it for later purposes. It is therefore this issue that should be tested, rather than whether or not θ is exactly equal to θ0. I incidentally find it amusing that Sprenger picks the ESP experiment as his illustration, in that this is a (the?) clearcut case where the point null, “there is no such thing as ESP”, makes sense. Now, it can be argued that what the statistical experiment is assessing is the ESP experiment, for which many objective causes (beyond ESP!) may induce a departure from the null (and from the binomial model). But then this prevents any rational analysis of the test (as is indeed the case!).
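The opposition between the averaged likelihood and the likelihood at θ0 can be sketched in a binomial setting of the ESP type. This is only a hedged illustration: the uniform prior on θ under the alternative, the function name, and the counts are my assumptions, not the data or the prior discussed in Sprenger's paper. Under a uniform prior, the averaged binomial likelihood equals 1/(n + 1) whatever the number of successes, so the Bayes factor has a closed form.

```python
import math

def log_binomial_pmf(x, n, theta):
    """Log of the binomial probability of x successes out of n trials."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_choose + x * math.log(theta) + (n - x) * math.log(1.0 - theta)

def bayes_factor_point_null(x, n, theta0=0.5):
    """B01 = f(x | theta0) / integral of f(x | theta) over a U(0, 1) prior.
    The averaged likelihood under the uniform prior is exactly 1 / (n + 1)."""
    log_marginal_alt = -math.log(n + 1)
    return math.exp(log_binomial_pmf(x, n, theta0) - log_marginal_alt)

if __name__ == "__main__":
    # Hypothetical counts, both two binomial standard deviations above n / 2
    for n, x in ((100, 60), (10_000, 5_100)):
        z = (x - n / 2) / math.sqrt(n / 4)
        print(f"n = {n:6d}, x = {x:5d}, z = {z:.1f}, "
              f"B01 = {bayes_factor_point_null(x, n):.3f}")
```

Both datasets sit at the same z-score, hence share the same p-value under the normal approximation, yet the Bayes factor moves towards the null as n grows, which is the Jeffreys-Lindley paradox in its simplest form.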

The paper thus objects to the use of Bayes factors (and of p-values) and instead proposes to compare scores in the Bernardo-Dawid spirit. As discussed earlier, this approach has several appealing features, from recovering the Kullback-Leibler divergence between models as a measure of fit, to allowing for the incorporation of improper priors (a point Andrew may disagree with), to avoiding the double use of the data. It is however incomplete in that it creates a discrepancy or an imbalance between both models, thus making the comparison of more than two models difficult to fathom, and it does not readily incorporate the notion of nuisance parameters in the embedded model, seemingly forcing the inclusion of pseudo-priors as in the Bayesian analysis of Aitkin’s integrated likelihood.