I could only attend one day of the workshop on likelihood, approximate likelihood and nonparametric statistical techniques with some applications, and I wish I could have stayed a day longer (and definitely not only for the pleasure of being in Venezia!). Yesterday, Bruce Lindsay started the day with an extended review of composite likelihood, followed by recent applications of composite likelihood to clustering (I was completely unaware he had worked on the topic in the 80s!). His talk was followed by several talks on composite likelihood and other pseudo-likelihoods, which made me think about potential applications to ABC.

During my tutorial talk on ABC, I got interesting questions on multiple testing and on how to combine the different “optimal” summary statistics (answer: take all of them, as it would not make sense to compare one pair with one summary statistic and another pair with another summary statistic), and on why we were using empirical likelihood rather than another pseudo-likelihood (answer: I do not have a definite answer. I guess it depends on the ease with which the pseudo-likelihood is derived and on what we do with it. I would, e.g., feel less confident using the pairwise composite likelihood as a substitute likelihood than as the basis for a score function.)

In the final afternoon, Monica Musio presented her joint work with Phil Dawid on score functions and their connection with pseudo-likelihoods and estimating equations (another possible opening for ABC), mentioning a score family developed by Hyvärinen that involves the gradient of the square root of a density, in the best James-Stein tradition! (Plus an approach bypassing the annoying missing normalising constant.) Then, based on joint work with Nicola Sartori and Laura Ventura, Erlis Ruli presented a third-order tail approximation towards (marginal) posterior simulation called HOTA.
As Ruli will visit me in Paris in the coming weeks, I hope I can explore the possibilities of this method when he is (t)here. At last, Stefano Cabras discussed higher-order approximations for Bayesian point-null hypotheses (jointly with Walter Racugno and Laura Ventura), pointing to the Pereira and Stern (so special) loss function mentioned in my post on Måns’ paper the very same day! It was thus a very informative and beneficial day for me, furthermore spent in a room overlooking the Canal Grande in the most superb location!
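Since several of the questions during the tutorial bore on how summary statistics enter ABC, here is a minimal rejection-ABC sketch (entirely my own toy illustration, not code from the talk; the function names and the normal toy model are assumptions). It keeps the parameter draws whose simulated summary statistics jointly fall within ε of the observed ones, using all summaries at once rather than one summary per comparison, as suggested above:

```python
# Minimal rejection-ABC sketch (illustrative only, not from the tutorial):
# accept a prior draw theta when the simulated summaries are jointly
# within eps of the observed summaries, in Euclidean distance.
import numpy as np

rng = np.random.default_rng(0)

def abc_rejection(x_obs, simulate, prior_draw, summaries, eps, n_draws=20_000):
    s_obs = summaries(x_obs)
    kept = []
    for _ in range(n_draws):
        theta = prior_draw()
        s_sim = summaries(simulate(theta))
        if np.linalg.norm(s_sim - s_obs) < eps:  # joint distance on ALL summaries
            kept.append(theta)
    return np.array(kept)

# Toy model: x ~ N(theta, 1), uniform prior on theta, summaries = (mean, median)
x_obs = rng.normal(1.0, 1.0, size=100)
post = abc_rejection(
    x_obs,
    simulate=lambda th: rng.normal(th, 1.0, size=100),
    prior_draw=lambda: rng.uniform(-5.0, 5.0),
    summaries=lambda x: np.array([np.mean(x), np.median(x)]),
    eps=0.2,
)
print(post.size, post.mean())  # accepted draws concentrate near the true mean
```

The accepted draws form an approximate posterior sample; shrinking eps trades acceptance rate against approximation quality.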
Måns Thulin released today an arXiv document on some decision-theoretic justifications for [running] Bayesian hypothesis testing through credible sets. His main point is that using the unnatural prior setting mass on a point-null hypothesis can be avoided by rejecting the null when the point-null value of the parameter does not belong to the credible interval, and that this decision procedure can be validated through the use of special loss functions. While I stress to my students that point-null hypotheses are very unnatural and should be avoided at all cost, and also that constructing a confidence interval is not the same as designing a test—the former assesses the precision of the estimation, while the latter opposes two different and even incompatible models—, let us consider Måns’ arguments for their own sake.
The idea of the paper is that there exist loss functions for testing point-null hypotheses that lead to HPD, symmetric, and one-sided intervals as acceptance regions, depending on the loss function. This was already found in Pereira & Stern (1999). The issue with these loss functions is that they involve the corresponding credible sets in their very definition, hence are somehow tautological. For instance, when considering the HPD set and taking T(x) as the largest HPD set not containing the point-null value of the parameter, the corresponding loss function is parameterised by constants a, b, c and depends on the HPD region itself.
Måns then introduces new loss functions that do not depend on x and still lead to either the symmetric or the one-sided credible intervals as acceptance regions. However, one test actually has two different alternatives (Theorem 2), which makes it essentially a composition of two one-sided tests, while the other test reduces to a one-sided test (Theorem 3), so even at this face-value level, I do not find the result that convincing. (For the one-sided test, George Casella and Roger Berger (1986) established links between Bayesian posterior probabilities and frequentist p-values.) Both Theorem 3 and the last result of the paper (Theorem 4) use a generic, observation-free loss function (related to eqn. (5.2.1) in my book!, as quoted by the paper), but (and this is a big but) they only hold for prior distributions setting (prior) mass on both the null and the alternative. Otherwise, the solution is to always reject the hypothesis with zero prior probability… This is actually an interesting argument in the why-are-credible-sets-unsuitable-for-testing debate, as it cannot bypass the introduction of a prior mass on Θ0!
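For concreteness, here is a minimal sketch of the credible-set decision rule discussed above (my own toy illustration with hypothetical names, using an equal-tailed interval and a conjugate normal model, not anything taken from the paper): reject H0: θ = θ0 whenever θ0 falls outside the (1−α) posterior credible interval.

```python
# Toy sketch of "testing by credible interval" (illustrative, not from the paper):
# reject H0: theta = theta0 when theta0 lies outside the equal-tailed
# (1 - alpha) posterior credible interval.
import numpy as np

rng = np.random.default_rng(42)

def credible_interval_test(posterior_draws, theta0, alpha=0.05):
    """Return (reject?, interval) for H0: theta = theta0."""
    lo, hi = np.quantile(posterior_draws, [alpha / 2, 1 - alpha / 2])
    return not (lo <= theta0 <= hi), (lo, hi)

# Normal example: x_i ~ N(theta, 1), flat prior => theta | x ~ N(xbar, 1/n)
n, theta_true = 50, 1.0
x = rng.normal(theta_true, 1.0, size=n)
posterior = rng.normal(x.mean(), 1.0 / np.sqrt(n), size=10_000)

reject, (lo, hi) = credible_interval_test(posterior, theta0=0.0)
print(f"95% credible interval: ({lo:.2f}, {hi:.2f}), reject H0: {reject}")
```

Note that nothing in this rule involves a prior mass on θ0, which is precisely the point of contention raised above.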
Overall, I furthermore consider that a decision-theoretic approach to testing should encompass future steps rather than focussing on the reply to the (admittedly dumb) question “is θ zero?”. Therefore, it must have both plan A and plan B at the ready, which means preparing (and using!) prior distributions under both hypotheses. Even on point-null hypotheses.
Now, after I wrote the above, I came upon a Stack Exchange page initiated by Måns last July. This is presumably not the first time a paper stems from Stack Exchange, but this is a fairly interesting outcome: thanks to the debate on his question, Måns managed to get a coherent manuscript written. Great! (In a sense, this reminded me of the polymath experiments of Terry Tao, Timothy Gowers and others. Meaning that maybe most contributors could have become coauthors to the paper!)
Here is [yet!] another Bayesian textbook that appeared recently. I read it in the past few days and, despite my obvious biases and prejudices, I liked it very much! It has a lot in common (at least in spirit) with our Bayesian Core, which may explain why I feel so benevolent towards its Bayesian ideas and data analysis. Just like ours, the book by Ron Christensen, Wes Johnson, Adam Branscum, and Timothy Hanson is indeed focused on explaining Bayesian ideas through (real) examples, and it covers a lot of regression models, all the way to nonparametrics. It contains a good proportion of WinBUGS and R code. It intermingles methodology and computational chapters in the first part, before moving to the serious business of analysing more and more complex regression models. Exercises appear throughout the text rather than at the end of the chapters. As their book is longer than ours (over 500 pages), the authors spend more time analysing various datasets in each chapter and, more importantly, provide a rather unique entry on prior assessment and construction, especially in the regression chapters. The author index is rather original in that it links the authors with more than one entry to the topics they are connected with (Ron Christensen winning the game with the highest number of entries).
“In statistics, coherent measures of fit of nested and overlapping composite hypotheses are technically those measures that are consistent with the constraints of formal logic. For example, the probability of the nested special case must be less than or equal to the probability of the general model within which the special case is nested. Any statistic that assigns greater probability to the special case is said to be incoherent. An example of incoherence is shown in human evolution, for which the approximate Bayesian computation (ABC) method assigned a probability to a model of human evolution that was a thousand-fold larger than a more general model within which the first model was fully nested. Possible causes of this incoherence are identified, and corrections and restrictions are suggested to make ABC and similar methods coherent.” Alan R. Templeton, PNAS, doi:10.1073/pnas.0910647107
Following the astounding publication of Templeton’s pamphlet against Bayesian inference in PNAS last March, Jim Berger, Steve Fienberg, Adrian Raftery and myself polished, while in Benidorm, a reply focussing on the foundations of statistical testing, and submitted a letter to the journal. Here are the (500 word) contents.
Templeton (2010, PNAS) makes a broad attack on the foundations of Bayesian statistical methods—rather than on the purely numerical technique called Approximate Bayesian Computation (ABC)—using incorrect arguments and selective references taken out of context. The most significant example is the argument “The probability of the nested special case must be less than or equal to the probability of the general model within which the special case is nested. Any statistic that assigns greater probability to the special case is incoherent. An example of incoherence is shown for the ABC (sic!) method.” This opposes both the basis and the practice of Bayesian testing.
The confusion seems to arise from misunderstanding the difference between scientific hypotheses and their mathematical representation. Consider vaccine testing, where in what follows we use VE to represent the vaccine efficacy, measured on a scale from 0 to 100. Exploratory vaccines may be efficacious or not. Thus a real biological model corresponds to the hypothesis “VE=0”, that the vaccine is not efficacious. The alternative biological possibility, that the vaccine has an effect, is often stated mathematically as the alternative model “any allowed value of VE is possible,” making it appear that it contains “VE=0.” But Bayesian analysis assigns each model a prior distribution arising from the background science; a point mass (e.g., probability 1/2) is assigned to “VE=0” and the remaining probability mass (e.g., 1/2) is distributed continuously over the values of VE in the alternative model. Elementary use of Bayes’ theorem (see, e.g., Berger, 1985, Statistical Decision Theory and Bayesian Analysis) then shows that the simpler model can indeed have a much higher posterior probability. Mathematically, this is explained by the probability distributions residing in spaces of different dimensions, and is elementary probability theory for which use of Templeton’s “Venn diagram argument” is simply incorrect.
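The point-mass mechanism can be checked in a few lines (a toy numerical illustration of my own, with made-up numbers, not taken from the letter): with prior probability 1/2 on the sharp null and a continuous N(0, τ²) prior on the parameter under the alternative, the marginal likelihoods of a normal observation are available in closed form, and the sharp null can indeed end up with the larger posterior probability.

```python
# Toy spike-and-slab computation (illustrative numbers, not from the letter):
# H0: theta = 0 with prior mass 1/2; under H1, theta ~ N(0, tau^2),
# and we observe x_bar ~ N(theta, s^2).
import math

def posterior_prob_null(x_bar, s, tau, prior_null=0.5):
    # marginal of x_bar under H0: N(0, s^2); under H1: N(0, s^2 + tau^2)
    m0 = math.exp(-x_bar**2 / (2 * s**2)) / math.sqrt(2 * math.pi * s**2)
    m1 = math.exp(-x_bar**2 / (2 * (s**2 + tau**2))) / math.sqrt(
        2 * math.pi * (s**2 + tau**2)
    )
    bf01 = m0 / m1  # Bayes factor in favour of the sharp null
    return prior_null * bf01 / (prior_null * bf01 + (1 - prior_null))

# Even a mildly "significant" observation favours the sharp null when the
# alternative prior is diffuse (the Jeffreys-Lindley effect):
print(round(posterior_prob_null(x_bar=2.0, s=1.0, tau=10.0), 3))  # ≈ 0.581 > 1/2
```

The point null receiving higher posterior probability than 1/2 is perfectly coherent here, despite {0} being a subset of the alternative parameter range as a set.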
Templeton also argues that Bayes factors are mathematically incorrect, and he backs his claims with Lavine and Schervish’s (1999, American Statistician) notion of coherence. These authors do indeed criticize the use of Bayes factors as stand-alone criteria but point out that, when combined with prior probabilities of models (as illustrated in the vaccine example above), the result is fully coherent posterior probabilities. Further, Templeton directly attacks the ABC algorithm. ABC is simply a numerical computational technique; attacking it as incoherent is similar to calling calculus incoherent if it is used to compute the wrong thing.
Finally, we note that Templeton has already published essentially identical if more guarded arguments in the ecology literature; we refer readers to a related rebuttal to Templeton’s (2008, Molecular Ecology) critique of the Bayesian approach by Beaumont et al. (2010, Molecular Ecology) that is broader in scope, since it also covers the phylogenetic aspects of nested clade versus a model-based approach.