**N**ext Fall, on 15-16 September, I will take part in a CRiSM workshop on hypothesis testing, in our department in Warwick. Registration is now open [until Sept 2], with a moderate registration fee of £40 and a call for posters. Jim Berger and Joris Mulder will both deliver a plenary talk there, while Andrew Gelman will alas give a remote talk from New York. (A terrific poster by the way!)

## Archive for hypothesis testing

## contemporary issues in hypothesis testing

Posted in pictures, Statistics, Travel, University life with tags Andrew Gelman, Bayes factors, Bayesian foundations, Bayesian statistics, Coventry, CRiSM, England, Fall, hypothesis testing, Jim Berger, Joris Mulder, statistical tests, University of Warwick, workshop on May 3, 2016 by xi'an

## triste célébration for World Statistics Day

Posted in Books, Kids, Statistics, University life with tags exercises, hypothesis testing, medical school, p-values, university on October 21, 2015 by xi'an

**A**s I was discussing with my daughter last night a practice stats exam she had just taken in medical school, I came upon the following question:

What is the probability that women have the same risk of cancer as men in the entire population given that the selected sample concluded against equality?

Which just means nothing, since conditioning on the observed event, say |X|>1.96, cancels any probabilistic structure in the problem. Worse, I have no idea what the expected answer to this question is!

## straightforward statistics [book review]

Posted in Books, Kids, Statistics, University life with tags hypothesis testing, introductory textbooks, multiple tests, Oxford University Press, p-values, power, psychology, tests on July 3, 2014 by xi'an

“I took two different statistics courses as an undergraduate psychology major [and] four different advanced statistics classes as a PhD student.” G. Geher

*Straightforward Statistics: Understanding the Tools of Research* by Glenn Geher and Sara Hall is an introductory textbook for psychology and other social science students. (That Oxford University Press sent me for review in CHANCE. Nice cover, by the way!) I can spot the purpose behind the title, a purpose heavily stressed anew in the preface and the first chapter, but it nonetheless irks me as conveying the message that one semester of reasonable diligence in class will suffice for any college student to *“not only understanding research findings from psychology, but also to uncovering new truths about the world and our place in it”* (p.9). Nothing less. While, in essence, it covers the basics found in all introductory textbooks, from descriptive statistics to ANOVA models. The inclusion of “real research examples” in the chapters of the book rather demonstrates how far from real research a reader of the book would stand…

## a refutation of Johnson’s PNAS paper

Posted in Books, Statistics, University life with tags Alice and Bob, Bayes factor, Bayesian statistics, Bayesian tests, hypothesis testing, p-value, Valen Johnson, xkcd on February 11, 2014 by xi'an

**J**ean-Christophe Mourrat recently arXived a paper “P-value tests and publication bias as causes for high rate of non-reproducible scientific results?”, intended as a rebuttal of Val Johnson’s PNAS paper. The arguments therein are not particularly compelling. (Just as ours may not sound compelling to the author.)

“We do not discuss the validity of this [Bayesian] hypothesis here, but we explain in the supplementary material that if taken seriously, it leads to incoherent results, and should thus be avoided for practical purposes.”

**T**he refutation is primarily argued as a rejection of the whole Bayesian perspective. (Although we argue Johnson’s perspective is not that Bayesian…) But the argument within the paper is much simpler: if the probability of rejection under the null is at most 5%, then the overall proportion of false positives is also at most 5% and not 20% as argued in Johnson…! Just as simple as this. Unfortunately, the author mixes conditional and unconditional, frequentist and Bayesian probability models. As well as conditioning upon the data and conditioning upon the rejection region… Read at your own risk.
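The conditional versus unconditional distinction at stake can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function name `false_positive_rates` and its inputs (an assumed share of true nulls among tested hypotheses, and an assumed power) are hypothetical and taken from neither paper; the values are picked merely to echo the 5% and 20% figures quoted above.

```r
# Hypothetical inputs, not taken from either paper:
#   prior_null = assumed share of tested hypotheses that are true nulls
#   alpha      = P(reject | null true)
#   power      = P(reject | null false)
false_positive_rates <- function(prior_null, alpha = 0.05, power = 0.8) {
  reject_and_null <- prior_null * alpha        # P(reject and null true)
  reject_and_alt  <- (1 - prior_null) * power  # P(reject and null false)
  c(
    given_null       = alpha,                  # conditional on the null
    among_all_tests  = reject_and_null,        # unconditional share of tests
    among_rejections = reject_and_null / (reject_and_null + reject_and_alt)
  )
}

# Illustrative values only: half of the nulls true, power of 20%
round(false_positive_rates(prior_null = 0.5, power = 0.2), 3)
```

With these made-up inputs the share of false positives among all tests stays below 5%, while the share among rejections is four times larger, which is why it matters which event is being conditioned upon.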

## workshop a Venezia (2)

Posted in pictures, Statistics, Travel, University life with tags ABC, approximate likelihood, Ca' Foscari University, clustering, composite likelihood, empirical likelihood, hypothesis testing, Italia, loss functions, normalising constant, point null hypotheses, Venezia on October 10, 2012 by xi'an

**I** could only attend one day of the workshop on likelihood, approximate likelihood and nonparametric statistical techniques with some applications, and I wish I could have stayed a day longer (and definitely not only for the pleasure of being in Venezia!) Yesterday, Bruce Lindsay started the day with an extended review of composite likelihood, followed by recent applications of composite likelihood to clustering (I was completely unaware he had worked on the topic in the 80’s!). His talk was followed by several talks working on composite likelihood and other pseudo-likelihoods, which made me think about potential applications to ABC. During my tutorial talk on ABC, I got interesting questions on multiple testing and how to combine the different “optimal” summary statistics (*answer:* take all of them, it would not make sense to compare one pair with one summary statistic and another pair with another summary statistic), and on why we were using empirical likelihood rather than another pseudo-likelihood (*answer:* I do not have a definite answer. I guess it depends on the ease with which the pseudo-likelihood is derived and what we do with it. I would e.g. feel less confident to use the pairwise composite as a substitute likelihood rather than as the basis for a score function.) In the final afternoon, Monica Musio presented her joint work with Phil Dawid on score functions and their connection with pseudo-likelihood and estimating equations (another possible opening for ABC), mentioning a score family developed by Hyvärinen that involves the gradient of the square-root of a density, in the best James-Stein tradition! (Plus an approach bypassing the annoying missing normalising constant.) Then, based on a joint work with Nicola Sartori and Laura Ventura, Erlis Ruli presented a 3rd-order tail approximation towards a (marginal) posterior simulation called HOTA. As Ruli will visit me in Paris in the coming weeks, I hope I can explore the possibilities of this method when he is (t)here. Finally, Stefano Cabras discussed higher-order approximations for Bayesian point-null hypotheses (jointly with Walter Racugno and Laura Ventura), mentioning the Pereira and Stern (so special) loss function discussed in my post on Måns’ paper the very same day! It was thus a very informative and beneficial day for me, furthermore spent in a room overlooking the Canal Grande in the most superb location!

## is the p-value a good measure of evidence?

Posted in Statistics, University life with tags Bayes factors, Bayesian decision theory, Bayesian inference, Bayesian model choice, consistency, evidence, hypothesis testing, p-values on November 30, 2011 by xi'an

“Statistics abounds criteria for assessing quality of estimators, tests, forecasting rules, classification algorithms, but besides the likelihood principle discussions, it seems to be almost silent on what criteria should a good measure of evidence satisfy.” M. Grendár

**A** short note (4 pages) appeared on arXiv a few days ago, entitled “is the p-value a good measure of evidence? an asymptotic consistency criterion” by M. Grendár. It is rather puzzling in that it defines the *consistency* of an evidence measure $\varepsilon(H_1,H_2,X^n)$ (for the hypothesis $H_1$ relative to the alternative $H_2$) by

$$\lim_{n\to\infty} P\big(H_1 \mid \varepsilon(H_1,H_2,X^n)\in S\big) = 0$$

where $S$ is “the category of the most extreme values of the evidence measure (…) that corresponds to the strongest evidence” (p.2) and which is interpreted as “the probability [of the first hypothesis $H_1$], given that the measure of evidence strongly testifies against $H_1$, relative to $H_2$ should go to zero” (p.2). So this definition requires a probability measure on the parameter spaces or at least on the set of model indices, but it is not explicitly stated in the paper. The proofs that the p-value is inconsistent and that the likelihood ratio is consistent do involve model/hypothesis prior probabilities and weights, $p(\cdot)$ and $w$. However, the last section on the consistency of the Bayes factor states “it is open to debate whether a measure of evidence can depend on a prior information” (p.3) and it uses another notation, $q(\cdot)$, for the prior distribution… Furthermore, it reproduces the argument found in Templeton that larger evidence should be attributed to larger hypotheses. And it misses our 1992 analysis of *p*-values from a decision-theoretic perspective, where we show they are inadmissible for two-sided tests, answering the question asked in the quote above.

## Testing and significance

Posted in R, Statistics, University life with tags Bayesian model evaluation, hypothesis testing, misuse of Statistics, Nature, R, Significance, The Bayesian Choice, The Guardian on September 13, 2011 by xi'an

**J**ulien Cornebise pointed me to this Guardian article that itself summarises the findings of a Nature Neuroscience article I cannot access. The core of the paper is that a large portion of comparative studies conclude that there is a significant difference between protocols when one protocol result is significantly different from zero and the other one(s) is(are) not… From a frequentist perspective (I am not even addressing the Bayesian aspects of using those tests!), under the null hypothesis that both protocols induce the same null effect, the probability of wrongly deriving a significant difference can be evaluated by

    > x=rnorm(10^6)
    > y=rnorm(10^6)
    > sum((abs(x)<1.96)*(abs(y)>1.96)*(abs(x-y)<1.96*sqrt(2)))
    [1] 31805
    > sum((abs(x)>1.96)*(abs(y)<1.96)*(abs(x-y)<1.96*sqrt(2)))
    [1] 31875
    > (31805+31875)/10^6
    [1] 0.06368

which moves to a 26% probability of error when x is drifted by 2! (The maximum error is just above 30%, when x is drifted by around 2.6…)
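The drifted case can be explored with a slight extension of the simulation above. This is a minimal sketch rather than the code used for the post: the function name `error_rate`, its arguments `shift` and `n`, the seed, and the grid of shifts are all introduced here for illustration.

```r
# Monte Carlo estimate of the probability that exactly one of x and y is
# significant at the 5% level while their difference is not.
# `shift` (the mean of x) and `n` are names introduced for this sketch.
error_rate <- function(shift = 0, n = 10^6) {
  x <- rnorm(n, mean = shift)
  y <- rnorm(n)
  x_sig <- abs(x) > 1.96
  y_sig <- abs(y) > 1.96
  diff_not_sig <- abs(x - y) < 1.96 * sqrt(2)
  mean(xor(x_sig, y_sig) & diff_not_sig)
}

set.seed(1)  # arbitrary seed, for reproducibility only
shifts <- c(0, 1, 2, 2.6, 3, 4)
rates <- sapply(shifts, error_rate)
names(rates) <- shifts
round(rates, 3)
```

Using `xor` merges the two one-sided sums of the original session into a single event, and a finer grid of shifts can be substituted to locate the maximum more precisely.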

*(This post was written before Super Andrew posted his own “difference between significant and not significant”! My own of course does not add much to the debate.)*