Archive for hypothesis testing

straightforward statistics [book review]

Posted in Books, Kids, Statistics, University life on July 3, 2014 by xi'an

“I took two different statistics courses as an undergraduate psychology major [and] four different advanced statistics classes as a PhD student.” G. Geher

Straightforward Statistics: Understanding the Tools of Research by Glenn Geher and Sara Hall is an introductory textbook for psychology and other social science students, which Oxford University Press sent me for review. (Nice cover, by the way!) I can spot the purpose behind the title, a purpose heavily stressed anew in the preface and the first chapter, but it nonetheless irks me as conveying the message that one semester of reasonable diligence in class will lead any college student “not only [to] understanding research findings from psychology, but also to uncovering new truths about the world and our place in it” (p.9). Nothing less! In essence, the book covers the basics found in all introductory textbooks, from descriptive statistics to ANOVA models. The inclusion of “real research examples” in its chapters rather demonstrates how far from real research a reader of the book would stand…

a refutation of Johnson’s PNAS paper

Posted in Books, Statistics, University life on February 11, 2014 by xi'an

Jean-Christophe Mourrat recently arXived a paper, “P-value tests and publication bias as causes for high rate of non-reproducible scientific results?”, intended as a rebuttal of Val Johnson's PNAS paper. The arguments therein are not particularly compelling. (Just as ours may sound to the author.)

“We do not discuss the validity of this [Bayesian] hypothesis here, but we explain in the supplementary material that if taken seriously, it leads to incoherent results, and should thus be avoided for practical purposes.”

The refutation is primarily argued as a rejection of the whole Bayesian perspective. (Although we would argue Johnson's perspective is not that Bayesian…) But the argument within the paper is much simpler: if the probability of rejection under the null is at most 5%, then the overall proportion of false positives is also at most 5%, and not 20% as argued in Johnson…! Just as simple as this. Unfortunately, the author mixes conditional and unconditional probabilities, frequentist and Bayesian probability models, as well as conditioning upon the data and conditioning upon the rejection region… Read at your own risk.
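
To illustrate the conditional/unconditional distinction, here is a minimal R sketch of my own (the 10% proportion of genuine effects and the 2.5 effect size are arbitrary assumptions): even with the type I error controlled at the 5% level, the proportion of false positives among the significant results may be much higher, close to 40% in this instance.

ptrue=0.1                        # assumed proportion of genuine effects
effect=rbinom(10^6,1,ptrue)      # 1 when a genuine effect exists
z=rnorm(10^6,mean=2.5*effect)    # test statistics, assumed effect size 2.5
reject=(abs(z)>1.96)             # two-sided tests at the 5% level
mean(reject[effect==0])          # type I error, close to 0.05
mean(effect[reject]==0)          # false positives among rejections, close to 0.40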

workshop a Venezia (2)

Posted in pictures, Statistics, Travel, University life on October 10, 2012 by xi'an

I could only attend one day of the workshop on likelihood, approximate likelihood and nonparametric statistical techniques with some applications, and I wish I could have stayed a day longer (and definitely not only for the pleasure of being in Venezia!). Yesterday, Bruce Lindsay started the day with an extended review of composite likelihood, followed by recent applications of composite likelihood to clustering (I was completely unaware he had worked on the topic in the '80s!). His talk was followed by several talks on composite likelihood and other pseudo-likelihoods, which made me think about potential applications to ABC.

During my tutorial talk on ABC, I got interesting questions on multiple testing and on how to combine the different “optimal” summary statistics (answer: take all of them, as it would not make sense to compare one pair with one summary statistic and another pair with another summary statistic), and on why we were using empirical likelihood rather than another pseudo-likelihood (answer: I do not have a definite answer. I guess it depends on the ease with which the pseudo-likelihood is derived and on what we do with it. I would, e.g., feel less confident using the pairwise composite as a substitute likelihood than as the basis for a score function.)

In the final afternoon, Monica Musio presented her joint work with Phil Dawid on score functions and their connection with pseudo-likelihoods and estimating equations (another possible opening for ABC), mentioning a score family developed by Hyvärinen that involves the gradient of the square-root of a density, in the best James-Stein tradition! (Plus an approach bypassing the annoying missing normalising constant.) Then, based on joint work with Nicola Sartori and Laura Ventura, Erlis Ruli exposed a third-order tail-area approximation for (marginal) posterior simulation, called HOTA. As Erlis will visit me in Paris in the coming weeks, I hope I can explore the possibilities of this method when he is (t)here. At last, Stefano Cabras discussed higher-order approximations for Bayesian point-null hypotheses (jointly with Walter Racugno and Laura Ventura), bringing up the Pereira and Stern (so special) loss function mentioned in my post on Måns' paper the very same day! It was thus a very informative and beneficial day for me, furthermore spent in a room overlooking the Canal Grande in the most superb location!
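
For the record, and in my own notation (not necessarily the one used in the talk), the Hyvärinen score of a density q can be written, with \Delta the Laplacian, as

S(x;q) = \Delta \log q(x) + \frac{1}{2}\,\|\nabla \log q(x)\|^2 = 2\,\frac{\Delta \sqrt{q(x)}}{\sqrt{q(x)}}

Since it only involves derivatives of \log q, a missing normalising constant vanishes from the score, and the rightmost expression makes the square-root-of-a-density connection explicit.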

is the p-value a good measure of evidence?

Posted in Statistics, University life on November 30, 2011 by xi'an

“Statistics abounds criteria for assessing quality of estimators, tests, forecasting rules, classification algorithms, but besides the likelihood principle discussions, it seems to be almost silent on what criteria should a good measure of evidence satisfy.” M. Grendár

A short note (4 pages) appeared on arXiv a few days ago, entitled “is the p-value a good measure of evidence? an asymptotic consistency criterion” by M. Grendár. It is rather puzzling in that it defines the consistency of an evidence measure ε(H1,H2,X^n) (for the hypothesis H1 relative to the alternative H2) by

\lim_{n\rightarrow\infty} P(H_1|\epsilon(\neg H_1,H_2,X^n)\in S) =0

where S is “the category of the most extreme values of the evidence measure (…) that corresponds to the strongest evidence” (p.2), and which is interpreted as meaning that “the probability [of the first hypothesis H1], given that the measure of evidence strongly testifies against H1, relative to H2 should go to zero” (p.2). This definition thus requires a probability measure on the parameter spaces, or at least on the set of model indices, although this is not explicitly stated in the paper. The proofs that the p-value is inconsistent and that the likelihood ratio is consistent do involve model/hypothesis prior probabilities and weights, p(.) and w. However, the last section on the consistency of the Bayes factor states that “it is open to debate whether a measure of evidence can depend on a prior information” (p.3), while using yet another notation, q(.), for the prior distribution… Furthermore, the note reproduces the argument found in Templeton that larger evidence should be attributed to larger hypotheses. And it misses our 1992 analysis of p-values from a decision-theoretic perspective, where we show they are inadmissible for two-sided tests, which answers the question asked in the quote above.
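
As a back-of-the-envelope reconstruction of mine (not necessarily the argument of the note), writing p and w for the prior weights of H1 and H2, Bayes' theorem gives

P(H_1\mid \epsilon(\neg H_1,H_2,X^n)\in S) = \frac{p\,P(\epsilon\in S\mid H_1)}{p\,P(\epsilon\in S\mid H_1)+w\,P(\epsilon\in S\mid H_2)}

so that consistency requires P(ε∈S|H1) to vanish as n grows. When ε is the p-value and H1 a point null, the p-value remains uniformly distributed under H1 whatever n, hence P(ε∈S|H1) stays equal to the fixed size of S and the posterior probability cannot go to zero.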

Testing and significance

Posted in R, Statistics, University life on September 13, 2011 by xi'an

Julien Cornebise pointed me to this Guardian article that itself summarises the findings of a Nature Neuroscience article I cannot access. The core of the paper is that a large proportion of comparative studies conclude there is a significant difference between protocols when one protocol's result is significantly different from zero and the other one(s) is (are) not… From a frequentist perspective (I am not even addressing the Bayesian aspects of using those tests!), under the null hypothesis that both protocols induce the same null effect, the probability of wrongly deriving a significant difference can be evaluated by

> # z-statistics of both protocols, simulated under the common null effect
> x=rnorm(10^6)
> y=rnorm(10^6)
> # y significant at the 5% level, x not, while the proper test of the
> # difference, based on (x-y)/sqrt(2), does not reject:
> sum((abs(x)<1.96)*(abs(y)>1.96)*(abs(x-y)<1.96*sqrt(2)))
[1] 31805
> # symmetric case: x significant, y not, difference not significant
> sum((abs(x)>1.96)*(abs(y)<1.96)*(abs(x-y)<1.96*sqrt(2)))
[1] 31875
> # overall probability of wrongly claiming a difference
> (31805+31875)/10^6
[1] 0.06368

which moves to a 26% probability of error when the mean of x drifts to 2! (The maximum error is just above 30%, reached when the drift is around 2.6…)
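
These drifted figures can be checked by wrapping the above simulation into a function of the drift, as in this minimal sketch (the function name err and the Monte Carlo size are mine):

err=function(d,n=10^6){
  x=rnorm(n,mean=d)  # drifted protocol
  y=rnorm(n)         # null protocol
  # exactly one of the two is significant at the 5% level, while the
  # proper test of the difference does not reject
  mean(((abs(x)<1.96)!=(abs(y)<1.96))*(abs(x-y)<1.96*sqrt(2)))
}
err(0)    # about 0.064, as above
err(2)    # about 0.26
err(2.6)  # near the maximum, just above 0.30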

(This post was written before Super Andrew posted his own “difference between significant and not significant”! My own of course does not add much to the debate.)

The controversy about hypothesis testing

Posted in Statistics, Travel, University life on July 21, 2011 by xi'an

José Bernardo announced a workshop in Madrid at the end of the year (December 15 and 16) on (the controversy about) hypothesis testing. The details are on this dedicated blog. Papers can be submitted until August 31. I wish I could attend but, being in Phoenix and London on the previous days, it is simply impossible…

The foundations of Statistics: a simulation-based approach

Posted in Books, R, Statistics, University life on July 12, 2011 by xi'an

“We have seen that a perfect correlation is perfectly linear, so an imperfect correlation will be `imperfectly linear’.” page 128

This book has been written by two linguists, Shravan Vasishth and Michael Broe, in order to teach statistics “in areas that are traditionally not mathematically demanding” at a deeper level than traditional textbooks, “without using too much mathematics”, towards building “the confidence necessary for carrying more sophisticated analyses” through R simulation. This is a praiseworthy goal, bound to produce a great book. However, and most sadly, I find the book does not live up to expectations. As in Radford Neal's recent coverage of introductory probability books with R, there are statements in the book that show a deep misunderstanding of the topic… (This post has also been published on the Statistics Forum.)
