I could only attend one day of the workshop on likelihood, approximate likelihood and nonparametric statistical techniques with some applications, and I wish I could have stayed a day longer (and definitely not only for the pleasure of being in Venezia!) Yesterday, Bruce Lindsay started the day with an extended review of composite likelihood, followed by recent applications of composite likelihood to clustering (I was completely unaware he had worked on the topic in the 80′s!). His talk was followed by several talks working on composite likelihood and other pseudo-likelihoods, which made me think about potential applications to ABC. During my tutorial talk on ABC, I got interesting questions on multiple testing and how to combine the different “optimal” summary statistics (answer: take all of them, it would not make sense to co;pare one pair with one summary statistic and another pair with another summary statistic), and on why we were using empirical likelihood rather than another pseudo-likelihood (answer: I do not have a definite answer. I guess it depends on the ease with which the pseudo-likelihood is derived and what we do with it. I would e.g. feel less confident to use the pairwise composite as a substitute likelihood rather than as the basis for a score function.) In the final afternoon, Monica Musio presented her joint work with Phil Dawid on score functions and their connection with pseudo-likelihood and estimating equations (another possible opening for ABC), mentioning a score family developped by Hyvärinen that involves the gradient of the square-root of a density, in the best James-Stein tradition! (Plus an approach bypassing the annoying missing normalising constant.) Then, based on a joint work with Nicola Satrori and Laura Ventura, Ruli Erlis exposed a 3rd-order tail approximation towards a (marginal) posterior simulation called HOTA. As Ruli will visit me in Paris in the coming weeks, I hope I can explore the possibilities of this method when he is (t)here. At last, Stéfano Cabras discussed higher-order approximations for Bayesian point-null hypotheses (jointly with Walter Racugno and Laura Ventura), mentioning the Pereira and Stern (so special) loss function mentioned in my post on Måns’ paper the very same day! It was thus a very informative and beneficial day for me, furthermore spent in a room overlooking the Canal Grande in the most superb location!
Archive for hypothesis testing
“Statistics abounds criteria for assessing quality of estimators, tests, forecasting rules, classification algorithms, but besides the likelihood principle discussions, it seems to be almost silent on what criteria should a good measure of evidence satisfy.” M. Grendár
A short note (4 pages) appeared on arXiv a few days ago, entitled “is the p-value a good measure of evidence? an asymptotic consistency criterion” by M. Grendár. It is rather puzzling in that it defines the consistency of an evidence measure ε(H1,H2,Xn) (for the hypothesis H1 relative to the alternative H2) by
where S is “the category of the most extreme values of the evidence measure (…) that corresponds to the strongest evidence” (p.2) and which is interpreted as “the probability [of the first hypothesis H1], given that the measure of evidence strongly testifies against H1, relative to H2 should go to zero” (p.2). So this definition requires a probability measure on the parameter spaces or at least on the set of model indices, but it is not explicitly stated in the paper. The proofs that the p-value is inconsistent and that the likelihood ratio is consistent do involve model/hypothesis prior probabilities and weights, p(.) and w. However, the last section on the consistency of the Bayes factor states “it is open to debate whether a measure of evidence can depend on a prior information” (p.3) and it uses another notation, q(.), for the prior distribution… Furthermore, it reproduces the argument found in Templeton that larger evidence should be attributed to larger hypotheses. And it misses our 1992 analysis of p-values from a decision-theoretic perspective, where we show they are inadmissible for two-sided tests, answering the question asked in the quote above.
Julien Cornebise pointed me to this Guardian article that itself summarises the findings of a Nature Neuroscience article I cannot access. The core of the paper is that a large portion of comparative studies conclude to a significant difference between protocols when one protocol result is significantly different from zero and the other one(s) is(are) not… From a frequentist perspective (I am not even addressing the Bayesian aspects of using those tests!), under the null hypothesis that both protocols induce the same null effect, the probability of wrongly deriving a significant difference can be evaluated by
> x=rnorm(10^6) > y=rnorm(10^6) > sum((abs(x)<1.96)*(abs(y)>1.96)*(abs(x-y)<1.96*sqrt(2)))  31805 > sum((abs(x)>1.96)*(abs(y)<1.96)*(abs(x-y)<1.96*sqrt(2)))  31875 > (31805+31875)/10^6  0.06368
which moves to a 26% probability of error when x is drifted by 2! (The maximum error is just above 30%, when x is drifted by around 2.6…)
(This post was written before Super Andrew posted his own “difference between significant and not significant“! My own of course does not add much to the debate.)
José Bernardo announced a workshop in Madrid at the end of the year (December 15 and 16) on (the controversy about) hypothesis testing. The details are on this dedicated blog. Papers can be submitted until August 31. I wish I could attend, but being in Phoenix and London the previous days, it simply is impossible…
“We have seen that a perfect correlation is perfectly linear, so an imperfect correlation will be `imperfectly linear’.” page 128
This book has been written by two linguists, Shravan Vasishth and Michael Broe, in order to teach statistics “in areas that are traditionally not mathematically demanding” at a deeper level than traditional textbooks “without using too much mathematics”, towards building “the confidence necessary for carrying more sophisticated analyses” through R simulation. This is a praiseworthy goal, bound to produce a great book. However, and most sadly, I find the book does not live up to expectations. As in Radford Neal’s recent coverage of introductory probability books with R, there are statements there that show a deep misunderstanding of the topic… (This post has also been published on the Statistics Forum.) Read more »