## Testing and significance

Julien Cornebise pointed me to this Guardian article, which itself summarises the findings of a Nature Neuroscience article I cannot access. The core of the paper is that a large portion of comparative studies conclude that there is a significant difference between protocols when the result of one protocol is significantly different from zero and that of the other(s) is (are) not… From a frequentist perspective (I am not even addressing the Bayesian aspects of using those tests!), under the null hypothesis that both protocols induce the same null effect, the probability of wrongly concluding to a significant difference (exactly one protocol is significant, while the difference between the two is not) can be evaluated by

```r
> x=rnorm(10^6)
> y=rnorm(10^6)
> sum((abs(x)<1.96)*(abs(y)>1.96)*(abs(x-y)<1.96*sqrt(2)))
[1] 31805
> sum((abs(x)>1.96)*(abs(y)<1.96)*(abs(x-y)<1.96*sqrt(2)))
[1] 31875
> (31805+31875)/10^6
[1] 0.06368
```

This probability rises to 26% when the mean of x is shifted by 2! (The maximum error is just above 30%, reached when x is shifted by around 2.6…)
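To trace how this error varies with the shift, the simulation above can be wrapped in a function of the mean of x. This is a minimal sketch, not the code behind the figures quoted: `err_prob` is a hypothetical helper name, it uses `qnorm(0.975)` instead of the rounded 1.96, and the scan over shifts uses 10^5 draws per value rather than 10^6 to stay fast, so those estimates are noisier than in the transcript.

```r
# Hypothetical helper (not from the original post): Monte Carlo estimate of the
# probability that exactly one of x, y is individually significant while the
# test on x - y is NOT significant, with x ~ N(delta, 1), y ~ N(0, 1) independent.
err_prob <- function(delta, n = 1e6, z = qnorm(0.975)) {
  x <- rnorm(n, mean = delta)
  y <- rnorm(n)
  one_sig <- xor(abs(x) > z, abs(y) > z)   # exactly one individual test rejects
  diff_ns <- abs(x - y) < z * sqrt(2)      # x - y has variance 2 and is not rejected
  mean(one_sig & diff_ns)
}

set.seed(1)
deltas <- seq(0, 4, by = 0.2)              # grid of shifts for the mean of x
probs <- sapply(deltas, err_prob, n = 1e5) # error estimate for each shift
round(cbind(delta = deltas, error = probs), 3)
deltas[which.max(probs)]                   # shift giving the largest estimated error
```

At a shift of 0 this reproduces the null setting of the transcript, and the location of the maximum should fall near the value of 2.6 mentioned above, although Monte Carlo noise on such a flat curve can move it by a grid step.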

*(This post was written before Super Andrew posted his own “difference between significant and not significant”! My own of course does not add much to the debate.)*

September 20, 2011 at 12:11 am

[…] Cornebise has [once again!] pointed out a recent Guardian article. It is about commercial publishers of academic journals, […]

September 13, 2011 at 2:18 pm

The paper is available at the first author’s homepage: http://www.sandernieuwenhuis.nl/pdfs/NieuwenhuisEtAl_NN_Perspective.pdf

September 13, 2011 at 10:02 am

But if we shift x by 2, shouldn’t we change the way we calculate the probability? If we shift x, we are saying that its effect is no longer zero, and with 10^6 experiments we should detect that.

September 13, 2011 at 10:10 am

If you follow the principles of classical testing (I do not!), in 26% of all simulated cases exactly one of x and y is significantly different from zero while the difference x-y is not… This may sound paradoxical, but it is what the classical theory says. The paradox is easily explained by the fact that, in this artificial experiment, I know (because I wrote the R code) that x has a mean different from zero and that y has a mean equal to zero. So testing for a difference amounts to testing whether the mean of x differs from zero. In a practical problem, one does not know whether either mean is different from zero, so the test should bear directly on the difference between the means of x and y.
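A minimal sketch of that last point, assuming the same null setting as the first simulation (both means equal to zero). It compares how often the heuristic “one is significant, the other is not” declares a difference with how often a direct two-sided test on x-y rejects. Unlike the figures in the post, which count cases where the two approaches disagree, these are raw rejection rates: the heuristic should come out near 2 × 0.05 × 0.95 = 0.095, close to twice the nominal 5% level that the direct test attains.

```r
# Sketch under the global null: x and y both ~ N(0, 1), independent.
set.seed(2)
n <- 1e6
z <- qnorm(0.975)        # two-sided 5% critical value (about 1.96)
x <- rnorm(n)
y <- rnorm(n)

# Heuristic: "x is significant and y is not (or the reverse), hence they differ"
heuristic <- mean(xor(abs(x) > z, abs(y) > z))

# Direct test on the difference: x - y ~ N(0, 2) under the null
direct <- mean(abs(x - y) > z * sqrt(2))

c(heuristic = heuristic, direct = direct)
```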

September 13, 2011 at 4:14 am

I’ve just started reading Ziliak & McCloskey’s provocative book, The Cult of Statistical Significance, which mounts an attack on what they term “sizeless science.” At most, they contend, “significance” answers what they call the “philosophical question” of whether an effect exists, and then stops before the real work of asking “how much?” Makes you wonder about all the electrons sacrificed every day at the altar of alpha.

September 13, 2011 at 6:48 am

Interesting! I see that David Aldous also wrote a review of the book on Amazon.