Testing and significance

Julien Cornebise pointed me to this Guardian article that itself summarises the findings of a Nature Neuroscience article I cannot access. The core of the paper is that a large portion of comparative studies conclude that there is a significant difference between protocols when one protocol's result is significantly different from zero and the other's is not…  From a frequentist perspective (I am not even addressing the Bayesian aspects of using those tests!), under the null hypothesis that both protocols induce the same null effect, the probability of wrongly deriving a significant difference, that is, of seeing exactly one result significant at the 5% level while the difference itself is not, can be evaluated by

> x=rnorm(10^6)
> y=rnorm(10^6)
> sum((abs(x)<1.96)*(abs(y)>1.96)*(abs(x-y)<1.96*sqrt(2)))
[1] 31805
> sum((abs(x)>1.96)*(abs(y)<1.96)*(abs(x-y)<1.96*sqrt(2)))
[1] 31875
> (31805+31875)/10^6
[1] 0.06368

which moves to a 26% probability of error when x is drifted by 2! (The maximum error is just above 30%, when x is drifted by around 2.6…)
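
The drifted case can be reproduced along the same lines. What follows is only a sketch of how I would wrap the simulation above in a function: the names error_rate, shift, n and crit are illustrative choices, and the seed is arbitrary, so the estimates will vary a little from run to run and from the counts printed above.

```r
# Monte Carlo estimate of the probability that exactly one of two z-statistics
# is significant at the 5% level while their difference is not.
# NOTE: error_rate, shift, n and crit are illustrative names, not from the post.
error_rate <- function(shift = 0, n = 10^6, crit = qnorm(0.975)) {
  x <- rnorm(n, mean = shift)  # first protocol, mean drifted by `shift`
  y <- rnorm(n)                # second protocol, null effect
  # exactly one of the two individual tests rejects
  only_one <- xor(abs(x) > crit, abs(y) > crit)
  # the test on the difference (variance 2) does not reject
  diff_ns <- abs(x - y) < crit * sqrt(2)
  mean(only_one & diff_ns)
}

set.seed(1)  # arbitrary seed, for reproducibility only
rates <- sapply(c(0, 2, 2.6), error_rate)
print(round(rates, 3))
```

With shift = 0 this should land close to the 0.064 obtained above, and with shift = 2 close to the 26% quoted; scanning a grid of shifts is one way to locate the maximum.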

(This post was written before Super Andrew posted his own “difference between significant and not significant”! My own, of course, does not add much to the debate.)

6 Responses to “Testing and significance”

  1. […] Cornebise has [once again!] pointed out a recent Guardian article. It is about commercial publishers of academic journals, […]

  2. Paul Metzner Says:

    The paper is available at the first author’s homepage: http://www.sandernieuwenhuis.nl/pdfs/NieuwenhuisEtAl_NN_Perspective.pdf

  3. But if we shift x by 2, shouldn’t we change the way we calculate probability? If we shift x, we say that the null effect is then not zero, and in doing 10^6 experiments we should detect that.

    • If you follow the principles of classical testing (I do not!), in 26% of the cases x is significantly different from zero and y is not (or the reverse) while the difference x-y is not significantly different from zero… This may sound paradoxical to you, but this is what the classical theory says. The paradox is easily explained by the fact that, in this artificial experiment, I know (because I wrote the R code) that x has a mean different from zero and that y has a mean equal to zero. So testing for a difference amounts to testing for x having a mean different from zero. In a practical problem, one does not know whether either mean is different from zero, so the test should bear on the difference of the means of x and y.

  4. I’ve just started reading Ziliak & McCloskey’s provocative book, The Cult of Statistical Significance, which mounts an attack on what they term “sizeless science.” At most, they contend, “significance” answers what they call the “philosophical question” of whether an effect exists and then stops before the real work of asking “how much?” Makes you wonder about all the electrons sacrificed every day at the altar of alpha.
