Julien Cornebise pointed me to this Guardian article that itself summarises the findings of a Nature Neuroscience article I cannot access. The core of the paper is that a large portion of comparative studies conclude to a significant difference between protocols when one protocol result is significantly different from zero and the other one(s) is(are) not… From a frequentist perspective (I am not even addressing the Bayesian aspects of using those tests!), under the null hypothesis that both protocols induce the same null effect, the probability of wrongly deriving a significant difference can be evaluated by
> x=rnorm(10^6) > y=rnorm(10^6) > sum((abs(x)<1.96)*(abs(y)>1.96)*(abs(x-y)<1.96*sqrt(2))) [1] 31805 > sum((abs(x)>1.96)*(abs(y)<1.96)*(abs(x-y)<1.96*sqrt(2))) [1] 31875 > (31805+31875)/10^6 [1] 0.06368
which moves to a 26% probability of error when x is drifted by 2! (The maximum error is just above 30%, when x is drifted by around 2.6…)
(This post was written before Super Andrew posted his own “difference between significant and not significant“! My own of course does not add much to the debate.)
