a refutation of Johnson’s PNAS paper
Jean-Christophe Mourrat recently arXived a paper “P-value tests and publication bias as causes for high rate of non-reproducible scientific results?”, intended as a rebuttal of Val Johnson’s PNAS paper. The arguments therein are not particularly compelling. (Just as ours may sound unconvincing to the author.)
“We do not discuss the validity of this [Bayesian] hypothesis here, but we explain in the supplementary material that if taken seriously, it leads to incoherent results, and should thus be avoided for practical purposes.”
The refutation is primarily argued as a rejection of the whole Bayesian perspective. (Although we argue Johnson’s perspective is not that Bayesian…) But the argument within the paper is much simpler: if the probability of rejection under the null is at most 5%, then the overall proportion of false positives is also at most 5%, and not 20% as argued by Johnson…! As simple as that. Except that the 5% bounds the probability of rejecting given the null, not the proportion of nulls among the rejections, which also depends on the prior weight of the null and on the power of the test. Unfortunately, the author mixes conditional and unconditional, frequentist and Bayesian probability models, as well as conditioning upon the data and conditioning upon the rejection region… Read at your own risk.
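To see why the 5% size of a test does not cap the proportion of false positives among rejections, here is a minimal Python sketch of Bayes' rule applied to the event "reject". The helper prob_null_given_rejection is hypothetical, and the prior weight on the null and the power values are illustrative assumptions, not figures taken from either paper.

```python
def prob_null_given_rejection(alpha, power, prior_null):
    """Bayes' rule: P(H0 | reject) from the size alpha, the power and the prior mass of H0."""
    false_pos = prior_null * alpha         # P(H0 and reject) = P(H0) P(reject | H0)
    true_pos = (1 - prior_null) * power    # P(H1 and reject) = P(H1) P(reject | H1)
    return false_pos / (false_pos + true_pos)

# Illustrative values (assumptions), not taken from Johnson's or Mourrat's papers
alpha, prior_null = 0.05, 0.5
for power in (0.5, 0.95, 1.0):
    print(power, round(prob_null_given_rejection(alpha, power, prior_null), 3))
# 0.5 0.091
# 0.95 0.05
# 1.0 0.048
```

With equal prior weights on both hypotheses, the proportion of nulls among the rejections only falls to 5% once the power reaches 95%; with a power of 0.5, it is already about 9%.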
“These examples vividly illustrate that the choice of the a priori Bayesian hypothesis is not innocent. It needs to be carefully substantiated by evidence, instead of drawn from some blind (albeit “objective”) automatic procedure.”
The arguments in the supplementary material use the characters of Alice and Bob, which should be familiar to computer scientists and xkcd fans… More seriously, the author considers the asymptotics of false positives when the alternative prior is concentrated on a single value (the setting where Johnson defines his uniformly most powerful Bayesian test). Unsurprisingly, he recovers the original figure that about 20% of the rejected cases close to the standard 5% boundary come from the null (which is also the figure found by Berger and Sellke, 1985). The remainder of the section goes on criticising the Bayesian approach, albeit in Johnson’s non-standard representation! And it comes full circle in arguing that the frequentist approach to testing is the correct one.
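The 20% figure arises from conditioning on the observed statistic rather than on the rejection region. The following minimal Python sketch assumes a one-sided test with a single observation Z ~ N(θ, 1), H0: θ = 0, a point alternative θ = θ1 and prior weight 1/2 on each hypothesis. It then picks the point alternative most favourable to an observation sitting exactly at the 5% boundary, in the spirit of the Berger and Sellke lower bound. This is not a reproduction of Johnson’s uniformly most powerful Bayesian test nor of the asymptotics in Mourrat’s supplementary material, and prob_null_given_z is a hypothetical helper.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)

def prob_null_given_z(z, theta1, prior_null=0.5):
    """P(H0 | Z = z) for Z ~ N(theta, 1), H0: theta = 0 against the point alternative theta = theta1."""
    f0 = STD.pdf(z)            # likelihood under H0
    f1 = STD.pdf(z - theta1)   # likelihood under H1, i.e. the N(theta1, 1) density at z
    return prior_null * f0 / (prior_null * f0 + (1 - prior_null) * f1)

z_obs = STD.inv_cdf(0.95)   # about 1.645: one-sided p-value exactly 0.05

# Scan a grid of point alternatives; theta1 close to z_obs maximises f1,
# hence minimises the posterior probability of the null
grid = [i / 100 for i in range(1, 501)]
best = min(grid, key=lambda t: prob_null_given_z(z_obs, t))
print(round(best, 2), round(prob_null_given_z(z_obs, best), 3))
# 1.64 0.205
```

Even the point alternative most favourable to the data leaves about 20% posterior probability on the null; in closed form, this minimum is exp(-z²/2) / (1 + exp(-z²/2)) with z ≈ 1.645.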