## no publication without confirmation

“Our proposal is a new type of paper for animal studies (…) that incorporates an independent, statistically rigorous confirmation of a researcher’s central hypothesis.” (p.409)

**A** Comment piece in Nature of February 23, 2017, suggests running clinical trials in three stages so as to meet higher standards of statistical validation. The idea is to impose a preclinical trial, run by an independent team, after initial research shows some potential for a new treatment. The three stages are thus (i) to generate hypotheses; (ii) to test hypotheses; (iii) to test broader applications of hypotheses (p.410). I am skeptical of the chances of this proposal reaching adoption, for various reasons: what would the incentive of the second team be [of the B team be?!], especially if the hypothesis is disproved? how would both teams share the authorship and presumably the patenting rights of the final study? and how could independence be certain were the B team contracted by the A team? Moreover, the statistical arguments put forward in the piece are rather weak (in my opinion). Repeating experiments with a larger sample size and with a hypothesis set a priori rather than cherry-picked is obviously positive, but moving from a p-value threshold of 0.05 to one of 0.01, with a power of 80%, is a cosmetic rather than a foundational change, as Andrew and I pointed out in our PNAS discussion of Johnson two years ago.
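To see in what sense the tightening from 0.05 to 0.01 can be called cosmetic, here is a back-of-the-envelope sketch under the textbook screening model (presumably the one behind the Nature vignette): the positive predictive value depends on an assumed prior proportion π of true hypotheses, which the piece never specifies. The prior values 0.1 and 0.5 below are hypothetical choices of mine, not from the article.

```python
# Positive predictive value under the textbook "screening" model:
#   PPV = power * pi / (power * pi + alpha * (1 - pi))
# where pi is an assumed prior proportion of true hypotheses.

def ppv(alpha: float, power: float, pi: float) -> float:
    """P(hypothesis true | test positive) under the screening model."""
    return power * pi / (power * pi + alpha * (1 - pi))

if __name__ == "__main__":
    for pi in (0.1, 0.5):  # hypothetical priors; the article states none
        for alpha in (0.05, 0.01):
            print(f"pi={pi:.1f}, alpha={alpha:.2f}: PPV = {ppv(alpha, 0.8, pi):.2f}")
```

Whatever numerical gain the 0.01 threshold shows over 0.05 in this computation, it is driven entirely by the unstated prior π, which is where the foundational issue lies, not in the threshold itself.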

“the earlier experiments would not need to be held to the same rigid standards.” (p.410)

The article contains a vignette on “the maths of predictive value” that makes intuitive sense, but only superficially. First, “the positive predictive value is the probability that a positive result is truly positive” (p.411), a statement that implies a probability distribution on the space of hypotheses, although I see no Bayesian hint anywhere in the paper. Second, this (ersatz of a) probability is computed as the ratio of the number of positive results under the hypothesis over the total number of positive results, which does not make much sense outside a Bayesian framework, and even then cannot be assessed experimentally or by simulation without defining a distribution of the output under both hypotheses. Simplistic pictures like the one above are not necessarily meaningful. And Nature should certainly invest in a statistical editor!
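To make the simulation point concrete: the vignette’s ratio of positives only becomes estimable once one commits to a prior on the hypothesis space and to positive-result rates under both hypotheses. A minimal Monte Carlo sketch (all numbers, the prior pi included, are assumptions of mine, not taken from the article):

```python
import random

def simulate_ppv(pi, alpha=0.05, power=0.8, n=100_000, seed=1):
    """Monte Carlo estimate of P(hypothesis true | positive result).
    The estimate only exists because we commit, up front, to a prior
    pi on the hypothesis space and to positive-result rates under
    both hypotheses (power under H1, alpha under H0)."""
    rng = random.Random(seed)
    true_pos = false_pos = 0
    for _ in range(n):
        h_true = rng.random() < pi        # prior draw on the hypothesis
        positive = rng.random() < (power if h_true else alpha)
        if positive and h_true:
            true_pos += 1
        elif positive:
            false_pos += 1
    return true_pos / (true_pos + false_pos)
```

Running this with different values of pi moves the answer, which is exactly the point: the “positive predictive value” is undefined until a distribution over hypotheses is declared.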

March 15, 2017 at 10:03 pm

“how could independence be certain were the B team contracted by the A team?” Suggested resolution: The B Team is *not* contracted by the A team, but by the referee/journal to which A initially submits its study.

“what would the incentive of the second team be [of the B team be?!]” Suggested resolution: whether or not B rejects the null hypothesis “what A says is significant is not repeatably significant”, there is *a* finding to publish, and everyone likes to have something published. (I think I am suggesting that journals should still decide whether or not to publish A in the same way that they already do, but also commit to publishing B’s findings *at the same time*.)

More: while one can argue that the garden of forking paths inflates “significance”, it should also be plain that “this p-value calculated in this way will be significant” is itself an experimental claim (provided it isn’t tautological): it’s just that team B now need not, and cannot, design their own experimental/analytical set-up; they only have to collect independent data of the same kind as team A’s, and repeat their analysis.

March 16, 2017 at 8:46 am

Thanks. I remain doubtful about motivating the B team to invest time and money into a commissioned experiment that can either succeed, in which case the A team gets the fame, or fail, in which case the B team gets published in the Journal of Negative Experiments, which is not particularly glorious.

March 16, 2017 at 9:16 am

OK, so I did say “everyone likes to have something published”, which obviously is a bit glib; but actually, not everyone aspires to the same things, even in academia. So let us not be so hasty to judge that no one will want to make science actually science: it isn’t science if it isn’t repeatable (to wax glib again). Some of us are less interested in priority than in just finding what’s so and what isn’t. By the way, I don’t think a separate “Journal of Negative Experiments” is what’s wanted at all: A and B are published together or not at all in the journal I was trying to describe.

(Of course, a journal could resort to blackmail: “if you really want your study B1 reviewed and published, we’d really like your independent confirmation/refutation B2 of A1’s reported findings”.)