Archive for replication crisis

unrejected null [xkcd]

Posted in Statistics on July 18, 2018 by xi'an

long journey to reproducible results [or not]

Posted in Books, pictures, Statistics, University life on November 17, 2017 by xi'an

A rather fascinating article in Nature of last August [hidden under a pile of newspapers at home!]. By Gordon J. Lithgow, Monica Driscoll and Patrick Phillips. About their endeavours to account for divergent outcomes in the replications [or lack thereof] of an earlier experiment on anti-aging drugs tested on roundworms. Rather than dismissing the failures or blaming the other teams, the above researchers engaged for four years (!) in the titanic and grubby task of understanding the reason(s) for such discrepancies.

Finding that, once most causes for discrepancies (like gentle versus rough lab technicians!) were eliminated, there were still two “types” of worms, those short-lived and those long-lived, for reasons yet unclear. “We need to repeat more experiments than we realized” is a welcome conclusion to this dedicated endeavour, worth repeating in different circles. And apparently missing in the NYT coverage by Susan Dominus of the story of Amy Cuddy, a psychologist at the origin of the “power pose” theory that was later disputed for lack of reproducibility. An article whose main ideological theme is that Cuddy got singled out in the replication crisis because she is a woman and because her “power pose” theory is geared towards empowering women and minorities. Rather than because she keeps delivering the same message, mostly outside academia, despite the lack of evidence and statistical backup. (Dominus’ criticisms of psychologists with “an unusual interest in statistics” and of Andrew’s repeated comments on the methodological flaws of the 2010 paper that started it all are thus particularly unfair. A Slate article published after the NYT coverage presents an alternative analysis of this affair. Andrew also posted on Dominus’ article, with a subsequent humongous trail of comments!)

no publication without confirmation

Posted in Books, Statistics, University life on March 15, 2017 by xi'an

“Our proposal is a new type of paper for animal studies (…) that incorporates an independent, statistically rigorous confirmation of a researcher’s central hypothesis.” (p.409)

A comment tribune in Nature of Feb 23, 2017, suggests running clinical trials in three stages towards meeting higher standards in statistical validation. The idea is to impose a preclinical trial run by an independent team, following initial research showing some potential for some new treatment. The three stages are thus (i) to generate hypotheses; (ii) to test hypotheses; (iii) to test broader application of hypotheses (p.410). I am skeptical of the chances of this proposal reaching adoption, for various reasons: what would the incentive of the second team [the B team] be, especially if the hypothesis is disproved? How would both teams share the authorship and presumably patenting rights of the final study? And how could independence be certain were the B team contracted by the A team? Besides, the statistical arguments put forward in the tribune are rather weak (in my opinion). Repeating experiments with a larger sample size and a hypothesis set a priori rather than cherry-picked is obviously positive, but moving from a p-value boundary of 0.05 to one of 0.01 and to a power of 80% is more a cosmetic than a foundational change, as Andrew and I pointed out in our PNAS discussion of Johnson two years ago.

“the earlier experiments would not need to be held to the same rigid standards.” (p.410)

The article contains a vignette on “the maths of predictive value” that makes intuitive sense, but only superficially. First, “the positive predictive value is the probability that a positive result is truly positive” (p.411), a statement that implies a probability distribution on the space of hypotheses, although I see no Bayesian hint throughout the paper. Second, this (ersatz of a) probability is computed as the ratio of the number of positive results under the hypothesis over the total number of positive results. Which does not make much sense outside a Bayesian framework and, even then, cannot be assessed experimentally or by simulation without defining a distribution of the output under both hypotheses. Simplistic pictures like the above are not necessarily meaningful. And Nature should certainly invest in a statistical editor!
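To make the dependence on a prior explicit, here is a minimal sketch of how a positive predictive value can be computed once a prior probability is put on the hypothesis being true. The function name and the prior values are my own illustrative assumptions, not taken from the Nature tribune; only the 0.05 and 0.01 boundaries and the 80% power come from the discussion above.

```python
def positive_predictive_value(prior, power, alpha):
    """P(hypothesis true | positive result), by Bayes' rule.

    prior: assumed prior probability that the tested hypothesis is true
           (an input the analyst must supply, not something the data gives)
    power: probability of a positive result when the hypothesis is true
    alpha: probability of a positive result when the hypothesis is false
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical priors, chosen only to show how much the answer moves.
    for prior in (0.1, 0.3, 0.5):
        for alpha in (0.05, 0.01):
            ppv = positive_predictive_value(prior, power=0.8, alpha=alpha)
            print(f"prior={prior:.1f} alpha={alpha:.2f} PPV={ppv:.3f}")
```

With a prior of 0.1, a power of 80% and a boundary of 0.05, this gives a positive predictive value of 0.64, while a prior of 0.5 pushes it above 0.94. Nothing in the ratio can be evaluated without first choosing that prior.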

It’s the selection’s fault not the p-values’… [seminar]

Posted in pictures, Statistics, University life on February 5, 2016 by xi'an

[Paris and la Seine, from Pont du Garigliano, Oct. 20, 2011]

Yoav Benjamini will give a seminar talk in Paris next Monday on the above (full title: “The replicability crisis in science: It’s the selection’s fault not the p-values’“). (That I will miss for being in Warwick at the time.) With a fairly terse abstract:

I shall discuss the problem of lack of replicability of results in science, and point at selective inference as a statistical root cause. I shall then present a few strategies for addressing selective inference, and their application in genomics, brain research and earlier phases of clinical trials where both primary and secondary endpoints are being used.

Details: February 8, 2016, 16h, Université Pierre & Marie Curie, campus Jussieu, salle 15-16-101.

Le Monde and the replication crisis

Posted in Books, Kids, Statistics on September 17, 2015 by xi'an

A rather poor coverage of the latest article in Science on the replication crisis in psychology in Le Monde Sciences & Medicine weekly pages (and mentioned a few days ago on Andrew’s blog, with the terrific if unrelated poster for Blade Runner…):

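L’étude repose également sur le rôle d’un critère très critiqué, la “valeur p”, qui est un indicateur statistique estimant la probabilité que l’effet soit bien significatif. [The study also rests on the role of a much criticised criterion, the “p-value”, which is a statistical indicator estimating the probability that the effect is indeed significant.]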

As you may guess from the above (pardon my French!), the author of this summary of the Science article (a) has never heard of a p-value (which translates as niveau de signification in French statistics books) and (b) confuses the probability of exceeding the observed quantity under the null with the probability of the alternative. The remainder of the article is more classical, pointing out the need for preregistered protocols in experimental sciences. Even though it mostly states evidence, like the decrease in significant effects for prepublished protocols. Apart from this mostly useless entry, there are rather interesting snapshots in the issue: Stephen Hawking’s views on how information could escape a black hole, an IBM software for predicting schizophrenia, Parkinson’s disease as a result of hyperactive neurons, diseased Formica fusca ants taking some harmful drugs to heal, …
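The confusion in (b) can be illustrated with a toy computation. This is only a sketch under assumptions of my own (a one-sided test on a standard Normal statistic, a point alternative with mean mu_alt, and an arbitrary prior weight on that alternative), none of which come from the Le Monde or Science articles.

```python
import math


def normal_sf(z):
    """Upper tail probability P(Z > z) for a standard Normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def normal_pdf(z, mean=0.0):
    """Density of a Normal(mean, 1) distribution at z."""
    return math.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2.0 * math.pi)


def p_value(z_obs):
    """Probability of exceeding the observed statistic under the null."""
    return normal_sf(z_obs)


def posterior_alternative(z_obs, mu_alt, prior_alt):
    """Posterior probability of a point alternative N(mu_alt, 1)
    against the null N(0, 1), given an assumed prior weight prior_alt."""
    weight_alt = prior_alt * normal_pdf(z_obs, mu_alt)
    weight_null = (1.0 - prior_alt) * normal_pdf(z_obs, 0.0)
    return weight_alt / (weight_alt + weight_null)


if __name__ == "__main__":
    z_obs = 1.96  # hypothetical observed statistic
    print("p-value:", p_value(z_obs))
    for prior_alt in (0.1, 0.5):  # hypothetical prior weights
        post = posterior_alternative(z_obs, mu_alt=2.0, prior_alt=prior_alt)
        print("prior", prior_alt, "posterior of alternative:", post)
```

The p-value is a single number that involves no prior at all, while the probability of the alternative changes with both the prior weight and the alternative chosen, so neither quantity can stand in for the other.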