Archive for non-reproducible research

Bayes, reproducibility, and the quest for truth

Posted in Books, Kids, Statistics, University life on September 2, 2016 by xi'an

“Avoid opinion priors, you could be held legally or otherwise responsible.”

Don Fraser, Mylène Bedard, Augustine Wong, Wei Lin, and Ailana Fraser wrote a paper to appear in Statistical Science, with the above title. This paper is a continuation of Don’s assessment of Bayes procedures in earlier Statistical Science [which I discussed] and Science 2013 papers, which I would describe, with all due respect, as a demolition enterprise [of the Bayesian approach to statistics]… The argument is similar in that “reproducibility” is to be understood there as providing frequentist confidence assessment. The authors also use “accuracy” in this sense. (As far as I can tell, there is no definition of reproducibility to be found in the paper.) Some priors are matching priors, in the (restricted) sense that they give second-order accurate frequentist coverage. Most are not matching and none is third-order accurate, a level that may be attained by alternative approaches. As far as the abstract goes, this seems to be the crux of the paper. Which is fine, but does not qualify in my opinion as a criticism of the Bayesian paradigm, given that (a) this paradigm makes no claim to frequentist coverage and (b) I see no reason for proper coverage to be connected with “truth” or “accuracy”. It truly makes no sense to me to attempt either to put a frequentist hat on posterior distributions or to check whether or not the posterior is “valid”, “true” or “actual”. I similarly consider that Efron’s “genuine priors” do not belong to the Bayesian paradigm but are on the contrary anti-Bayesian, in that they suggest all priors should stem from frequency modelling, to borrow the terms from the current paper. (This is also the position of the authors, who consider that such priors have “no Bayes content”.)
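To make concrete what "frequentist coverage" of a posterior interval means, here is a minimal simulation sketch. It is my own illustration, not an example taken from the paper: it assumes exponential data with an unknown rate and the scale-invariant prior proportional to 1/rate, a textbook case in which the equal-tailed credible interval happens to match its nominal level exactly rather than only to second order. The function names, the sample size and the true rate are all hypothetical choices, and the posterior quantiles are approximated from Monte Carlo draws rather than computed exactly.

```python
import random


def credible_interval_covers(true_rate, n, level, n_draws, rng):
    """Simulate one dataset and report whether the interval contains true_rate."""
    # n exponential observations with the given rate; only their sum matters.
    total = sum(rng.expovariate(true_rate) for _ in range(n))
    # Assumed prior: pi(rate) proportional to 1/rate, so that the posterior
    # is Gamma(shape=n, rate=total). gammavariate takes (shape, scale),
    # hence scale = 1 / total.
    draws = sorted(rng.gammavariate(n, 1.0 / total) for _ in range(n_draws))
    # Equal-tailed interval from the empirical quantiles of the posterior draws.
    tail = (1.0 - level) / 2.0
    lower = draws[round(tail * n_draws)]
    upper = draws[round((1.0 - tail) * n_draws) - 1]
    return lower <= true_rate <= upper


def estimate_coverage(true_rate=2.0, n=5, level=0.95,
                      n_reps=400, n_draws=2000, seed=1):
    """Fraction of simulated datasets whose credible interval covers true_rate."""
    rng = random.Random(seed)
    hits = sum(
        credible_interval_covers(true_rate, n, level, n_draws, rng)
        for _ in range(n_reps)
    )
    return hits / n_reps


coverage = estimate_coverage()
print(f"estimated frequentist coverage: {coverage:.3f}")
```

The estimate should land close to the nominal 0.95, up to Monte Carlo error. Note that the whole exercise takes the exponential model as given, and swapping in an informative prior would in general move the coverage away from the nominal level, which is the kind of discrepancy the paper calls a lack of reproducibility.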

Among their arguments, the authors refer to two tragic real cases: the earthquake at L’Aquila, where seismologists were charged with (and later cleared of) manslaughter for asserting there was little risk of a major earthquake, and the indictment of the pharmaceutical company Merck for the deadly side-effects of their drug Vioxx. The paper however never returns to those cases and fails to explain in which sense they are connected with the lack of reproducibility or of truth(fulness) of Bayesian procedures. If anything, the moral of the L’Aquila story is that statisticians should not draw definitive conclusions, such as asserting that there is no risk of a major earthquake or that one is improbable. There is a strange if human tendency for experts to reach definitive conclusions and to omit the many layers of uncertainty in their models and analyses. In the earthquake case, seismologists do not know how to predict major quakes from the previous activity and that should have been the [non-]conclusion of the experts. Which could possibly have been reached by a Bayesian modelling that always includes uncertainty. But the current paper is not at all operating at this (epistemic?) level, as it never ever questions the impact of the choice of a likelihood function or of a statistical model in the reproducibility framework. First, third or 47th order accuracy nonetheless operates strictly within the frame of reference of the chosen model, and providing the data to another group of scientists, experts or statisticians will invariably produce a different statistical modelling. So much for reproducibility or truth.

Revised evidence for statistical standards

Posted in Kids, Statistics, University life on December 19, 2013 by xi'an

We just submitted a letter to PNAS with Andrew Gelman last week, in reaction to Val Johnson’s recent paper “Revised standards for statistical evidence”, essentially summing up our earlier comments within 500 words. Actually, we wrote one draft each! In particular, Andrew came up with the (neat) rhetorical idea of alternative Ronald Fishers living in parallel universes who had each set a different significance reference level and for whom alternative Val Johnsons would rise and propose a modification of the corresponding Fisher’s level. I made the above graph to illustrate this idea, although it was left out of the letter and its 500 words. It relates “the old z” and “the new z”, meaning the boundaries of the rejection zones when, for each golden dot, the “old z” is the previous “new z” and “the new z” is Johnson’s transform. We even figured out that Val’s transform was bringing the significance level down by a factor of 10 over a large range of values. As an aside, we also wondered why most of the supplementary material was spent on deriving UMPBTs for specific (formal) problems when the goal of the paper sounded much more global…

As I am aware that we are not the only ones to have submitted a letter about Johnson’s proposal, I am quite curious about the reception we will get from the editor! (Although I have to point out that all of my earlier submissions of letters to PNAS got accepted.)