## Valen in Le Monde

**V**alen Johnson made the headlines in *Le Monde* last week (more precisely, in the scientific blog *Passeur de Sciences*; thanks, Julien, for the pointer!), under the alarming title of “Une étude ébranle un pan de la méthode scientifique” *(A study questions one major tool of the scientific approach)*. The reason for this French fame is Valen’s recent paper in PNAS, *Revised standards for statistical evidence*, where he puts forward his uniformly most powerful Bayesian tests (recently discussed on the ’Og) to argue against the standard 0.05 significance level and in favour of “the 0.005 or 0.001 level of significance.”

“…many statisticians have noted that P values of 0.05 may correspond to Bayes factors that only favor the alternative hypothesis by odds of 3 or 4–1…” (V. Johnson, PNAS)

**W**hile I do plan to discuss the PNAS paper later (and possibly to write a comment letter to PNAS with Andrew), I find it interesting that it made the headlines within days of its (early edition) publication: the argument for replacing .05 with .001 in order to increase the proportion of reproducible studies is both simple and convincing for a science journalist. If only the issue with p-values and statistical testing could be that simple… For instance, the above quote from Valen is reproduced as “an [alternative] hypothesis that stands right below the significance level has in truth only 3 to 5 chances to 1 of being true”, the “truth” popping out of nowhere. (If you read French, the 300+ comments on the blog are also worth their weight in jellybeans…)
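To get a feel for the quoted calibration, here is a minimal sketch. It is not Johnson's UMPBT computation but the classical Sellke–Bayarri–Berger robust bound, which says that for p < 1/e the Bayes factor in favour of H0 is at least −e·p·log(p), so the odds in favour of the alternative are capped accordingly; it gives the same order of magnitude as the quote:

```python
from math import e, log

def max_odds_for_alternative(p):
    """Sellke-Bayarri-Berger bound: for p < 1/e, the Bayes factor in
    favour of H0 is at least -e * p * log(p), so the odds in favour
    of the alternative are at most 1 / (-e * p * log(p))."""
    assert 0 < p < 1 / e
    return 1.0 / (-e * p * log(p))

for p in (0.05, 0.005, 0.001):
    print(f"p = {p}: odds for H1 at most {max_odds_for_alternative(p):.1f} to 1")
```

At p = 0.05 the cap is about 2.5 to 1, at 0.005 about 14 to 1, and at 0.001 about 53 to 1, consistent with the magnitudes discussed in the paper.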

November 21, 2013 at 11:49 pm

Yet one can (maybe) interpret p-values as mutilated, numerical approximations of Bayes factors, in which one has arbitrarily discretized the sample space into regions of “High” and “Low” values.

In Popper’s ideal world, one would be able to develop a scientific theory (H0) for which P(Data Is High|H0) = 0 and P(Data Is Low|H0) = 1. Irrespective of any alternatives, P(H0|Data Is High) = 0 (= P(Data Is High|H0)). Of course, we live in a non-ideal world and so do our theories, i.e., one cannot possibly develop a theory that so clearly demarcates its domain.

In such a case, the best one can (and should!) hope to do is to utilize Bayes theorem:

P(H0|Data Is High) proportional to P(Data Is High|H0)

so that (approximately) one can reject H0 if P(Data Is High|H0), and thus P(H0|Data Is High), is small against an (implied) set of more plausible alternatives.

This is how non-statisticians seem to be using the magnitude of the p-value in evaluating the plausibility of hypotheses.

November 22, 2013 at 7:31 am

There are many answers to be found in the Bayesian literature about this, but your statement that

“P(H0|Data Is High) proportional to P(Data Is High|H0)”

while mathematically correct, is not useful for decisions, given that the proportionality factor matters. This is the crux of the p-value defect in my opinion, since it does not account for what happens under the alternative.
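A toy discrete calculation makes the point about the proportionality factor concrete (the numbers are invented for illustration): with the same P(Data Is High|H0) = 0.05, the posterior P(H0|Data Is High) swings widely depending on how probable “High” data are under the alternative.

```python
def posterior_h0(p_high_given_h0, p_high_given_h1, prior_h0=0.5):
    """Bayes' theorem with an explicit alternative: the normalising
    constant P(Data Is High) depends on H1, not just on H0."""
    prior_h1 = 1.0 - prior_h0
    p_high = p_high_given_h0 * prior_h0 + p_high_given_h1 * prior_h1
    return p_high_given_h0 * prior_h0 / p_high

# Same P(High|H0) = 0.05, two hypothetical alternatives:
weak_h1 = posterior_h0(0.05, 0.10)   # H1 barely predicts "High" better
strong_h1 = posterior_h0(0.05, 0.80) # H1 strongly predicts "High"
print(weak_h1, strong_h1)
```

With equal priors, P(H0|High) is about 0.33 against the weak alternative but about 0.06 against the strong one, so the same “significant” P(High|H0) = 0.05 supports very different conclusions.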

November 21, 2013 at 4:19 pm

When testing sharp null hypotheses, Bayes factors and p-values are often in conflict, especially if the sample size is large or if the prior under H1 has a large variance. There have been many attempts either to adjust Bayes factors towards p-values (e.g., Aitkin’s posterior Bayes factor) or to bring p-values into better agreement with Bayes factors (i.e., making them smaller, as here).

It reminds me of what a professor used to tell us: if the piano and the stool are too far from each other, do we bring the piano closer to the stool, or vice versa? Or maybe we just stay where we are, but with a deeper understanding of what we do?
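The large-sample conflict mentioned above (the Jeffreys–Lindley paradox) can be sketched numerically. Assume a normal mean with known variance, H0: theta = 0 against H1: theta ~ N(0, tau²); the numbers below are a made-up illustration that holds the z-statistic fixed at 1.96 (two-sided p ≈ 0.05) while the sample size grows:

```python
from math import sqrt, exp

def bf01_normal(z, n, tau2=1.0, sigma2=1.0):
    """Bayes factor of H0: theta = 0 vs H1: theta ~ N(0, tau2),
    for a sample of size n from N(theta, sigma2) with z-statistic z."""
    shrink = n * tau2 / (sigma2 + n * tau2)
    return sqrt(1.0 + n * tau2 / sigma2) * exp(-0.5 * z * z * shrink)

for n in (10, 100, 10_000):
    print(f"n = {n}: BF(H0 vs H1) = {bf01_normal(1.96, n):.2f}")
```

The same p ≈ 0.05 yields mild evidence against H0 at n = 10 but a Bayes factor of roughly 15 in favour of H0 at n = 10,000: the BF grows like sqrt(n) while p stays put.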

November 21, 2013 at 2:12 pm

Hi Xian: nice post; however it should be Valen, not Vale.

November 21, 2013 at 10:42 pm

thanks, Guido!

November 21, 2013 at 9:17 am

Christian, is the problem for you that the p-value, however low, is only going to tell you the probability of your data (roughly speaking) assuming the null is true? It’s not going to tell you anything about the probability of the alternative hypothesis, which is the real hypothesis of interest.

However, limiting the discussion to (Bayesian) hierarchical models (linear mixed models), which is the type of model people often fit in repeated measures studies in psychology (or at least in psycholinguistics), as long as the problem is about figuring out P(theta>0) or P(theta<0), and as long as this posterior probability is high (say, > 0.8), the decision (to act as if theta>0 or theta<0) is going to be the same regardless of whether one uses p-values or a fully Bayesian approach. This is because the likelihood is going to dominate in the Bayesian model.

Andrew has objected to this line of reasoning by saying that making a decision like theta>0 vs theta<0 is not a reasonable one in the first place. That is true in some cases, where the result of one experiment never replicates because of study effects or whatever. But there are a lot of effects which are robust and replicable, and where it makes sense to ask these types of questions.

One central issue for me is: in situations like these, using a low p-value to make such a decision is going to yield pretty similar outcomes to doing inference with the posterior distribution. The machinery needed for a fully Bayesian analysis is very intimidating; you need to know a lot, and you need to do a lot more coding and checking than when you fit an lmer type of model.
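The agreement described above can even be made exact in the simplest case: for a normal mean under a flat prior, the posterior probability that theta > 0 equals one minus the one-sided p-value (a textbook observation going back at least to Pratt). A minimal sketch, assuming a known standard error:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def posterior_prob_positive(estimate, se):
    """Flat prior on theta, estimate ~ N(theta, se^2): the posterior
    is N(estimate, se^2), so P(theta > 0 | data) = Phi(estimate/se)."""
    return phi(estimate / se)

def one_sided_p(estimate, se):
    """One-sided p-value against H0: theta <= 0."""
    return 1.0 - phi(estimate / se)

# For any z = estimate/se, the two quantities sum to exactly one:
print(posterior_prob_positive(2.0, 1.0), one_sided_p(2.0, 1.0))
```

So thresholding P(theta>0) at 0.95 and thresholding the one-sided p-value at 0.05 pick out exactly the same datasets in this toy setting; with informative priors or hierarchical shrinkage the two only agree approximately, which is the “likelihood dominates” regime invoked above.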

It took me 1.5 to 2 years of hard work (= evenings spent not reading novels) to get to the point where I knew roughly what I was doing when fitting Bayesian models. I don’t blame anyone for not wanting to put their life on hold to get to such a point. I find the Bayesian method attractive because it actually answers the question I really asked, namely: is theta>0 or theta<0? This is really great, I don’t have to beat around the bush any more! (there; I just used an exclamation mark). But for the researcher unwilling (or, more likely, unable) to invest the time in the math and probability theory and the world of BUGS, the distance between a heuristic like a low p-value and the more sensible Bayesian approach is not that large.

I'm standing by for a fully fledged bayesian attack from you now! :)

November 21, 2013 at 9:35 am

Shravan, would you mind turning this into a guest post? Thanks!

November 21, 2013 at 12:29 am

I really want to see your answer! And I also agree… it can’t be that simple… (with 0.001 it will be difficult to get anything “significant” in the social sciences)