## ASA’s statement on p-values [#2]

**I**t took a visit to FiveThirtyEight to realise that the ASA statement I mentioned yesterday was followed by individual entries from most members of the panel, much more diverse and deeper than the statement itself! Without discussing each and every comment, here are some points I subscribe to:

- it does not make sense to try to replace the p-value and the 5% boundary with something else of the very same nature. This was the main line of the criticism Andrew and I made of Valen Johnson's PNAS paper.
- neither does it make sense to try to come up with a hard-and-fast answer about whether or not a certain parameter satisfies a certain constraint. A comparison of predictive performances at or around the observed data sounds much more sensible, if less definitive.
- the Bayes factor is often advanced in those comments as a viable alternative to the p-value, but it suffers from the difficulties exposed in our recent testing-by-mixture paper, one being the lack of an absolute scale.
- we seem unable to escape the landscape set by Neyman and Pearson when they constructed their testing formalism, including the highly unrealistic 0-1 loss function and the grossly asymmetric opposition between null and alternative hypotheses.
- the behaviour of any procedure of choice should be evaluated under different scenarios, most likely by simulation, including some scenarios accounting for misspecified models, which may require an extra bit of non-parametrics. And we should abstain from going further than evaluating whether or not the data look compatible with each of the scenarios, or quantifying how much so through the mixture representation.
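The simulation-based evaluation suggested in the last point can be sketched in a few lines (a minimal illustration of mine, not from the post: a one-sample test of a zero mean, run under a well-specified Gaussian scenario and a misspecified heavy-tailed one, with all function names hypothetical):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def one_sample_pvalue(x):
    """Two-sided p-value for H0: mean = 0 (normal approximation to the t test)."""
    t = sqrt(len(x)) * x.mean() / x.std(ddof=1)
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(t) / sqrt(2.0))))

def rejection_rate(sampler, n=30, reps=5000, alpha=0.05):
    """Monte Carlo estimate of P(reject H0) for data drawn from `sampler`."""
    return float(np.mean([one_sample_pvalue(sampler(n)) < alpha
                          for _ in range(reps)]))

# Scenario 1: model well specified (standard normal errors, H0 true)
well = rejection_rate(lambda n: rng.normal(0.0, 1.0, n))
# Scenario 2: model misspecified (heavy-tailed Student t(2) errors, H0 still true)
miss = rejection_rate(lambda n: rng.standard_t(2, n))
print(well, miss)
```

Comparing the two rejection rates shows whether the nominal 5% level survives the departure from the assumed model, which is all the simulation is asked to deliver.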

This entry was posted on March 9, 2016 at 2:18 pm and is filed under Books, Kids, Statistics, University life, with tags ASA, Basic and Applied Social Psychology, FiveThirtyEight, p-values, statistical significance, testing as mixture estimation, testing of hypotheses, The American Statistician, Valen Johnson.

March 10, 2016 at 5:51 pm

xi’an:

I agree with your points here, but especially points 2 and 5 may well look to others like this: "the statistical profession does not [have] solutions but rather just (hopefully sensible) ways to struggle through [making sense of] observations we somehow get".

That is what they really need to hear and grasp, but not what a statistical association and many statisticians want to convey.

By the way, for my experiments, a Bayes factor of 2; for yours, 10¹⁰!

Keith O’Rourke

March 10, 2016 at 6:52 pm

Thanks, Keith!!! My main point, really: we should back off from providing definite conclusions and decisions, and only offer a calibration of our uncertainty…

March 9, 2016 at 2:47 pm

Could you elaborate on what exactly you mean by "the lack of absolute scale" of Bayes factors?

Thanks!

March 9, 2016 at 3:20 pm

If Bayes factors are used to reject hypotheses and to select models, what is a sufficiently large value for a Bayes factor? 2? 10? 10¹⁰? This is what I mean.
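A toy numerical illustration of this scale problem (my own sketch, not part of the exchange): for a point null against a Gaussian prior on the alternative, the closed-form Bayes factor on the very same data swings by an order of magnitude as the prior scale varies, so no threshold such as 2 or 10 can be absolute.

```python
from math import exp, sqrt

def bf01(n, xbar, tau):
    """Bayes factor of H0: mu = 0 against H1: mu ~ N(0, tau^2),
    given the mean xbar of n i.i.d. N(mu, 1) observations (closed form)."""
    v1 = 1.0 + n * tau**2          # variance of sqrt(n)*xbar under H1
    z2 = n * xbar**2               # squared standardised mean under H0
    return sqrt(v1) * exp(-0.5 * z2 * (1.0 - 1.0 / v1))

# same data (n = 50, sample mean 0.3), three prior scales on the alternative:
for tau in (0.1, 1.0, 10.0):
    print(tau, bf01(50, 0.3, tau))
```

With these numbers the Bayes factor moves from mild evidence against the null to clear evidence in its favour as tau grows, the familiar Jeffreys-Lindley phenomenon.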

March 10, 2016 at 9:06 pm

Why constrain the output to reject or accept? Surely a rational human is not constrained to respond to a Bayes factor in a dichotomous manner.

(The simplicity of arithmetic that comes from dichotomisation is possibly what led Neyman & Pearson to their framework with the 0-1 loss function that you mention.)

March 10, 2016 at 9:46 pm

Right, absolutely right! The Bayes factor should be treated as a summary statistic, maybe a sufficient summary for testing purposes, and its value, or better its predictive distribution, should be compared with what happens under each model or each hypothesis.
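That calibration idea can be roughed out as follows (my own sketch under the same point-null Gaussian setting as above, not a method from the post): simulate the Bayes factor under each hypothesis and locate the observed value within both sampling distributions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, tau = 50, 1.0

def bf01_from(x):
    """BF of H0: mu = 0 vs H1: mu ~ N(0, tau^2), for N(mu, 1) data (closed form)."""
    v1 = 1.0 + n * tau**2
    z2 = n * np.mean(x)**2
    return np.sqrt(v1) * np.exp(-0.5 * z2 * (1.0 - 1.0 / v1))

# sampling distribution of the Bayes factor summary under each hypothesis
bf_h0 = np.array([bf01_from(rng.normal(0.0, 1.0, n)) for _ in range(2000)])
bf_h1 = np.array([bf01_from(rng.normal(rng.normal(0.0, tau), 1.0, n))
                  for _ in range(2000)])

obs = bf01_from(rng.normal(0.3, 1.0, n))   # an "observed" dataset
p0 = float(np.mean(bf_h0 <= obs))          # position of obs under H0
p1 = float(np.mean(bf_h1 <= obs))          # position of obs under H1
print(obs, p0, p1)
```

The two quantiles say how typical the observed Bayes factor is of each hypothesis, without forcing a dichotomous accept/reject answer.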