is the p-value a good measure of evidence?

“Statistics abounds criteria for assessing quality of estimators, tests, forecasting rules, classification algorithms, but besides the likelihood principle discussions, it seems to be almost silent on what criteria should a good measure of evidence satisfy.” M. Grendár

A short note (4 pages) appeared on arXiv a few days ago, entitled “is the p-value a good measure of evidence? an asymptotic consistency criterion” by M. Grendár. It is rather puzzling in that it defines the consistency of an evidence measure ε(H_1,H_2,X^n) (for the hypothesis H_1 relative to the alternative H_2) by

\lim_{n\rightarrow\infty} P(H_1|\epsilon(\neg H_1,H_2,X^n)\in S) =0

where S is “the category of the most extreme values of the evidence measure (…) that corresponds to the strongest evidence” (p.2), a condition interpreted as “the probability [of the first hypothesis H1], given that the measure of evidence strongly testifies against H1, relative to H2 should go to zero” (p.2). This definition therefore requires a probability measure on the parameter spaces, or at least on the set of model indices, but this requirement is not explicitly stated in the paper. The proofs that the p-value is inconsistent and that the likelihood ratio is consistent do involve model/hypothesis prior probabilities and weights, p(.) and w. However, the last section on the consistency of the Bayes factor states “it is open to debate whether a measure of evidence can depend on a prior information” (p.3) and it uses another notation, q(.), for the prior distribution… Furthermore, it reproduces the argument found in Templeton that larger evidence should be attributed to larger hypotheses. And it misses our 1992 analysis of p-values from a decision-theoretic perspective, where we showed that they are inadmissible for two-sided tests, which answers the question asked in the quote above.
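To see why the criterion cannot be evaluated without prior weights, here is a minimal numerical sketch of my own, not the paper's proof. It assumes two simple hypotheses, H1: N(0,1) and H2: N(delta,1), a prior weight w on H1, and takes S to be either {p-value ≤ alpha} or {likelihood ratio ≥ k}. The function names and the values of delta, alpha, k and w are all hypothetical choices made for illustration.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and quantiles


def prob_h1_given_small_pvalue(n, w=0.5, alpha=0.01, delta=1.0):
    """P(H1 | p-value <= alpha) for a one-sided test of mean 0 vs mean delta.

    Assumed setting (not from the paper): n iid N(theta, 1) observations,
    H1: theta = 0, H2: theta = delta, prior weight w on H1.
    """
    z = STD_NORMAL.inv_cdf(1 - alpha)
    p_s_h1 = alpha  # the p-value is uniform under H1
    p_s_h2 = 1 - STD_NORMAL.cdf(z - sqrt(n) * delta)  # power under H2
    return w * p_s_h1 / (w * p_s_h1 + (1 - w) * p_s_h2)


def prob_h1_given_large_lr(n, w=0.5, k=100.0, delta=1.0):
    """P(H1 | likelihood ratio f2/f1 >= k) in the same assumed setting."""
    # f2/f1 >= k is equivalent to: sample mean >= log(k)/(n*delta) + delta/2
    cut = log(k) / (n * delta) + delta / 2
    p_s_h1 = 1 - STD_NORMAL.cdf(sqrt(n) * cut)
    p_s_h2 = 1 - STD_NORMAL.cdf(sqrt(n) * (cut - delta))
    return w * p_s_h1 / (w * p_s_h1 + (1 - w) * p_s_h2)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, prob_h1_given_small_pvalue(n), prob_h1_given_large_lr(n))
```

In this toy setting the first probability stabilises at w·alpha/(w·alpha + 1 − w) rather than vanishing, because the p-value stays uniform under H1 whatever n, while the second goes to zero. Both quantities depend on w, which is the point made above: the criterion only makes sense once the hypotheses carry prior probabilities.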

4 Responses to “is the p-value a good measure of evidence?”

  1. […] by the review). Indeed, it considers a series of competing computational methods for approximating evidence, aka marginal […]

  2. hi christian, sorry for the stupid question. (if it is too stupid, don’t feel as if you have to answer.) but what does admissibility mean?

    • Admissibility and its opposite, inadmissibility, are two notions used in game theory and in statistical decision theory to evaluate estimators. An inadmissible estimator is dominated by another estimator, meaning one whose risk is never larger and is strictly smaller for at least one parameter value, hence it should not be used (if the domination criterion is relevant for the problem at hand). Wald in the 1950’s showed that the admissible estimators are more or less the Bayes estimators. This is covered in Chapter 3 of The Bayesian Choice.

  3. Paulo Marques Says:

    Also, this other kind of incoherence makes p-values unbearable:

    http://www.jstor.org/pss/2684655
