is the p-value a good measure of evidence?
“Statistics abounds criteria for assessing quality of estimators, tests, forecasting rules, classification algorithms, but besides the likelihood principle discussions, it seems to be almost silent on what criteria should a good measure of evidence satisfy.” M. Grendár
A short note (4 pages) appeared on arXiv a few days ago, entitled “is the p-value a good measure of evidence? an asymptotic consistency criterion”, by M. Grendár. It is rather puzzling in that it defines the consistency of an evidence measure ε(H1, H2, X^n) (for the hypothesis H1 relative to the alternative H2) by

$$\lim_{n\to\infty} \mathbb{P}\left(H_1 \mid \varepsilon(H_1, H_2, X^n) \in S\right) = 0,$$
where S is “the category of the most extreme values of the evidence measure (…) that corresponds to the strongest evidence” (p.2), the criterion being interpreted as “the probability [of the first hypothesis H1], given that the measure of evidence strongly testifies against H1, relative to H2 should go to zero” (p.2). This definition therefore requires a probability measure on the parameter spaces, or at least on the set of model indices, but this requirement is not stated explicitly in the paper. The proofs that the p-value is inconsistent and that the likelihood ratio is consistent do involve prior probabilities on the models/hypotheses and weights, p(.) and w. However, the last section, on the consistency of the Bayes factor, states that “it is open to debate whether a measure of evidence can depend on a prior information” (p.3), and it uses another notation, q(.), for the prior distribution… Furthermore, the note reproduces the argument found in Templeton that larger evidence should be attributed to larger hypotheses. And it misses our 1992 analysis of p-values from a decision-theoretic perspective, where we show that they are inadmissible for two-sided tests, which answers the question raised in the quote above.
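To see the criterion at work, here is a minimal numerical sketch under assumptions of my own choosing, not taken from the note: X_1, …, X_n iid N(θ, 1), H1: θ = 0 against a simple H2: θ = θ1 > 0, equal prior weights P(H1) = P(H2) = 1/2, and S taken either as “two-sided p-value below α” or as “likelihood ratio of H1 to H2 below 1/k”. The function names and the values α = 0.05, k = 20 are purely illustrative. Since P(p < α | H1) = α for every n while P(p < α | H2) → 1, P(H1 | S) converges to α/(1+α) ≈ 0.048 for the p-value rather than to zero, whereas it does go to zero for the likelihood ratio.

```python
from math import log, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)


def posterior_h1(p_s_h1, p_s_h2, prior_h1=0.5):
    """Bayes' theorem: P(H1 | evidence in S) from P(S | H1), P(S | H2) and P(H1)."""
    num = prior_h1 * p_s_h1
    return num / (num + (1.0 - prior_h1) * p_s_h2)


def pvalue_consistency(n, theta1, alpha=0.05):
    """Hypothetical helper: S = {two-sided z-test p-value < alpha}, z = sqrt(n) * mean(X)."""
    c = Z.inv_cdf(1.0 - alpha / 2.0)  # reject H1 when |z| > c
    shift = sqrt(n) * theta1          # z ~ N(shift, 1) under H2
    p_s_h1 = alpha                    # exact size of the test, whatever n
    p_s_h2 = Z.cdf(shift - c) + Z.cdf(-c - shift)  # P(|z| > c | H2)
    return posterior_h1(p_s_h1, p_s_h2)


def lr_consistency(n, theta1, k=20.0):
    """Hypothetical helper: S = {L(H1)/L(H2) < 1/k}, i.e. strong evidence against H1."""
    if theta1 <= 0:
        raise ValueError("this sketch assumes theta1 > 0")
    shift = sqrt(n) * theta1
    # log L(H1)/L(H2) = -shift * z + shift**2 / 2, so LR < 1/k  <=>  z > t
    t = shift / 2.0 + log(k) / shift
    p_s_h1 = Z.cdf(-t)          # P(z > t | H1), z ~ N(0, 1)
    p_s_h2 = Z.cdf(shift - t)   # P(z > t | H2), z ~ N(shift, 1)
    return posterior_h1(p_s_h1, p_s_h2)


if __name__ == "__main__":
    # P(H1 | S) as n grows, for theta1 = 0.5
    for n in (10, 100, 1_000, 10_000):
        print(n, pvalue_consistency(n, 0.5), lr_consistency(n, 0.5))
```

Working with the sufficient statistic z keeps every probability in closed form, so no simulation noise blurs the two limits.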
November 30, 2011 at 3:33 am
hi christian, sorry for the stupid question. (if it is too stupid, don’t feel as if you have to answer.) but what does admissibility mean?
November 30, 2011 at 8:17 am
Admissibility and its opposite, inadmissibility, are two notions used in game theory and in statistical decision theory to evaluate estimators. An estimator is inadmissible when it is dominated by another estimator, i.e., when the other estimator has a risk that is never larger and is strictly smaller for at least one parameter value; hence it should not be used (if the domination criterion is relevant for the problem at hand). Wald showed in the 1950s that the admissible estimators are more or less the Bayes estimators. This is covered in Chapter 3 of The Bayesian Choice.
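As a minimal illustration of domination (a toy example of my own, not taken from the book): for X ~ N(θ, 1) under squared error loss, the estimator cX has risk c² + (c − 1)²θ², so any cX with c > 1 is dominated by X itself and is therefore inadmissible. By contrast, 0.8X is not dominated by X, nor X by it; 0.8X is in fact the Bayes estimator under a N(0, 4) prior. The grid check below is only illustrative, while the closed-form risk gives the domination for every θ. The names `risk_scaled` and `dominates` are hypothetical.

```python
GRID = [i / 10 for i in range(-50, 51)]  # illustrative theta values in [-5, 5]


def risk_scaled(theta, c):
    """Exact quadratic risk E_theta[(cX - theta)^2] for X ~ N(theta, 1): variance c^2 plus squared bias."""
    return c * c + (c - 1.0) ** 2 * theta * theta


def dominates(c_good, c_bad, thetas=GRID):
    """True if c_good*X has risk <= that of c_bad*X on every theta, strictly smaller for one."""
    pairs = [(risk_scaled(t, c_good), risk_scaled(t, c_bad)) for t in thetas]
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)


if __name__ == "__main__":
    print(dominates(1.0, 1.2))  # X dominates 1.2X, so 1.2X is inadmissible
    print(dominates(1.0, 0.8), dominates(0.8, 1.0))  # neither dominates the other
```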
November 30, 2011 at 2:55 am
Also, this other kind of incoherence makes p-values unbearable:
http://www.jstor.org/pss/2684655