is the p-value a good measure of evidence?
“Statistics abounds criteria for assessing quality of estimators, tests, forecasting rules, classification algorithms, but besides the likelihood principle discussions, it seems to be almost silent on what criteria should a good measure of evidence satisfy.” M. Grendár
A short note (4 pages) appeared on arXiv a few days ago, entitled “is the p-value a good measure of evidence? an asymptotic consistency criterion” by M. Grendár. It is rather puzzling in that it defines the consistency of an evidence measure ε(H1,H2,Xn) (for the hypothesis H1 relative to the alternative H2) by

$$\lim_{n\to\infty} P\big(H_1 \mid \varepsilon(H_1,H_2,X_n)\in S\big) = 0$$
where S is “the category of the most extreme values of the evidence measure (…) that corresponds to the strongest evidence” (p.2), a condition interpreted as “the probability [of the first hypothesis H1], given that the measure of evidence strongly testifies against H1, relative to H2 should go to zero” (p.2). This definition therefore requires a probability measure on the parameter spaces, or at least on the set of model indices, but this is not explicitly stated in the paper. The proofs that the p-value is inconsistent and that the likelihood ratio is consistent do involve prior probabilities and weights on the models/hypotheses, p(.) and w. However, the last section, on the consistency of the Bayes factor, states that “it is open to debate whether a measure of evidence can depend on a prior information” (p.3) and switches to another notation, q(.), for the prior distribution… Furthermore, it reproduces the argument found in Templeton that larger evidence should be attributed to larger hypotheses. And it misses our 1992 analysis of p-values from a decision-theoretic perspective, where we show they are inadmissible for two-sided tests, answering the question asked in the quote above.
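To see why the criterion separates the p-value from the likelihood ratio once a prior weight is placed on the hypotheses, here is a minimal sketch in a toy setting that is an assumption of this illustration, not taken from the paper: n i.i.d. N(θ,1) observations, two simple hypotheses H1: θ=0 and H2: θ=1, and a prior weight w on H1. The "strong evidence" set S is taken to be {p-value ≤ alpha} for the p-value and {likelihood ratio of H2 to H1 ≥ k} for the likelihood ratio; the thresholds alpha and k and all function names are hypothetical choices. Both conditional probabilities are computed exactly from normal tail probabilities.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def posterior_h1(prob_s_h1, prob_s_h2, w):
    """P(H1 | evidence in S) by Bayes' rule, with prior weight w on H1."""
    return w * prob_s_h1 / (w * prob_s_h1 + (1 - w) * prob_s_h2)


def p_value_posterior(n, alpha=0.05, w=0.5):
    """P(H1 | p-value <= alpha) in the assumed toy model.

    The one-sided p-value rejects when sqrt(n) * mean >= z_alpha.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    prob_s_h1 = alpha  # the p-value is uniform under H1, for every n
    prob_s_h2 = 1 - STD_NORMAL.cdf(z_alpha - sqrt(n))  # power under theta = 1
    return posterior_h1(prob_s_h1, prob_s_h2, w)


def likelihood_ratio_posterior(n, k=20.0, w=0.5):
    """P(H1 | LR >= k), where LR = exp(n * mean - n / 2) favours H2.

    LR >= k is equivalent to mean >= 1/2 + log(k) / n.
    """
    shift = log(k) / sqrt(n)
    prob_s_h1 = 1 - STD_NORMAL.cdf(sqrt(n) / 2 + shift)
    prob_s_h2 = 1 - STD_NORMAL.cdf(-sqrt(n) / 2 + shift)
    return posterior_h1(prob_s_h1, prob_s_h2, w)


if __name__ == "__main__":
    for n in (1, 10, 100, 10_000):
        print(n, p_value_posterior(n), likelihood_ratio_posterior(n))
```

In this toy model the probability of landing in S under H1 stays equal to alpha for the p-value whatever the sample size, so P(H1 | S) converges to w·alpha / (w·alpha + 1 − w) rather than to zero, while for the likelihood ratio the same probability vanishes with n. Note that the computation cannot even be written down without w, which is the point made above about the unstated probability measure on the model indices.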