Archive for Neyman-Pearson tests

statistics for making decisions [book review]

Posted in Books, Statistics on March 7, 2022 by xi'an

I bought this book [or more precisely received it from CRC Press as a ({prospective} book) review reward] as I was interested in the author’s perspectives on actual decision making (and unaware of the earlier Statistical Decision Theory book he had written in 2013). It is intended for a postgraduate semester course and  “not for a beginner in statistics”. Exercises with solutions are included in each chapter (with some R codes in the solutions). From Chapter 4 onwards, the “Further reading suggestions” are primarily referring to papers and books written by the author, as these chapters are based on his earlier papers.

“I regard hypothesis testing as a distraction from and a barrier to good statistical practice. Its ritualised application should be resisted from the position of strength, by being well acquainted with all its theoretical and practical aspects. I very much hope (…) that the right place for hypothesis testing is in a museum, next to the steam engine.”

The first chapter exposes the shortcomings of hypothesis testing for conducting decision making, in particular by ignoring the consequences of the decisions. A perspective with which I agree, but I fear the subsequent developments found in the book remain too formalised to be appealing, reverting to the over-simplification found in Neyman-Pearson theory. The second chapter is somewhat superfluous for a book assuming a prior exposure to statistics, with a quick exposition of the frequentist, Bayesian, and … fiducial paradigms. With estimators being first defined without referring to a specific loss function. And I find the presentation of the fiducial approach rather shaky (if usual). Esp. when the fiducial perspective is to be used as a default Bayes in the subsequent chapters. I also do not understand the notation (p.31)

P(\hat\theta<c;\,\theta\in\Theta_\text{H})

outside of a Bayesian (or fiducial?) framework. (I did not spot typos aside from the traditional “the the” duplicates, with at least six occurrences!)

The aforementioned subsequent chapters are not particularly enticing as they cater to artificial loss functions and engage in detailed derivations that do not seem essential. At times they appear to be nothing more than simple calculus exercises. The very construction of the loss function, which I deem critical to implementing statistical decision theory, is mostly bypassed. The overall setting is also frighteningly unidimensional. In the parameter, in the statistic, and in the decision. Covariates only appear in the final chapter, which appears to have very little connection with decision making in that the loss function there is the standard quadratic loss, used to achieve the optimal composition of estimators rather than to select the best model. The book is also missing practical or realistic illustrations.

“With a bit of immodesty and a tinge of obsession, I would like to refer to the principal theme of this book as a paradigm, ascribing to it as much importance and distinction as to the frequentist and Bayesian paradigms”

The book concludes with a short postscript (pp.247-249) reproducing the introductory paragraphs about the ill-suited nature of hypothesis testing for decision-making. Which would have been better supported by a stronger engagement in eliciting loss functions and quantifying the consequences of actions from the clients…

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

at last the type IX error

Posted in Statistics on May 11, 2020 by xi'an

on Dutch book arguments

Posted in Books, Kids, pictures, Statistics, Travel, University life on May 1, 2017 by xi'an

“Reality is not always probable, or likely.”― Jorge Luis Borges

As I am supposed to discuss Teddy Seidenfeld‘s talk at the Bayes, Fiducial and Frequentist conference in Harvard today [the snow happened last time!], I started last week [while driving to Wales] reading some related papers of his. Which is great as I had never managed to get through the Dutch book arguments, including those in Jim’s book.

The paper by Mark Schervish, Teddy Seidenfeld, and Jay Kadane defines coherence as the inability to bet against the predictive statements based on the procedure. A definition that sounds like a self-fulfilling prophecy to me, as it involves a probability measure over the parameter space. Furthermore, the notion of turning inference, which aims at scientific validation, into a leisurely, no-added-value, and somewhat ethically dodgy activity like gambling does not agree with my notion of a validation for a theory. That is, not as a compelling reason for adopting a Bayesian approach. Not that I have suddenly switched to the other [darker] side, but I do not find those arguments helpful in any way, because of this dodgy image associated with gambling. (Pardon my French, but each time I read about escrows, I think of escrocs, or crooks, which reinforces this image! Actually, this name derives from the Old French escroue, but the modern meaning of écroué is sent to jail, which brings us back to the same feeling…)
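As a reminder of what the argument formally amounts to, here is the standard toy version in a few lines of code, with made-up prices of my own rather than anything taken from the paper: an agent whose betting prices over a partition do not sum to one can be booked into a sure loss.

```python
# Toy Dutch book: the prices below are hypothetical, chosen incoherent on
# purpose (they sum to 1.1 over the partition {A, not-A}).
price_A, price_notA = 0.6, 0.5

# The agent buys both bets at its stated prices; each bet pays 1 if its
# event occurs, and exactly one of the two events must occur.
for A_occurs in (True, False):
    payoff = (1 if A_occurs else 0) + (0 if A_occurs else 1)  # always 1
    net = payoff - (price_A + price_notA)
    print(f"A occurs: {A_occurs}, agent's net gain: {net:+.2f}")
# Whatever happens, the agent loses 0.10: a Dutch book.
```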

Furthermore, it sounds like both a weak notion, since it implies an almost sure loss for the bookmaker and coherency holds for any prior distribution, including Dirac masses!, and a frequentist one, in that it looks at all possible values of the parameter (in a statistical framework). It also turns errors into monetary losses, taking them at face value. Which also sounds very formal to me.

But the most fundamental problem I have with this approach is that, from a Bayesian perspective, it does not bring any evaluation or ranking of priors, and in particular does not help in selecting or eliminating some. By behaving like a minimax principle, it does not condition on the data and hence does not evaluate the predictive properties of the model in terms of the data, e.g. by comparing pseudo-data with real data.

While I see no reason to argue in favour of p-values or minimax decision rules, I am at a loss in understanding the examples in “How to not gamble if you must”. In the first case, i.e., when dismissing the α-level most powerful test in the simple vs. simple hypothesis testing case, the argument (in Example 4) starts from the classical (Neyman-Pearsonist) statistician favouring the 0.05-level test over others. Which sounds absurd, as this level corresponds to a given loss function, which cannot be compared with another loss function. Even though the authors chose to rephrase the dilemma in terms of a single 0-1 loss function and then turn the classical solution into the choice of an implicit variance-dependent prior. Plus force the poor Pearsonist to make a wager represented by the risk difference. The whole sequence of choices sounds both very convoluted and far away from the usual practice of a classical statistician…

Similarly, when attacking [in Section 5.2] the minimax estimator in the Bernoulli case (for the corresponding proper prior depending on the sample size n), this minimax estimator is admissible under quadratic loss and yet a Dutch book argument applies, which in my opinion definitely argues against the Dutch book reasoning. The way to produce such a domination result is to mix two Bernoulli estimation problems for two different sample sizes but the same parameter value, in which case there exist [other] choices of Beta priors and a convex combination of the risk functions that lead to this domination. But this example [Example 6] mostly exposes the artificial nature of the argument: when estimating the very same probability θ, what is the relevance of adding the risks or errors resulting from using two estimators for two different sample sizes of that very same probability θ? I insist on the very same because, when instead estimating two [independent] values of θ, there cannot be a Stein effect for the Bernoulli probability estimation problem, that is, any aggregation of admissible estimators remains admissible. (And yes, it definitely sounds like an exercise in frequentist decision theory!)
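Incidentally, the constant-risk property that makes this Beta(√n/2, √n/2) Bayes estimator minimax for a single, fixed sample size is quick to verify numerically. The check below is my own side computation, not something taken from the paper; it evaluates the exact quadratic risk (bias² + variance) of δ(x) = (x + √n/2)/(n + √n) on a grid of θ values.

```python
import math

def risk_minimax(theta, n):
    """Exact quadratic risk of the minimax Bernoulli estimator
    delta(x) = (x + sqrt(n)/2) / (n + sqrt(n)), where x is the
    Binomial(n, theta) success count: risk = bias^2 + variance."""
    s = math.sqrt(n)
    bias = s * (0.5 - theta) / (n + s)
    var = n * theta * (1 - theta) / (n + s) ** 2
    return bias ** 2 + var

n = 10
risks = [risk_minimax(t, n) for t in (0.1, 0.3, 0.5, 0.9)]
print(risks)  # constant in theta, equal to 1 / (4 * (1 + sqrt(n))**2)
```

The risk being flat in θ (the bias² and variance terms trade off exactly, summing to n/4 divided by (n + √n)²) is what delivers minimaxity for that one value of n; the Dutch book domination only appears once two different n's are mixed.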

inferential models: reasoning with uncertainty [book review]

Posted in Books, Statistics, University life on October 6, 2016 by xi'an

“the field of statistics (…) is still surprisingly underdeveloped (…) the subject lacks a solid theory for reasoning with uncertainty [and] there has been very little progress on the foundations of statistical inference” (p.xvi)

A book that starts with such massive assertions is certainly hoping to attract some degree of attention from the field and likely to induce strong reactions to this dismissal of the not inconsiderable amount of research dedicated so far to statistical inference and in particular to its foundations. Or even to attract flak for not accounting (in this introduction) for the past work of major statisticians, like Fisher, Kiefer, Lindley, Cox, Berger, Efron, Fraser and many, many others… Judging from the references and the tone of this 254-page book, it seems like the two authors, Ryan Martin and Chuanhai Liu, truly aim at single-handedly resetting the foundations of statistics to their own tune, which sounds like a new kind of fiducial inference augmented with calibrated belief functions. Be warned that five chapters of this book are built on as many papers written by the authors in the past three years. Which makes me question, if I may, the relevance of publishing a book on a brand-new approach to statistics without further backup from a wider community.

“…it is possible to calibrate our belief probabilities for a common interpretation by intelligent minds.” (p.14)

Chapter 1 contains a description of the new perspective in Section 1.4.2, which I find useful to detail here. When given an observation x from a Normal N(θ,1) model, the authors rewrite X as θ+Z, with Z~N(0,1), as in fiducial inference, and then want to find a “meaningful prediction of Z independently of X”. This seems difficult to accept given that, once X=x is observed, Z=X-θ⁰, θ⁰ being the true value of θ, which belies the independence assumption. The next step is to replace Z~N(0,1) by a random set S(Z) containing Z and to define a belief function bel() on the parameter space Θ by

bel(A|X) = P(X-S(Z)⊆A)

which induces a pseudo-measure on Θ derived from the distribution of an independent Z, since X is already observed. When Z~N(0,1), this distribution does not depend on θ⁰, the true value of θ… The next step is to tune the belief function towards proper frequentist coverage, in the approximate sense that the probability that bel(A|X) exceeds 1-α is less than α when the [arbitrary] parameter θ is not in A. And conversely. This property (satisfied when bel(A|X) is uniform) is called validity or exact inference by the authors: in my opinion, restricted frequentist calibration would certainly sound more adequate.
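To make the construction concrete, here is a minimal Monte Carlo sketch of bel(A|X) for the Normal model above, assuming (as I understand the authors' default choice) the symmetric random set S(Z) = [−|Z|, |Z|] and an event of the form A = (−∞, c]. Then x − S(Z) = [x − |Z|, x + |Z|] is a subset of A exactly when x + |Z| ≤ c.

```python
import random

def bel_leq(c, x, n_draws=100_000, seed=0):
    """Monte Carlo belief that theta <= c given X = x, for the symmetric
    random set S(Z) = [-|Z|, |Z|] with Z ~ N(0,1):
    bel((-inf, c] | x) = P(x - S(Z) subset of (-inf, c]) = P(x + |Z| <= c)."""
    rng = random.Random(seed)
    hits = sum(x + abs(rng.gauss(0, 1)) <= c for _ in range(n_draws))
    return hits / n_draws

# With x = 0 and c = 1.96, the closed form is 2*Phi(1.96) - 1, i.e. about 0.95.
print(bel_leq(1.96, 0.0))
```

Note the frequentist flavour the review points at: the belief is computed from the distribution of a fresh Z, entirely detached from the observed x except through translation.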

“When there is no prior information available, [the philosophical justifications for Bayesian analysis] are less than fully convincing.” (p.30)

“Is it logical that an improper “ignorance” prior turns into a proper “non-ignorance” prior when combined with some incomplete information on the whereabouts of θ?” (p.44)

