Archive for Statistical Inference as Severe Testing

modelling protocol in Nature

Posted in Books, Kids, Statistics, University life with tags , , , , , , , on August 19, 2020 by xi'an

A three-page commentary in a recent issue of Nature is a manifesto for responsible modelling, with among the numerous signatories, Deborah Mayo.  (And Phillip Stark as the only statistician I spotted.) The main theme is that the model is not the real thing, e.g., the map is not the territory. Which as such is hardly debatable. The point of the tribune is that, in the light of the pandemic crisis, a large portion of the general population has discovered that mathematical models were not the truth and that their predictions were to be taken with a few marshes of salt. Either because they were based on faulty or antiquated data, if any. Or because their approximation level was too high to return any reliable figure. A failure to understand the nature of mathematical models reminding me of the 2008 financial crisis and of the bemused question of Liz Windsor and of the muddled response of economists:

“Why did nobody notice it?”

“Your Majesty,” eminent economists replied, “the failure to foresee the timing, extent and severity of the crisis and to head it off, while it had many causes, was principally a failure of the collective imagination of many bright people, both in this country and internationally, to understand the risks to the system as a whole.”

“People got a bit lax … perhaps it is difficult to foresee”

The manifesto calls for open assumptions, sensitivity analysis, uncertainty quantification, wariness of overfitting and structural biases (what is the utility function?), and the inclusion of ignorance acknowledgement as an outcome of the model. Which again sounds completely sound if not necessarily helpful when facing interlocutors asking for point estimates. I also regret that the tribune gives hardly any room to statistics and the model checking tools it had developed, except in mentioning the p-hacking and the false feeling of certainty produced by a p-value. Plus a bizarre mention of a French movement of statactivistes of which I had not heard and which seems connected to a book published in French by three of the signatories.

perspectives on Deborah Mayo’s Statistics Wars

Posted in Statistics with tags , , , , on October 23, 2019 by xi'an

A few months ago, Andrew Gelman collated and commented the reviews of Deborah Mayo’s book by himself, Brian Haig, Christian Hennig, Art B. Owen, Robert Cousins, Stan Young, Corey Yanofsky, E.J. Wagenmakers, Ron Kenett, Daniel Lakeland, and myself. The collection did not make it through the review process of the Harvard Data Science Review! it is however available on-line for perusal…

severe testing : beyond Statistics wars?!

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , on January 7, 2019 by xi'an

A timely start to my reading Deborah Mayo’s [properly printed] Statistical Inference as Severe Testing (How to get beyond the Statistics Wars) on the Armistice Day, as it seems to call for just this, an armistice! And the opportunity of a long flight to Oaxaca in addition… However, this was only the start and it took me several further weeks to peruse seriously enough the book (SIST) before writing the (light) comments below. (Receiving a free copy from CUP and then a second one directly from Deborah after I mentioned the severe sabotage!)

Indeed, I sort of expected a different content when taking the subtitle How to get beyond the Statistics Wars at face value. But on the opposite the book is actually very severely attacking anything not in the line of the Cox-Mayo severe testing line. Mostly Bayesian approach(es) to the issue! For instance, Jim Berger’s construct of his reconciliation between Fisher, Neyman, and Jeffreys is surgically deconstructed over five pages and exposed as a Bayesian ploy. Similarly, the warnings from Dennis Lindley and other Bayesians that the p-value attached with the Higgs boson experiment are not probabilities that the particle does not exist are met with ridicule. (Another go at Jim’s Objective Bayes credentials is found in the squared myth of objectivity chapter. Maybe more strongly than against staunch subjectivists like Jay Kadane. And yet another go when criticising the Berger and Sellke 1987 lower bound results. Which even extends to Vale Johnson’s UMP-type Bayesian tests.)

“Inference should provide posterior probabilities, final degrees of support, belief, probability (…) not provided by Bayes factors.” (p.443)

Another subtitle of the book could have been testing in Flatland given the limited scope of the models considered with one or at best two parameters and almost always a Normal setting. I have no idea whatsoever how the severity principle would apply in more complex models, with e.g. numerous nuisance parameters. By sticking to the simplest possible models, the book can carry on with the optimality concepts of the early days, like sufficiency (p.147) and and monotonicity and uniformly most powerful procedures, which only make sense in a tiny universe.

“The estimate is really a hypothesis about the value of the parameter.  The same data warrant the hypothesis constructed!” (p.92)

There is an entire section on the lack of difference between confidence intervals and the dual acceptance regions, although the lack of unicity in defining either of them should come as a bother. Especially outside Flatland. Actually the following section, from p.193 onward, reminds me of fiducial arguments, the more because Schweder and Hjort are cited there. (With a curve like Fig. 3.3. operating like a cdf on the parameter μ but no dominating measure!)

“The Fisher-Neyman dispute is pathological: there’s no disinterring the truth of the matter (…) Fisher grew to renounce performance goals he himself had held when it was found that fiducial solutions disagreed with them.”(p.390)

Similarly the chapter on the “myth of the “the myth of objectivity””(p.221) is mostly and predictably targeting Bayesian arguments. The dismissal of Frank Lad’s arguments for subjectivity ends up [or down] with a rather cheap that it “may actually reflect their inability to do the math” (p.228). [CoI: I once enjoyed a fantastic dinner cooked by Frank in Christchurch!] And the dismissal of loss function requirements in Ziliak and McCloskey is similarly terse, if reminding me of Aris Spanos’ own arguments against decision theory. (And the arguments about the Jeffreys-Lindley paradox as well.)

“It’s not clear how much of the current Bayesian revolution is obviously Bayesian.” (p.405)

The section (Tour IV) on model uncertainty (or against “all models are wrong”) is somewhat limited in that it is unclear what constitutes an adequate (if wrong) model. And calling for the CLT cavalry as backup (p.299) is not particularly convincing.

It is not that everything is controversial in SIST (!) and I found agreement in many (isolated) statements. Especially in the early chapters. Another interesting point made in the book is to question whether or not the likelihood principle at all makes sense within a testing setting. When two models (rather than a point null hypothesis) are X-examined, it is a rare occurrence that the likelihood factorises any further than the invariance by permutation of iid observations. Which reminded me of our earlier warning on the dangers of running ABC for model choice based on (model specific) sufficient statistics. Plus a nice sprinkling of historical anecdotes, esp. about Neyman’s life, from Poland, to Britain, to California, with some time in Paris to attend Borel’s and Lebesgue’s lectures. Which is used as a background for a play involving Bertrand, Borel, Neyman and (Egon) Pearson. Under the title “Les Miserables Citations” [pardon my French but it should be Les Misérables if Hugo is involved! Or maybe les gilets jaunes…] I also enjoyed the sections on reuniting Neyman-Pearson with Fisher, while appreciating that Deborah Mayo wants to stay away from the “minefields” of fiducial inference. With, mot interestingly, Neyman himself trying in 1956 to convince Fisher of the fallacy of the duality between frequentist and fiducial statements (p.390). Wisely quoting Nancy Reid at BFF4 stating the unclear state of affair on confidence distributions. And the final pages reawakened an impression I had at an earlier stage of the book, namely that the ABC interpretation on Bayesian inference in Rubin (1984) could come closer to Deborah Mayo’s quest for comparative inference (p.441) than she thinks, in that producing parameters producing pseudo-observations agreeing with the actual observations is an “ability to test accordance with a single model or hypothesis”.

“Although most Bayesians these days disavow classic subjective Bayesian foundations, even the most hard-nosed. “we’re not squishy” Bayesian retain the view that a prior distribution is an important if not the best way to bring in background information.” (p.413)

A special mention to Einstein’s cafe (p.156), which reminded me of this picture of Einstein’s relative Cafe I took while staying in Melbourne in 2016… (Not to be confused with the Markov bar in the same city.) And a fairly minor concern that I find myself quoted in the sections priors: a gallimaufry (!) and… Bad faith Bayesianism (!!), with the above qualification. Although I later reappear as a pragmatic Bayesian (p.428), although a priori as a counter-example!