## severe testing : beyond Statistics wars?!

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , on January 7, 2019 by xi'an

A timely start to my reading Deborah Mayo’s [properly printed] Statistical Inference as Severe Testing (How to get beyond the Statistics Wars) on the Armistice Day, as it seems to call for just this, an armistice! And the opportunity of a long flight to Oaxaca in addition… However, this was only the start and it took me several further weeks to peruse seriously enough the book (SIST) before writing the (light) comments below. (Receiving a free copy from CUP and then a second one directly from Deborah after I mentioned the severe sabotage!)

Indeed, I sort of expected a different content when taking the subtitle How to get beyond the Statistics Wars at face value. But on the opposite the book is actually very severely attacking anything not in the line of the Cox-Mayo severe testing line. Mostly Bayesian approach(es) to the issue! For instance, Jim Berger’s construct of his reconciliation between Fisher, Neyman, and Jeffreys is surgically deconstructed over five pages and exposed as a Bayesian ploy. Similarly, the warnings from Dennis Lindley and other Bayesians that the p-value attached with the Higgs boson experiment are not probabilities that the particle does not exist are met with ridicule. (Another go at Jim’s Objective Bayes credentials is found in the squared myth of objectivity chapter. Maybe more strongly than against staunch subjectivists like Jay Kadane. And yet another go when criticising the Berger and Sellke 1987 lower bound results. Which even extends to Vale Johnson’s UMP-type Bayesian tests.)

“Inference should provide posterior probabilities, final degrees of support, belief, probability (…) not provided by Bayes factors.” (p.443)

Another subtitle of the book could have been testing in Flatland given the limited scope of the models considered with one or at best two parameters and almost always a Normal setting. I have no idea whatsoever how the severity principle would apply in more complex models, with e.g. numerous nuisance parameters. By sticking to the simplest possible models, the book can carry on with the optimality concepts of the early days, like sufficiency (p.147) and and monotonicity and uniformly most powerful procedures, which only make sense in a tiny universe.

“The estimate is really a hypothesis about the value of the parameter.  The same data warrant the hypothesis constructed!” (p.92)

There is an entire section on the lack of difference between confidence intervals and the dual acceptance regions, although the lack of unicity in defining either of them should come as a bother. Especially outside Flatland. Actually the following section, from p.193 onward, reminds me of fiducial arguments, the more because Schweder and Hjort are cited there. (With a curve like Fig. 3.3. operating like a cdf on the parameter μ but no dominating measure!)

“The Fisher-Neyman dispute is pathological: there’s no disinterring the truth of the matter (…) Fisher grew to renounce performance goals he himself had held when it was found that fiducial solutions disagreed with them.”(p.390)

Similarly the chapter on the “myth of the “the myth of objectivity””(p.221) is mostly and predictably targeting Bayesian arguments. The dismissal of Frank Lad’s arguments for subjectivity ends up [or down] with a rather cheap that it “may actually reflect their inability to do the math” (p.228). [CoI: I once enjoyed a fantastic dinner cooked by Frank in Christchurch!] And the dismissal of loss function requirements in Ziliak and McCloskey is similarly terse, if reminding me of Aris Spanos’ own arguments against decision theory. (And the arguments about the Jeffreys-Lindley paradox as well.)

“It’s not clear how much of the current Bayesian revolution is obviously Bayesian.” (p.405)

The section (Tour IV) on model uncertainty (or against “all models are wrong”) is somewhat limited in that it is unclear what constitutes an adequate (if wrong) model. And calling for the CLT cavalry as backup (p.299) is not particularly convincing.

It is not that everything is controversial in SIST (!) and I found agreement in many (isolated) statements. Especially in the early chapters. Another interesting point made in the book is to question whether or not the likelihood principle at all makes sense within a testing setting. When two models (rather than a point null hypothesis) are X-examined, it is a rare occurrence that the likelihood factorises any further than the invariance by permutation of iid observations. Which reminded me of our earlier warning on the dangers of running ABC for model choice based on (model specific) sufficient statistics. Plus a nice sprinkling of historical anecdotes, esp. about Neyman’s life, from Poland, to Britain, to California, with some time in Paris to attend Borel’s and Lebesgue’s lectures. Which is used as a background for a play involving Bertrand, Borel, Neyman and (Egon) Pearson. Under the title “Les Miserables Citations” [pardon my French but it should be Les Misérables if Hugo is involved! Or maybe les gilets jaunes…] I also enjoyed the sections on reuniting Neyman-Pearson with Fisher, while appreciating that Deborah Mayo wants to stay away from the “minefields” of fiducial inference. With, mot interestingly, Neyman himself trying in 1956 to convince Fisher of the fallacy of the duality between frequentist and fiducial statements (p.390). Wisely quoting Nancy Reid at BFF4 stating the unclear state of affair on confidence distributions. And the final pages reawakened an impression I had at an earlier stage of the book, namely that the ABC interpretation on Bayesian inference in Rubin (1984) could come closer to Deborah Mayo’s quest for comparative inference (p.441) than she thinks, in that producing parameters producing pseudo-observations agreeing with the actual observations is an “ability to test accordance with a single model or hypothesis”.

“Although most Bayesians these days disavow classic subjective Bayesian foundations, even the most hard-nosed. “we’re not squishy” Bayesian retain the view that a prior distribution is an important if not the best way to bring in background information.” (p.413)

A special mention to Einstein’s cafe (p.156), which reminded me of this picture of Einstein’s relative Cafe I took while staying in Melbourne in 2016… (Not to be confused with the Markov bar in the same city.) And a fairly minor concern that I find myself quoted in the sections priors: a gallimaufry (!) and… Bad faith Bayesianism (!!), with the above qualification. Although I later reappear as a pragmatic Bayesian (p.428), although a priori as a counter-example!

## Measuring statistical evidence using relative belief [book review]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , , , , on July 22, 2015 by xi'an

“It is necessary to be vigilant to ensure that attempts to be mathematically general do not lead us to introduce absurdities into discussions of inference.” (p.8)

This new book by Michael Evans (Toronto) summarises his views on statistical evidence (expanded in a large number of papers), which are a quite unique mix of Bayesian  principles and less-Bayesian methodologies. I am quite glad I could receive a version of the book before it was published by CRC Press, thanks to Rob Carver (and Keith O’Rourke for warning me about it). [Warning: this is a rather long review and post, so readers may chose to opt out now!]

“The Bayes factor does not behave appropriately as a measure of belief, but it does behave appropriately as a measure of evidence.” (p.87)

## maximum likelihood: an introduction

Posted in Books, Statistics with tags , , , , on December 20, 2014 by xi'an

“Basic Principle 0. Do not trust any principle.” L. Le Cam (1990)

Here is the abstract of a International Statistical Rewiew 1990 paper by Lucien Le Cam on maximum likelihood. ISR keeping a tradition of including an abstract in French for every paper, Le Cam (most presumably) wrote his own translation [or maybe wrote the French version first], which sounds much funnier to me and so I cannot resist posting both, pardon my/his French! [I just find “Ce fait” rather unusual, as I would have rather written “Ceci fait”…]:

Maximum likelihood estimates are reported to be best under all circumstances. Yet there are numerous simple examples where they plainly misbehave. One gives some examples for problems that had not been invented for the purpose of annoying maximum likelihood fans. Another example, imitated from Bahadur, has been specially created with just such a purpose in mind. Next, we present a list of principles leading to the construction of good estimates. The main principle says that one should not believe in principles but study each problem for its own sake.

L’auteur a ouï dire que la méthode du maximum de vraisemblance est la meilleure méthode d’estimation. C’est bien vrai, et pourtant la méthode se casse le nez sur des exemples bien simples qui n’avaient pas été inventés pour le plaisir de montrer que la méthode peut être très désagréable. On en donne quelques-uns, plus un autre, imité de Bahadur et fabriqué exprès pour ennuyer les admirateurs du maximum de vraisemblance. Ce fait, on donne une savante liste de principes de construction de bons estimateurs, le principe principal étant qu’il ne faut pas croire aux principes.

The entire paper is just as witty, as in describing the mixture model as “contaminated and not fit to drink”! Or in “Everybody knows that taking logarithms is unfair”. Or, again, in “biostatisticians, being complicated people, prefer to work out not with the dose y but with its logarithm”… And a last line: “One possibility is that there are too many horse hairs in e”.

## Deborah Mayo’s talk in Montréal (JSM 2013)

Posted in Books, Statistics, Uncategorized with tags , , , , , , on July 31, 2013 by xi'an

As posted on her blog, Deborah Mayo is giving a lecture at JSM 2013 in Montréal about why Birnbaum’s derivation of the Strong Likelihood Principle (SLP) is wrong. Or, more accurately, why “WCP entails SLP”. It would have been a great opportunity to hear Deborah presenting her case and I am sorry I am missing this opportunity. (Although not sorry to be in the beautiful Dolomites at that time.) Here are the slides:

Deborah’s argument is the same as previously: there is no reason for the inference in the mixed (or Birnbaumized) experiment to be equal to the inference in the conditional experiment. As previously, I do not get it: the weak conditionality principle (WCP) implies that inference from the mixture output, once we know which component is used (hence rejecting the “and we don’t know which” on slide 8), should only be dependent on that component. I also fail to understand why either WCP or the Birnbaum experiment refers to a mixture (sl.13) in that the index of the experiment is assumed to be known, contrary to mixtures. Thus (still referring at slide 13), the presentation of Birnbaum’s experiment is erroneous. It is indeed impossible to force the outcome of y* if tail and of x* if head but it is possible to choose the experiment index at random, 1 versus 2, and then, if y* is observed, to report (E1,x*) as a sufficient statistic. (Incidentally, there is a typo on slide 15, it should be “likewise for x*”.)

## Birnbaum’s proof missing one bar?!

Posted in Statistics with tags , , , , on March 4, 2013 by xi'an

Michael Evans just posted a new paper on arXiv yesterday about Birnbaum’s proof of his likelihood principle theorem. There has recently been a lot of activity around this theorem (some of which reported on the ‘Og!) and the flurry of proofs, disproofs, arguments, counterarguments, and counter-counterarguments, mostly by major figures in the field, is rather overwhelming! This paper  is however highly readable as it sets everything in terms of set theory and relations. While I am not completely convinced that the conclusion holds, the steps in the paper seem correct. The starting point is that the likelihood relation, L, the invariance relation, G, and the sufficiency relation, S, all are equivalence relations (on the set of inference bases/parametric families). The conditionality relation,C, however fails to be transitive and hence an equivalence relation. Furthermore, the smallest equivalence relation containing the conditionality relation is the likelihood relation. Then Evans proves that the conjunction of the sufficiency and the conditionality relations is strictly included in the likelihood relation, which is the smallest equivalence relation containing the union. Furthermore, the fact that the smallest equivalence relation containing the conditionality relation is the likelihood relation means that sufficiency is irrelevant (in this sense, and in this sense only!).

This is a highly interesting and well-written document. I just do not know what to think of it in correspondence with my understanding of the likelihood principle. That

$\overline{S \cup C} = L$

rather than

$S \cup C =L$

makes a difference from a mathematical point of view, however I cannot relate it to the statistical interpretation. Like, why would we have to insist upon equivalence? why does invariance appear in some lemmas? why is a maximal ancillary statistics relevant at this stage when it does not appear in the original proof of Birbaum (1962)? why is there no mention made of weak versus strong conditionality principle?

Posted in Statistics, Travel, University life with tags , , , , , , , , , , , , on February 20, 2013 by xi'an

True randomness was the topic of the `Random numbers; fifty years later’ talk in DESY by Frederick James from CERN. I had discussed a while ago a puzzling book related to this topic. This talk went along a rather different route, focussing on random generators. James put this claim that there are computer based physical generators that are truly random. (He had this assertion that statisticians do not understand randomness because they do not know quantum mechanics.) He distinguished those from pseudo-random generators: “nobody understood why they were (almost) random”, “IBM did not know how to generate random numbers”… But then spent the whole talk discussing those pseudo-random generators. Among other pieces of trivia, James mentioned that George Marsaglia was the one exhibiting the hyperplane features of congruential generators. That Knuth achieved no successful definition of what randomness is in his otherwise wonderful books! James thus introduced Kolmogorov’s mixing (not Kolmogorov’s complexity, mind you!) as advocated by Soviet physicists to underlie randomness. Not producing anything useful for RNGs in the 60’s. He then moved to the famous paper by Ferrenberg, Landau and Wong (1992) that I remember reading more or less at the time. In connection with the phase transition critical slowing down phenomena in Ising model simulations. And connecting with the Wang-Landau algorithm of flipping many sites at once (which exhibited long-term dependences in the generators). Most interestingly, a central character in this story is Martin Lüscher, based in DESY, who expressed the standard generator of the time RCARRY into one studied by those Soviet mathematicians,

X’=AX

showing that it enjoyed Kolmogorov mixing, but with a very poor Lyapunov coefficient. I partly lost track there as RCARRY was not perfect. And on how this Kolmogorov mixing would relate to long-term dependencies. One explanation by James was that this property is only asymptotic. (I would even say statistical!) Also interestingly, the 1994 paper by Lüscher produces the number of steps necessary to attain complete mixing, namely 15 steps, which thus works as a cutoff point. (I wonder why a 15-step RCARRY is slower, since A15 can be computed at once… It may be due to the fact that A is sparse while A15 is not.) James mentioned that Marsaglia’s Die Hard battery of tests is now obsolete and superseded by Pierre Lecuyer’s TestU01.

In conclusion, I did very much like this presentation from an insider, but still do not feel it makes a contribution to the debate on randomness, as it stayed put on pseudorandom generators. To keep the connection with von Neumann, they all produce wrong answers from a randomness point of view, if not from a statistical one. (A final quote from the talk: “Among statisticians and number theorists who are supposed to be specialists, they do not know about Kolmogorov mixing.”) [Discussing with Fred James at the reception after the talk was obviously extremely pleasant, as he happened to know a lot of my Bayesian acquaintances!]