## Deborah Mayo’s talk in Montréal (JSM 2013)

Posted in Books, Statistics, Uncategorized on July 31, 2013 by xi'an

As posted on her blog, Deborah Mayo is giving a lecture at JSM 2013 in Montréal about why Birnbaum’s derivation of the Strong Likelihood Principle (SLP) is wrong. Or, more accurately, why “WCP entails SLP”. It would have been a great opportunity to hear Deborah presenting her case and I am sorry I am missing this opportunity. (Although not sorry to be in the beautiful Dolomites at that time.) Here are the slides:

Deborah’s argument is the same as previously: there is no reason for the inference in the mixed (or Birnbaumized) experiment to be equal to the inference in the conditional experiment. As previously, I do not get it: the weak conditionality principle (WCP) implies that inference from the mixture output, once we know which component is used (hence rejecting the “and we don’t know which” on slide 8), should only depend on that component. I also fail to understand why either WCP or the Birnbaum experiment refers to a mixture (sl.13), in that the index of the experiment is assumed to be known, contrary to mixtures. Thus (still referring to slide 13), the presentation of Birnbaum’s experiment is erroneous. It is indeed impossible to force the outcome to be y* if tails and x* if heads, but it is possible to choose the experiment index at random, 1 versus 2, and then, if y* is observed, to report (E1,x*) as a sufficient statistic. (Incidentally, there is a typo on slide 15, it should be “likewise for x*”.)
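To make the last construction explicit, here is a sketch in notation that is mine rather than the slides’: J is the randomly chosen experiment index (probability 1/2 each), Z the outcome, f_1 and f_2 the sampling densities of the two experiments, and c the constant such that f_2(y*|θ) = c f_1(x*|θ) for all θ. The reported statistic is

$T(j,z) = \begin{cases} (1,x^*) & \text{if } (j,z) \in \{(1,x^*),(2,y^*)\} \\ (j,z) & \text{otherwise} \end{cases}$

and, conditional on T=(1,x*), the probability that the underlying outcome was (1,x*) is

$P_\theta\big((J,Z)=(1,x^*) \mid T=(1,x^*)\big) = \dfrac{\tfrac{1}{2} f_1(x^*|\theta)}{\tfrac{1}{2} f_1(x^*|\theta) + \tfrac{1}{2} f_2(y^*|\theta)} = \dfrac{1}{1+c}$

which is free of θ. This is the sense in which T is sufficient in the mixed experiment, while the index J remains observed throughout.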

## who’s afraid of the big B wolf?

Posted in Books, Statistics, University life on March 13, 2013 by xi'an

Aris Spanos just published a paper entitled “Who should be afraid of the Jeffreys-Lindley paradox?” in the journal Philosophy of Science. This piece is a continuation of the debate about frequentist versus likelihoodist versus Bayesian (should it be Bayesianist?! or Laplacist?!) testing approaches, exposed in Mayo and Spanos’ Error and Inference, and discussed in several posts of the ‘Og. I started reading the paper in conjunction with a paper I am currently writing for a special volume in honour of Dennis Lindley, a paper that I will discuss later on the ‘Og…

“…the postdata severity evaluation (…) addresses the key problem with Fisherian p-values in the sense that the severity evaluation provides the “magnitude” of the warranted discrepancy from the null by taking into account the generic capacity of the test (that includes n) in question as it relates to the observed data” (p.88)

First, the antagonistic style of the paper reminds me of Spanos’ previous works in that it relies on repeated value judgements (such as “Bayesian charge”, “blatant misinterpretation”, “Bayesian allegations that have undermined the credibility of frequentist statistics”, “both approaches are far from immune to fallacious interpretations”, “only crude rules of thumbs”, etc.) and rhetorical sleights of hand. (See, e.g., “In contrast, the severity account ensures learning from data by employing trustworthy evidence (…), the reliability of evidence being calibrated in terms of the relevant error probabilities” [my stress].) Connectedly, Spanos often resorts to an unusual [at least for statisticians] vocabulary that amounts to newspeak. Here are some illustrations: “summoning the generic capacity of the test”, “substantively significant”, “custom tailoring the generic capacity of the test”, “the fallacy of acceptance”, “the relevance of the generic capacity of the particular test”. Yes, the term “generic capacity” occurs there with a truly high frequency.

## Birnbaum’s proof missing one bar?!

Posted in Statistics on March 4, 2013 by xi'an

Michael Evans just posted a new paper on arXiv yesterday about Birnbaum’s proof of his likelihood principle theorem. There has recently been a lot of activity around this theorem (some of which reported on the ‘Og!) and the flurry of proofs, disproofs, arguments, counterarguments, and counter-counterarguments, mostly by major figures in the field, is rather overwhelming! This paper is however highly readable as it sets everything in terms of set theory and relations. While I am not completely convinced that the conclusion holds, the steps in the paper seem correct. The starting point is that the likelihood relation, L, the invariance relation, G, and the sufficiency relation, S, all are equivalence relations (on the set of inference bases/parametric families). The conditionality relation, C, however fails to be transitive and hence fails to be an equivalence relation. Furthermore, the smallest equivalence relation containing the conditionality relation is the likelihood relation. Then Evans proves that the union of the sufficiency and the conditionality relations is strictly included in the likelihood relation, which is the smallest equivalence relation containing this union. Furthermore, the fact that the smallest equivalence relation containing the conditionality relation is the likelihood relation means that sufficiency is irrelevant (in this sense, and in this sense only!).

This is a highly interesting and well-written document. I just do not know what to think of it in correspondence with my understanding of the likelihood principle. That

$\overline{S \cup C} = L$

rather than

$S \cup C =L$

makes a difference from a mathematical point of view, however I cannot relate it to the statistical interpretation. Like, why would we have to insist upon equivalence? why does invariance appear in some lemmas? why is a maximal ancillary statistic relevant at this stage when it does not appear in the original proof of Birnbaum (1962)? why is there no mention made of weak versus strong conditionality principle?
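To see concretely how a union of two equivalence relations can differ from the smallest equivalence relation containing it, here is a minimal Python sketch on a toy three-element set. The set and the two relations (named `S` and `C` only by analogy) are hypothetical illustrations of mine and have nothing to do with the actual inference bases in Evans’ paper.

```python
from itertools import product


def equivalence_closure(elements, relation):
    """Smallest equivalence relation on `elements` containing `relation`."""
    closure = set(relation)
    closure |= {(a, a) for a in elements}        # reflexivity
    closure |= {(b, a) for (a, b) in closure}    # symmetry
    changed = True
    while changed:                               # transitivity, to a fixed point
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def is_transitive(relation):
    """Check that (a, b) and (b, d) in the relation imply (a, d)."""
    return all((a, d) in relation
               for (a, b) in relation
               for (c, d) in relation
               if b == c)


# Toy set and toy relations (hypothetical, for illustration only).
elements = {"a", "b", "c"}
diagonal = {(e, e) for e in elements}
S = diagonal | {("a", "b"), ("b", "a")}   # an equivalence relation linking a and b
C = diagonal | {("b", "c"), ("c", "b")}   # an equivalence relation linking b and c

union = S | C
closure = equivalence_closure(elements, union)

print("union is transitive:  ", is_transitive(union))
print("closure is transitive:", is_transitive(closure))
print("pairs only in closure:", sorted(closure - union))
```

In this toy case the union links a with b and b with c without linking a with c, so it is not transitive, and the closure has to add the missing pairs. Whether those added pairs carry any statistical meaning is exactly the question raised above.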

Posted in Statistics on January 28, 2013 by xi'an

Last Monday, my student Li Chenlu presented the foundational 1962 JASA paper by Allan Birnbaum, On the Foundations of Statistical Inference. The very paper that derives the Likelihood Principle from the cumulated Conditionality and Sufficiency principles and that had been discussed [maybe ad nauseam] on this ‘Og!!! Alas, thrice alas!, I was still stuck in the plane flying back from Atlanta as she was presenting her understanding of the paper, as the flight had been delayed four hours thanks to (or rather woe to!) the weather conditions in Paris the day before (chain reaction…).

I am sorry I could not attend this lecture and this for many reasons: first and foremost, I wanted to attend every talk from my students both out of respect for them and to draw a comparison between their performances. My PhD student Sofia ran the seminar that day in my stead, for which I am quite grateful, but I do wish I had been there… Second, this a.s. has been the most philosophical paper in the series, and I would have appreciated shedding the proper light on the reasons for and the consequences of this paper, as Li Chenlu stuck very much to the paper itself. (She provided additional references in the conclusion but they did not seem to impact the slides.) Discussing for instance Berger’s and Wolpert’s (1988) new lights on the topic, as well as Deborah Mayo’s (2010) attacks, and even Chang’s (2012) misunderstandings, would have clearly helped the students.

## the likelihood principle (sequel)

Posted in Statistics on November 30, 2012 by xi'an

As mentioned in my review of Paradoxes in Scientific Inference I was a bit confused by this presentation of the likelihood principle and this led me to ponder for a week or so whether or not there was an issue with Birnbaum’s proof (or, much more likely, with my vision of it!). After reading again Birnbaum’s proof, while sitting down in a quiet room at ICERM for a little while, I do not see any reason to doubt it. (Keep reading at your own risk!)

My confusion was caused by mixing sufficiency in the sense of Birnbaum’s mixed experiment with sufficiency in the sense of our ABC model choice PNAS paper, namely that sufficient statistics are not always sufficient to select the right model. The sufficient statistic in the proof reduces the (2,x2) observation from Model 2 to (1,x1) from Model 1 when there is an observation x1 that produces a likelihood proportional to the likelihood for x2, and the statistic is indeed sufficient: the distribution of (2,x2) given (1,x1) does not depend on the parameter θ. Of course, the statistic is not sufficient (most of the time) for deciding between Model 1 and Model 2, but this model choice issue is foreign to Birnbaum’s construction.
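As a numerical check of this sufficiency claim, here is a minimal Python sketch on the textbook binomial versus negative binomial pair, which is my choice of illustration and not taken from Birnbaum’s paper. Model 1 is assumed to observe 3 successes in 12 fixed trials, Model 2 to need 12 trials to reach 3 successes, and the mixed experiment is assumed to pick each model with probability 1/2. The function names are hypothetical.

```python
from math import comb


def binom_pmf(x, n, theta):
    """P(X = x): x successes in n fixed Bernoulli(theta) trials (Model 1)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def negbinom_pmf(n, x, theta):
    """P(N = n): n trials needed to reach the x-th success (Model 2)."""
    return comb(n - 1, x - 1) * theta**x * (1 - theta) ** (n - x)


def prob_from_model_2(theta, weight=0.5):
    """P(outcome was (2, x2) | statistic reports (1, x1)) in the mixed experiment."""
    l1 = weight * binom_pmf(3, 12, theta)            # (1, x1): 3 successes in 12 trials
    l2 = (1 - weight) * negbinom_pmf(12, 3, theta)   # (2, x2): 12 trials for 3 successes
    return l2 / (l1 + l2)


# The two likelihoods are proportional in theta (factors 220 and 55),
# so the conditional probability should not move with theta.
values = [prob_from_model_2(theta) for theta in (0.1, 0.3, 0.5, 0.9)]
print(values)
```

Since the binomial coefficients are 220 and 55, each value equals 55/275 = 0.2 up to rounding, whatever θ is. The same computation also shows why the statistic cannot help in choosing between the two models: it merges the two outcomes by construction.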

Posted in Books, Statistics, University life on November 23, 2012 by xi'an

This CRC Press book was sent to me for review in CHANCE: Paradoxes in Scientific Inference is written by Mark Chang, vice-president of AMAG Pharmaceuticals. The topic of scientific paradoxes is one of my primary interests and I have learned a lot by looking at the Lindley-Jeffreys and Savage-Dickey paradoxes. However, I did not find a renewed sense of excitement when reading the book. The very first (and maybe the best!) paradox with Paradoxes in Scientific Inference is that it is a book from the future! Indeed, its copyright year is 2013 (!), although I got it a few months ago. (Not mentioning here the cover mimicking Escher’s “paradoxical” pictures with dice. A sculpture due to Shigeo Fukuda and apparently not quoted in the book. As I do not want to get into another dice cover polemic, I will abstain from further comments!)

Now, getting into a deeper level of criticism (!), I find the book very uneven and overall quite disappointing. (Even missing in its statistical foundations.) Esp. given my initial level of excitement about the topic!

“When the null hypothesis is rejected, the p-value is the probability of the type I error.” Paradoxes in Scientific Inference (p.105)

“The p-value is the conditional probability given H0.” Paradoxes in Scientific Inference (p.106)

Second, the depth of the statistical analysis in the book is often found missing. For instance, Simpson’s paradox is not analysed from a statistical perspective, only reported as a fact. Sticking to statistics, take for instance the discussion of Lindley’s paradox. The author seems to think that the problem is with the different conclusions produced by the frequentist, likelihood, and Bayesian analyses (p.122). This is completely wrong: Lindley’s (or Lindley-Jeffreys’s) paradox is about the lack of significance of Bayes factors based on improper priors. Similarly, when the likelihood ratio test is introduced, the reference threshold is given as equal to 1 and no mention is later made of compensating for different degrees of freedom/against over-fitting. The discussion about p-values is equally garbled, witness the above quotes which (a) condition upon the rejection and (b) ignore the dependence of the p-value on a realized random variable.
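To illustrate the point about improper priors, here is a minimal Python sketch under a setup that is entirely my assumption and not the book’s: a normal sample mean with known variance σ², a point null θ=0, and a N(0,τ²) prior on θ under the alternative. In that case the Bayes factor in favour of the null depends on the data only through the z-statistic, and the function name `bayes_factor_01` is hypothetical.

```python
from math import exp, sqrt


def bayes_factor_01(z, n, tau2, sigma2=1.0):
    """Bayes factor of H0: theta = 0 against H1: theta ~ N(0, tau2).

    Assumes xbar ~ N(theta, sigma2 / n) and z = sqrt(n) * xbar / sqrt(sigma2).
    """
    r = n * tau2 / sigma2   # prior variance relative to sampling variance
    return sqrt(1.0 + r) * exp(-0.5 * z * z * r / (1.0 + r))


# Same data (z = 1.96, n = 100), increasingly diffuse priors under H1.
z = 1.96
prior_variances = (1.0, 1e2, 1e4, 1e6)
factors = [bayes_factor_01(z, n=100, tau2=t) for t in prior_variances]

for t, b in zip(prior_variances, factors):
    print(f"tau^2 = {t:>9.0f}   B01 = {b:.2f}")
```

With the data held fixed at a borderline significant z, the Bayes factor increasingly favours the null as τ² grows, without bound in the limit. The flat improper prior therefore produces a Bayes factor with no meaning, which is the phenomenon at stake rather than a mere disagreement between schools.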

## Error and Inference [on wrong models]

Posted in Books, Statistics, University life on December 6, 2011 by xi'an

In connection with my series of posts on the book Error and Inference, and my recent collation of those into an arXiv document, Deborah Mayo has started a series of informal seminars at the LSE on the philosophy of errors in statistics and the likelihood principle, and has also posted a long comment on my argument about only using wrong models. (The title is inspired from the Rolling Stones’ “You can’t always get what you want”, very cool!) The discussion about the need or not to take into account all possible models (which is the meaning of the “catchall hypothesis” I had missed while reading the book) shows my point was not clear. I obviously do not claim in the review that all possible models should be accounted for at once; on the contrary, this was my understanding of Mayo’s criticism of the Bayesian approach (I thought the following sentence was clear enough: “According to Mayo, this alternative hypothesis should “include all possible rivals, including those not even thought of” (p.37)”)! So I see the Bayesian approach as a way to put on the table a collection of reasonable (if all wrong) models and to give those models a posterior probability, with the purpose that improbable ones are eliminated. Therefore, I am in agreement with most of the comments in the post, esp. because this has little to do with Bayesian versus frequentist testing! Even rejecting the less likely models from a collection seems compatible with a Bayesian approach: model averaging is not always an appropriate solution, depending on the loss function!