## about the strong likelihood principle

Posted in Books, Statistics, University life on November 13, 2014 by xi'an

Deborah Mayo arXived a Statistical Science paper a few days ago, along with discussions by Jan Bjørnstad, Phil Dawid, Don Fraser, Michael Evans, Jan Hannig, R. Martin and C. Liu. I am very glad that this discussion paper came out and that it came out in Statistical Science, although I am rather surprised to find no discussion by Jim Berger or Robert Wolpert, and even though I still cannot entirely follow the deductive argument in the rejection of Birnbaum’s proof, just as in the earlier version in Error & Inference. But I somehow do not feel like going into yet another debate about this critique of Birnbaum’s derivation. (Even though statements such as the SLP “would preclude the use of sampling distributions” (p.227) would call for a rebuttal.)

“It is the imprecision in Birnbaum’s formulation that leads to a faulty impression of exactly what is proved.” M. Evans

Indeed, at this stage, I fear that [for me] a more relevant issue is whether or not the debate matters at all… At a logical cum foundational [and maybe cum historical] level, it makes perfect sense to uncover whether, and which, if any, of the myriad of Birnbaum’s likelihood Principles holds. [Although trying to uncover Birnbaum’s motives and positions over time may not be so relevant.] I think the paper and the discussions acknowledge that some version of the weak conditionality Principle does not imply some version of the strong likelihood Principle, with other logical implications remaining true. At a methodological level, I am much less sure it matters. Each time I taught this notion, I got blank stares and incomprehension from my students, to the point that I have now stopped teaching the likelihood Principle in class altogether. And most of my co-authors do not seem to care very much about it. At a purely mathematical level, I wonder if there even is ground for a debate, since the notions involved can be defined in various imprecise ways, as pointed out by Michael Evans above and in his discussion. At a statistical level, sufficiency eventually is a strange notion in that it seems to make plenty of sense until one realises there is no interesting sufficiency outside exponential families (for i.i.d. samples with a support that does not depend on the parameter, this is the Pitman–Koopman–Darmois theorem). Just as there are very few parameter transforms for which unbiased estimators can be found. So I also spend very little time teaching, and even less worrying about, sufficiency. (As it happens, I taught the notion this morning!) At another and presumably more significant statistical level, what matters is information, e.g., conditioning means adding information (i.e., about which experiment has been used). While complex settings may prohibit the use of the entire information provided by the data, at a formal level there is no argument for not using the entire information, i.e., for not conditioning upon the entire data.
(At a computational level, this is no longer true, witness ABC and similar limited information techniques. By the way, ABC demonstrates if needed why sampling distributions matter so much to Bayesian analysis.)
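To make the ABC remark concrete, here is a minimal rejection-ABC sketch, under assumptions chosen purely for illustration: a coin with unknown heads probability θ, a uniform prior, 9 heads observed in 12 tosses, and the number of heads as summary statistic. The likelihood is never evaluated; the only ingredient is the ability to simulate from the sampling distribution. Since the number of heads is sufficient in this toy model, a zero tolerance returns draws from the exact posterior, Beta(10, 4); with an insufficient summary, ABC would only exploit limited information. The function name `abc_rejection` is hypothetical.

```python
import random


def abc_rejection(observed_heads, n_tosses, n_proposals, seed=0):
    """Rejection ABC for the heads probability theta of a coin.

    Illustrative assumptions: uniform prior on theta, n_tosses independent
    tosses, and the number of heads as summary statistic (sufficient here,
    so a zero tolerance yields exact posterior draws).
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_proposals):
        theta = rng.random()  # draw theta from the uniform prior on [0, 1)
        # Simulate a pseudo-dataset from the sampling distribution only.
        heads = sum(rng.random() < theta for _ in range(n_tosses))
        if heads == observed_heads:  # zero-tolerance match on the summary
            accepted.append(theta)
    return accepted


samples = abc_rejection(observed_heads=9, n_tosses=12, n_proposals=50_000, seed=2014)
print(len(samples), sum(samples) / len(samples))  # mean should be near 10/14
```

Roughly one proposal in thirteen is accepted, since under a uniform prior each of the 13 possible head counts is a priori equally likely.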

“Non-subjective Bayesians who (…) have to live with some violations of the likelihood principle (…) since their prior probability distributions are influenced by the sampling distribution.” D. Mayo (p.229)

In the end, the fact that the prior may depend on the form of the sampling distribution, and hence does violate the likelihood Principle, does not worry me so much. In most models I consider, the parameters are endogenous to those sampling distributions and do not live an ethereal existence independently of the model: they are substantiated and calibrated by the model itself, which makes the discussion about the LP rather vacuous. See, e.g., the coefficients of a linear model. In complex models, or with large datasets, it is even impossible to handle the whole data or the whole model, and proxies have to be used instead, making worries about the structure of the (original) likelihood moot. I think we have now reached a stage of statistical inference where models are no longer accepted as ideal truth and where approximation is the hard reality, imposed by massive amounts of data relentlessly calling for immediate processing. Hence a stage where the relevant issue is the self-validation or invalidation of such approximations in terms of predictive performance. Provided we can at all face the challenge…
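The binomial versus negative binomial coin is the textbook illustration of the violation mentioned by Mayo. As a sketch, assume 9 heads and 3 tails were observed, either from 12 tosses fixed in advance (binomial) or from tossing until the third tail (negative binomial). The likelihoods are both proportional to θ⁹(1−θ)³, yet the Jeffreys priors differ: Beta(½,½) for the binomial, and a prior proportional to θ^(−½)(1−θ)^(−1) for the negative binomial (its Fisher information being r/(θ(1−θ)²) for r target tails). The posteriors therefore differ too. The function names are mine, for illustration only.

```python
from fractions import Fraction


def jeffreys_posterior_binomial(heads, tails):
    """Beta posterior parameters under the binomial Jeffreys prior Beta(1/2, 1/2)."""
    half = Fraction(1, 2)
    return heads + half, tails + half


def jeffreys_posterior_negbinomial(heads, tails):
    """Beta posterior parameters when tossing until a fixed number of tails.

    Jeffreys prior proportional to theta^(-1/2) (1 - theta)^(-1), so the
    posterior is Beta(heads + 1/2, tails).
    """
    return heads + Fraction(1, 2), Fraction(tails)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Same data, hence proportional likelihoods, but different posterior means.
print(beta_mean(*jeffreys_posterior_binomial(9, 3)))     # 19/26
print(beta_mean(*jeffreys_posterior_negbinomial(9, 3)))  # 19/25
```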

## Deborah Mayo’s talk in Montréal (JSM 2013)

Posted in Books, Statistics, Uncategorized on July 31, 2013 by xi'an

As posted on her blog, Deborah Mayo is giving a lecture at JSM 2013 in Montréal about why Birnbaum’s derivation of the Strong Likelihood Principle (SLP) is wrong. Or, more accurately, about why the claim that “WCP entails SLP” fails. It would have been a great opportunity to hear Deborah present her case and I am sorry to miss it. (Although not sorry to be in the beautiful Dolomites at that time.) Here are the slides:

Deborah’s argument is the same as before: there is no reason for the inference in the mixed (or Birnbaumized) experiment to be equal to the inference in the conditional experiment. As before, I do not get it: the weak conditionality principle (WCP) implies that inference from the mixture output, once we know which component is used (hence rejecting the “and we don’t know which” on slide 8), should only depend on that component. I also fail to understand why either WCP or the Birnbaum experiment refers to a mixture (slide 13), in that the index of the experiment is assumed to be known, contrary to mixtures. Thus (still referring to slide 13), the presentation of Birnbaum’s experiment is erroneous. It is indeed impossible to force the outcome y* if tails and x* if heads, but it is possible to choose the experiment index at random, 1 versus 2, and then, if y* is observed, to report (E1,x*) as a sufficient statistic. (Incidentally, there is a typo on slide 15: it should read “likewise for x*”.)

## Birnbaum’s proof missing one bar?!

Posted in Statistics on March 4, 2013 by xi'an

Michael Evans just posted a new paper on arXiv yesterday about Birnbaum’s proof of his likelihood principle theorem. There has recently been a lot of activity around this theorem (some of which was reported on the ‘Og!) and the flurry of proofs, disproofs, arguments, counterarguments, and counter-counterarguments, mostly by major figures in the field, is rather overwhelming! This paper is however highly readable, as it sets everything in terms of set theory and relations. While I am not completely convinced that the conclusion holds, the steps in the paper seem correct. The starting point is that the likelihood relation, L, the invariance relation, G, and the sufficiency relation, S, are all equivalence relations (on the set of inference bases/parametric families). The conditionality relation, C, however, fails to be transitive and hence fails to be an equivalence relation. Furthermore, the smallest equivalence relation containing the conditionality relation is the likelihood relation. Then Evans proves that the union of the sufficiency and conditionality relations is strictly included in the likelihood relation, which is the smallest equivalence relation containing this union. Moreover, the fact that the smallest equivalence relation containing the conditionality relation is the likelihood relation means that sufficiency is irrelevant (in this sense, and in this sense only!).

This is a highly interesting and well-written document. I just do not know what to make of it in relation to my understanding of the likelihood principle. That

$\overline{S \cup C} = L$

rather than

$S \cup C =L$

makes a difference from a mathematical point of view (the bar denoting the smallest equivalence relation containing S ∪ C), but I cannot relate it to the statistical interpretation. For instance, why would we have to insist upon equivalence? Why does invariance appear in some lemmas? Why is a maximal ancillary statistic relevant at this stage, when it does not appear in the original proof of Birnbaum (1962)? Why is no mention made of the weak versus strong conditionality principle?
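The bar can be illustrated on a toy finite set. The sketch below computes the smallest equivalence relation containing a given relation, by adding reflexive and symmetric pairs and then closing under transitivity. The three labels are hypothetical stand-ins for inference bases, not objects from Evans's paper. Because the toy relation is not transitive, like the conditionality relation C, it is strictly smaller than its closure, which is the kind of gap separating S ∪ C from its bar.

```python
from itertools import product


def equivalence_closure(relation, universe):
    """Smallest equivalence relation on `universe` containing `relation`.

    Adds reflexive and symmetric pairs, then applies transitivity until
    no new pair appears (the result stays symmetric and reflexive).
    """
    closure = set(relation) | {(a, a) for a in universe}
    closure |= {(b, a) for (a, b) in closure}
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


universe = {"A", "B", "C"}
# Hypothetical toy relation: A ~ B and B ~ C, but A ~ C is not asserted.
toy = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")} | {(x, x) for x in universe}
closed = equivalence_closure(toy, universe)
print(("A", "C") in toy, ("A", "C") in closed)  # False True
print(len(toy), len(closed))  # 7 9
```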

Posted in Statistics on January 28, 2013 by xi'an

Last Monday, my student Li Chenlu presented the foundational 1962 JASA paper by Allan Birnbaum, On the Foundations of Statistical Inference, the very paper that derives the Likelihood Principle from the combination of the Conditionality and Sufficiency principles and that has been discussed [maybe ad nauseam] on this ‘Og!!! Alas, thrice alas! I was still stuck on the plane flying back from Atlanta while she was presenting her understanding of the paper, as the flight had been delayed four hours thanks to (or rather woe to!) the weather conditions in Paris the day before (chain reaction…).

I am sorry I could not attend this lecture, and this for many reasons: first and foremost, I wanted to attend every talk from my students, both out of respect for them and to draw a comparison between their performances. My PhD student Sofia ran the seminar that day in my stead, for which I am quite grateful, but I do wish I had been there… Second, this a.s. has been the most philosophical paper in the series, and I would have appreciated shedding proper light on the reasons for and the consequences of this paper, as Li Chenlu stuck very much to the paper itself. (She provided additional references in the conclusion, but they did not seem to impact the slides.) Discussing, for instance, Berger and Wolpert’s (1988) new light on the topic, as well as Deborah Mayo’s (2010) attacks, and even Chang’s (2012) misunderstandings, would have clearly helped the students.

## That the likelihood principle does not hold…

Posted in Statistics, University life on October 6, 2011 by xi'an

Coming to Section III in Chapter Seven of Error and Inference, written by Deborah Mayo, I discovered that she considers that the likelihood principle does not hold (at least as a logical consequence of the combination of the sufficiency and conditionality principles), and thus that Allan Birnbaum was wrong… As were the dozens of people working on the likelihood principle after him! Including Jim Berger and Robert Wolpert [whose book sells for \$214 on Amazon! I hope the authors get a hefty chunk of that ripper!!! Esp. when it is available for free on Project Euclid…] I had not heard of (nor seen) this argument previously, even though it has apparently created a bit of a stir around the likelihood principle page on Wikipedia. It does not seem the result is published anywhere but in the book, and I doubt it would get past a review process in a statistics journal. [Judging from a serious conversation in Zürich this morning, I may however be wrong!]

The core of Birnbaum’s proof is relatively simple: given two experiments E¹ and E² about the same parameter θ, with different sampling distributions f¹ and f², such that there exists a pair of outcomes (y¹,y²) from those experiments with proportional likelihoods, i.e., as a function of θ,

$f^1(y^1|\theta) = c f^2(y^2|\theta),$

one considers the mixture experiment E⁰ where E¹ and E² are each chosen with probability ½. Then it is possible to build a sufficient statistic T that is equal to the data (j,x), except when j=2 and x=y², in which case T(j,x)=(1,y¹). This statistic is sufficient since the distribution of (j,x) given T(j,x) is either a Dirac mass or a distribution on {(1,y¹),(2,y²)} that only depends on c (namely, probability c/(1+c) on (1,y¹)), hence not on the parameter θ. According to the weak conditionality principle, statistical evidence, meaning the whole range of inferences possible on θ and denoted by Ev(E,z), should satisfy
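A minimal numerical check of this sufficiency argument, assuming (my own choice, for illustration) that E¹ consists of 12 tosses of a coin fixed in advance and E² of tossing the same coin until the third tail, with y¹ and y² both equal to 9 heads. Then f¹(y¹|θ) = 220 θ⁹(1−θ)³ and f²(y²|θ) = 55 θ⁹(1−θ)³, so c = 4, and the conditional probability of j = 1 given T = (1,y¹) should equal c/(1+c) = 4/5 whatever θ. Exact rational arithmetic avoids rounding noise; all names are hypothetical.

```python
from fractions import Fraction
from math import comb

N_TOSSES, HEADS = 12, 9  # observed outcome in both experiments: 9 heads, 3 tails
TAILS = N_TOSSES - HEADS


def f1(theta):
    """E1: 12 tosses fixed in advance; probability of observing 9 heads."""
    return comb(N_TOSSES, HEADS) * theta**HEADS * (1 - theta) ** TAILS


def f2(theta):
    """E2: toss until the 3rd tail; probability of 9 heads before it."""
    # The last toss is the 3rd tail; the 9 heads sit among the first 11 tosses.
    return comb(N_TOSSES - 1, HEADS) * theta**HEADS * (1 - theta) ** TAILS


def T(j, x):
    """Birnbaum's statistic on E0: report (2, y2) as (1, y1), identity elsewhere.

    Outcomes are encoded as (experiment index, number of heads), so y1 = y2 = HEADS.
    """
    return (1, HEADS) if (j, x) == (2, HEADS) else (j, x)


def prob_E1_given_T(theta):
    """P(j = 1 | T = (1, y1)) when E0 picks E1 or E2 with probability 1/2 each."""
    half = Fraction(1, 2)
    p1, p2 = half * f1(theta), half * f2(theta)
    return p1 / (p1 + p2)


c = Fraction(comb(N_TOSSES, HEADS), comb(N_TOSSES - 1, HEADS))  # f1 = c * f2
for theta in (Fraction(1, 5), Fraction(1, 2), Fraction(4, 5)):
    assert prob_E1_given_T(theta) == c / (1 + c)  # does not depend on theta
print(c, c / (1 + c))  # 4 4/5
```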

$Ev(E^0, (j,x)) = Ev(E^j,x)$

Because the sufficiency principle states that

$Ev(E^0, (j,x)) = Ev(E^0,T(j,x))$

and since T(1,y¹) = T(2,y²), combining both principles leads to the likelihood principle

$Ev(E^1,y^1)=Ev(E^0,(1,y^1))=Ev(E^0,(2,y^2))=Ev(E^2,y^2)$
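For a Bayesian using a prior that does not depend on the experiment, the evidence Ev can be taken to be the posterior distribution (an interpretation assumed here for illustration), and this identity then holds automatically, since the proportionality constant c cancels upon normalisation. A discretised sketch, with a coin example (9 heads and 3 tails, observed either from 12 tosses fixed in advance or by tossing until the third tail), a uniform prior, and an arbitrary grid of 19 values of θ; all names are hypothetical:

```python
from fractions import Fraction
from math import comb

HEADS, TAILS = 9, 3


def lik_binomial(theta):
    """12 tosses fixed in advance."""
    return comb(HEADS + TAILS, HEADS) * theta**HEADS * (1 - theta) ** TAILS


def lik_negbinomial(theta):
    """Toss until the 3rd tail (the last toss is a tail)."""
    return comb(HEADS + TAILS - 1, HEADS) * theta**HEADS * (1 - theta) ** TAILS


def uniform_prior(theta):
    """A prior chosen independently of the experiment."""
    return Fraction(1)


def grid_posterior(likelihood, prior, grid):
    """Prior times likelihood, renormalised over a finite grid of theta values."""
    weights = [prior(t) * likelihood(t) for t in grid]
    total = sum(weights)
    return [w / total for w in weights]


grid = [Fraction(i, 20) for i in range(1, 20)]
post1 = grid_posterior(lik_binomial, uniform_prior, grid)
post2 = grid_posterior(lik_negbinomial, uniform_prior, grid)
print(post1 == post2)  # True: the factor c = 4 cancels in the normalisation
```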

(See, e.g., The Bayesian Choice, pp. 18-29.) Now, Mayo argues this is wrong because

“The inference from the outcome (Ej,yj) computed using the sampling distribution of [the mixed experiment] E⁰ is appropriately identified with an inference from outcome yj based on the sampling distribution of Ej, which is clearly false.” (p.310)

This sounds to me like a direct rejection of the conditionality principle, so I do not understand the point. (A formal rendering in Section 5, using the logic formalism of A’s and Not-A’s, reinforces my feeling that the conditionality principle is the one being criticised and misunderstood.) If Mayo’s frequentist stance leads her to take the sampling distribution into account at all times, this is fine within her framework. But I do not see how this argument contributes to invalidating Birnbaum’s proof. The following, and last, sentence of the argument may shed some light on why Mayo considers that it does:

“The sampling distribution to arrive at Ev(E⁰,(j,yj)) would be the convex combination averaged over the two ways that yj could have occurred. This differs from the  sampling distributions of both Ev(E1,y1) and Ev(E2,y2).” (p.310)

Indeed, and rather obviously, the sampling distribution of the evidence Ev(E*,z*) will differ depending on the experiment. But this is not what the likelihood principle is about: it states that the inference itself should be the same for (E¹,y¹) and (E²,y²), not the distribution of this inference. This confusion between inference and its assessment is reproduced in the “Explicit Counterexample” section, where p-values are computed and found to differ for various conditional versions of a mixed experiment. Again, not a reason for invalidating the likelihood principle. So, in the end, I remain fully unconvinced by this demonstration that Birnbaum was wrong. (Even though, as a bystander, I agree that frequentist inference can be built conditional on ancillary statistics.)
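The gap between an inference and its frequentist assessment is easily exhibited on the classic coin example (not Mayo's own counterexample): 9 heads and 3 tails give proportional likelihoods under binomial sampling (12 tosses fixed in advance) and under negative binomial sampling (toss until the third tail), yet the one-sided p-values for θ = ½ against θ > ½ differ. A sketch in exact arithmetic, with hypothetical function names:

```python
from fractions import Fraction
from math import comb

HEADS, TAILS = 9, 3
N = HEADS + TAILS


def pvalue_binomial(heads, n):
    """P(X >= heads) under theta = 1/2 when n tosses are fixed in advance."""
    return Fraction(sum(comb(n, k) for k in range(heads, n + 1)), 2**n)


def pvalue_negbinomial(heads, tails):
    """P(at least `heads` heads before the `tails`-th tail) under theta = 1/2.

    Equivalent to at most tails - 1 tails among the first heads + tails - 1 tosses.
    """
    m = heads + tails - 1
    return Fraction(sum(comb(m, k) for k in range(tails)), 2**m)


p1 = pvalue_binomial(HEADS, N)
p2 = pvalue_negbinomial(HEADS, TAILS)
print(p1, float(p1))  # 299/4096, about 0.073
print(p2, float(p2))  # 67/2048, about 0.033
```

At the 5% level, the two designs thus lead to different conclusions from the same likelihood function.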