## about the strong likelihood principle

Deborah Mayo arXived a Statistical Science paper a few days ago, along with discussions by Jan Bjørnstad, Phil Dawid, Don Fraser, Michael Evans, Jan Hannig, R. Martin and C. Liu. I am very glad that this discussion paper came out and that it came out in Statistical Science, although I am rather surprised to find no discussion by Jim Berger or Robert Wolpert, and even though I still cannot entirely follow the deductive argument in the rejection of Birnbaum’s proof, just as in the earlier version in Error & Inference. But I somehow do not feel like going again into a new debate about this critique of Birnbaum’s derivation. (Even though statements like the claim that the SLP “would preclude the use of sampling distributions” (p.227) would call for contradiction.)

“It is the imprecision in Birnbaum’s formulation that leads to a faulty impression of exactly what is proved.” M. Evans

Indeed, at this stage, I fear that [for me] a more relevant issue is whether or not the debate does matter… At a logical cum foundational [and maybe cum historical] level, it makes perfect sense to uncover which, if any, of the myriad of Birnbaum’s likelihood Principles holds. [Although trying to uncover Birnbaum’s motives and positions over time may not be so relevant.] I think the paper and the discussions acknowledge that *some* version of the weak conditionality Principle does not imply *some* version of the strong likelihood Principle, with other logical implications remaining true. At a methodological level, I am much less sure it matters. Each time I taught this notion, I got blank stares and incomprehension from my students, to the point that I have now stopped teaching the likelihood Principle in class altogether. And most of my co-authors do not seem to care very much about it. At a purely mathematical level, I wonder if there even is ground for a debate since the notions involved can be defined in various imprecise ways, as pointed out by Michael Evans above and in his discussion. At a statistical level, sufficiency eventually is a strange notion in that it seems to make plenty of sense until one realises there is no interesting sufficiency outside exponential families. Just as there are very few parameter transforms for which unbiased estimators can be found. So I also spend very little time teaching and even less worrying about sufficiency. (As it happens, I taught the notion this morning!) At another and presumably more significant statistical level, what matters is information, e.g., conditioning means adding information (i.e., about which experiment has been used). While complex settings may prohibit the use of the entire information provided by the data, at a formal level there is no argument for not using the entire information, i.e. conditioning upon the entire data. (At a computational level, this is no longer true, witness ABC and similar limited information techniques. By the way, ABC demonstrates if needed why sampling distributions matter so much to Bayesian analysis.)
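For readers who have not met ABC, here is a minimal sketch of the rejection version on a toy problem. Everything in it is an assumption chosen for illustration and not taken from the paper or the discussions: a binomial model, a uniform prior on the success probability, a made-up observation of 9 successes in 12 trials, and the hypothetical function name `abc_rejection`. The point it tries to show is that the likelihood is never evaluated; the posterior sample is obtained only by simulating replicas of the data from the sampling distribution.

```python
import random


def abc_rejection(observed, n_trials, n_draws, tolerance=0, seed=0):
    """Toy ABC rejection sampler for a binomial success probability.

    Assumptions (illustrative only): uniform prior on theta, and the
    number of successes used as the summary statistic.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # draw theta from the uniform prior
        # simulate a replica of the data from the sampling distribution
        simulated = sum(rng.random() < theta for _ in range(n_trials))
        # keep theta only when the replica is close enough to the data
        if abs(simulated - observed) <= tolerance:
            accepted.append(theta)
    return accepted


# hypothetical data: 9 successes out of 12 trials
sample = abc_rejection(observed=9, n_trials=12, n_draws=20000)
posterior_mean = sum(sample) / len(sample)
```

With a tolerance of zero and a discrete summary, the accepted values are draws from the exact posterior, here a Beta(10, 4) distribution, so `posterior_mean` should land near 10/14. In realistic settings the tolerance is positive and the summary is not sufficient, which is where the "limited information" aspect comes in.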

“Non-subjective Bayesians who (…) have to live with some violations of the likelihood principle (…) since their prior probability distributions are influenced by the sampling distribution.” D. Mayo (p.229)

In the end, the fact that the prior may depend on the form of the sampling distribution and hence does violate the likelihood Principle does not worry me so much. In most models I consider, the parameters are endogenous to those sampling distributions and do not live an ethereal existence independently from the model: they are substantiated and calibrated by the model itself, which makes the discussion about the LP rather vacuous. See, e.g., the coefficients of a linear model. In complex models, or in large datasets, it is even impossible to handle the whole data or the whole model and proxies have to be used instead, making worries about the structure of the (original) likelihood vacuous. I think we have now reached a stage of statistical inference where models are no longer accepted as ideal truth and where approximation is the hard reality, imposed by the massive amounts of data relentlessly calling for immediate processing. Hence, a stage where the self-validation or invalidation of such approximations in terms of predictive performance is the relevant issue. Provided we can at all face the challenge…
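A standard textbook illustration of how a prior can depend on the sampling distribution, added here as a sketch and not taken from the paper or the discussions, is the Jeffreys prior for a Bernoulli probability $\theta$. Observing $x$ successes in $n$ trials gives proportional likelihoods whether $n$ was fixed in advance (binomial) or sampling stopped at the $x$-th success (negative binomial), yet the Fisher informations $I(\theta)$ differ and so do the resulting priors $\pi_J(\theta) \propto I(\theta)^{1/2}$:

```latex
\begin{align*}
\text{binomial ($n$ fixed):}\quad
  & L(\theta) \propto \theta^{x}(1-\theta)^{n-x},
  & I(\theta) &= \frac{n}{\theta(1-\theta)},
  & \pi_J(\theta) &\propto \theta^{-1/2}(1-\theta)^{-1/2},\\
\text{negative binomial ($x$ fixed):}\quad
  & L(\theta) \propto \theta^{x}(1-\theta)^{n-x},
  & I(\theta) &= \frac{x}{\theta^{2}(1-\theta)},
  & \pi_J(\theta) &\propto \theta^{-1}(1-\theta)^{-1/2}.
\end{align*}
```

The posteriors are then $\mathrm{Beta}(x+\tfrac12,\, n-x+\tfrac12)$ in the first case and $\mathrm{Beta}(x,\, n-x+\tfrac12)$ in the second: same likelihood, different inference, which is the kind of violation the quote refers to.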

November 27, 2014 at 12:04 am

I couldn’t reply under “tree ent” but to his remark that

“Mayo’s argument amounts to “I like p-values, therefore they are a counter example to Birnbaum’s proof””

I say that he has clearly not bothered to understand the use of “counterexample” that is relevant here. I certainly do not assume p-values or any sampling theory, and my use of the term “counterexample” (as I clearly explain) is very different from finding an application that violates the principle (that was a GIVEN in line one). Instead it is the logician’s usage. A counterexample to an allegedly valid argument from A to B is a model M such that A and not-B are true under M. But I also go much further and show any attempt to save the argument is unsound. See also my response to Evans. The mistake in the Birnbaum argument is actually a quantifier error. If theorems in stat were stated in terms of the relevant quantifiers, it would have been spotted earlier. It’s rather disappointing to see this kind of belittling of the logical issue on this blog.

November 27, 2014 at 8:07 pm

I understood “counter example” in exactly the same sense. Having done research in mathematical logic with some of the best logicians alive, I understand the concept well. My comment still stands. I, like Jaynes, never saw it as a “proof” in a strict mathematical sense for a variety of reasons (not all of which were given by you or Jaynes). I saw it as a rabbit hole of irrelevancies brought about by the desire of the misled-n-misinformed to justify something already far clearer than the “assumptions” from which it was “derived”.

In the big picture I’ll stick to the sum/product rules and their implications like Bayes Theorem. The sum/product rules and the mathematics derived from them trump philosophers’ intuitions every time. So much so that I refuse to listen to any philosophers of science who aren’t first rate mathematicians in their own right. They can neither see the implications of the mathematics themselves nor understand the implications once explained.

If you wish to say the sum/product rules are only sometimes true –which is the central tenet of all rejections of Bayes—even though the mathematics indicates no such thing, then history will laugh at your foibles the way math students today laugh at the ancients who refused to accept negative solutions to equations because negative numbers weren’t philosophically sound or meaningful. It turns out, as it always does, that it was the philosophers that needed to be quietly forgotten, not the mathematicians’ negative solutions.

P.S. Every well defined frequentist method that violates the likelihood principle in its range of applicability implied by the sum/product rules works on some narrow range of examples, but then gives horrendously absurd answers outside that range. The sum/product rule based Bayesian methods work just as well inside or outside that range.

That’s the real takeaway from debates about the likelihood principle.

Now you can scream your denials until the sun explodes, but that won’t change a single number in a single equation of the mathematics used to back that claim up.

November 28, 2014 at 3:46 am

I wonder why someone who is not even remotely talking about the relevant issue at hand, and hasn’t bothered to check out the exchange in Stat Sci, would wish to get involved in commenting on it. And I am a mathematician, logician, and philosopher. It may be that you “never saw it as a “proof” in a strict mathematical sense” but loads of stat texts present it as a theorem. If you are right, they should revise those sections to explain, “oh and by the way, we didn’t really mean theorem and proof here”. That is why, in my rejoinder to Evans, I call for the students who have been asked to complete the “proof” to be given their past credit due. No need to respond.

November 16, 2014 at 6:28 am

Family feud among frequentists. Move on, nothing to see

November 16, 2014 at 3:11 pm

Christos, I would tend to disagree on that one. Frequentists are actually the least interested among all statistical families as to whether or not the likelihood principle holds.

November 16, 2014 at 3:16 pm

I have to say that my view on this is heavily influenced by Jaynes who did not see this to be an issue that Bayesians should concern themselves with. It is interesting that Mayo seems to think it is mostly a problem for frequentists (or at least this is what I took her twitter response to mean).

November 16, 2014 at 6:10 pm

You are both wrong: (1) the debate is between frequentists (error statisticians, sampling theorists) and those who find the import of the evidence in the likelihood (e.g., most Bayesians and I think all likelihoodists); and (2) frequentists care a great deal about the SLP. They’ve been intimidated into embracing accounts in which it holds for 50+ years, even if at a gut level, they felt there was something fishy about Birnbaum’s result. Please read, if you haven’t, my rejoinder. Note, too, the extent to which criticisms of P-values rest on assuming the import of the evidence is in the likelihoods, and denying sampling distributions are relevant post-data.

http://errorstatistics.com/2014/07/14/the-p-values-overstate-the-evidence-against-the-null-fallacy/

Berger and Sellke (1987) and like papers, you might say, are from long ago, but a survey immediately reveals that they are the crux of the criticisms in current papers, multiplied daily.

November 16, 2014 at 9:03 pm

Deborah: I accept the rebuke about being wrong about (some) frequentists also caring or being worried about the LP, albeit back when Birnbaum (1962) wrote his JASA paper. I am less convinced about (a) frequentists caring nowadays, assuming we can find someone self-defined as a frequentist!; (b) the afore-said frequentists feeling “intimidated” towards a likelihood or a Bayesian stance; (c) Bayesian statistics denying the sampling distribution post-data. About (b): when I teach about p-values [as little as possible], I do not invoke a Bayesian argument to criticise them and even less the LP [which my students do not understand anyway]. About (c): there is a recurrence of this argument in your papers and blogs, but this is not true: once again, if I can run ABC to produce Bayesian posterior samples, it is because the sampling distribution can produce as many replicas of the data as wished.

November 16, 2014 at 7:36 pm

Christian,

Mayo’s argument amounts to “I like p-values, therefore they are a counter example to Birnbaum’s proof”. In fairness, Birnbaum’s original proof might have been formulated “I like the likelihood principle, therefore it holds”. I don’t think you can expect more than that from vaguely formulated meta “demonstrations”.

This, as Jaynes said, leads to an “infinite regress of irrelevancies”. Bayesians can safely ignore the likelihood principle, sufficiency, ancillarity and all the rest because as long as they stick to the sum/product rules they won’t be led astray. Either these ideas are consistent with the sum/product rules, and will fall out of the equations automatically, or they’re wrong.

If Frequentists want to use methods which contradict the product rule, then they’ll continue to look the fool: every method they’ve ever proposed that does so leads to absurdities eventually and has to be ad-hoc patched up to save embarrassment. I confidently predict this will continue to hold true until the sun goes supernova. Bayesian methods, on the other hand, which stick to the sum/product rule always seem to have a host of amazing provable optimality and performance properties which no one suspected at first.

Their belief that Bayes Theorem is only sometimes meaningful –even though the mathematics indicates no such limitation–is rather like the belief mathematicians once had that negative solutions to algebraic equations aren’t valid solutions and shouldn’t be admitted as such. In both cases, people rejected correct mathematical solutions simply because they couldn’t understand them philosophically.

That’s the true foundation of their statistical philosophy: “I’d rather use wrong equations I understand, than true ones I don’t understand”. Bayesians are on infinitely firmer ground with the sum/product rule.

The idea that philosophy doesn’t matter though couldn’t be more wrong. Sometimes, finding the “sum/product rule” based Bayesian method takes work. In my experience, that work always pays off because it always uncovers important and useful truths which no one’s intuition had seen initially. But since it often does take work, only those who really understand Bayes and take it seriously will do it. Pragmatists of the “I don’t care about Bayes or Frequentism I just want methods that work” school of thought will not do the hard work of understanding and extending full Bayes in practice and will miss out.

November 16, 2014 at 2:08 am

I had said nearly 3 years ago that my SLP result would be treated to the famous “3 stages of the acceptance of novel truths”. I think Christian’s post represents the second, “it really doesn’t matter”.

http://errorstatistics.com/2011/12/22/the-3-stages-of-the-acceptance-of-novel-truths/

But aside from what one thinks of the unimportance of this result –a result once deemed absolutely at the heart of statistical foundations (a “breakthrough” in Savage’s words)–it is sure to be a learning experience for textbook authors to show where they went wrong in their alleged “proofs” of the theorem. It would be a huge waste to simply have this seminal result quietly disappear from textbooks without identifying the flaw that makes it appear to follow. (That’s what I tried to do in the earlier, 2010 paper. The current paper goes much further into the workings of the Birnbaum apparatus, does not assume sampling theory, and develops a notation to get us beyond the linguistic equivocation and unnoticed logical flaw.) I can also extend my thanks to Christian who had raised the challenge in 2011 as to whether my argument could pass muster in a statistics journal. (I admit that I had to go much further, and did.) So my challenge, in return, is for a writer of one of the many textbooks with this result to identify the single line in the short “proof” that is flawed and why. Or, leave it as an exercise for the reader.

November 16, 2014 at 4:16 pm

I’m not sure this drift away from Birnbaum’s theorem as a foundational tool in Bayesian analysis is a result of the new work on it over the last few years.

I think that there’s been a general drift away from the philosophy of Bayes into the practical application of it, which, let’s face it, is significantly more advanced than when Savage was active.

I also hope that stats is becoming less segregated. I don’t think “I use Bayes because it’s the only sensible set of methods” is a particularly useful position to hold onto. (Nor is “OMG YOU CAN’T USE PRIOR INFORMATION”, but I think we’re all getting over that too.)

November 16, 2014 at 5:50 pm

I’m not sure what you mean by a drift away from its use as a foundational tool. You might read, if you haven’t, the commentators on my paper for a feel for people’s perspectives on it, and my final rejoinder. The interesting thing is that while there is a feeling of eclecticism and pragmatics, the criticisms of error statistical methods have never been harsher, and nearly always, someplace along the way, there’s a presumption that the proper “philosophical” measure of the import of evidence is by way of likelihoods. So, it’s there in the background as the elephant in the room. In my case the impetus was simply to clear the way for signing on to the conditioning in a paper I was writing with Cox. Finally, I might note that delving into this issue is far more interesting and relevant than some esoteric philosophical matter. It’s a wonderful and easy way to get right to the heart of so many central issues in the development and debate over statistical methods in the past 60 years or so. Links to the central papers may be found on my blog.

November 15, 2014 at 12:22 am

Andrew seems to have truncated his response early. Maybe a mistake?

I agree, Christian, it is odd that neither Jim nor Robert was invited to be a discussant on this paper.

But thanks for calling our attention to this.

Bill Jefferys

November 15, 2014 at 11:17 am

Thanks, Bill. I assume the +1 delivered by Andrew was the shortest he could manage in communicating his agreement!

November 15, 2014 at 3:31 pm

I guess I’m getting too old to pick up on such subtleties! Thanks for explaining Andrew’s comment. Bill

November 16, 2014 at 4:10 am

Perhaps Jim was asked and he turned it down, as I would suspect. I think the blend of excellent people who have worked on this along with newer approaches being developed that already reject the SLP was apt.

November 19, 2014 at 5:11 pm

I have an idea about why Jim Berger may have chosen not to comment. Years ago, I had e-mailed Jim with a gripe about a supposed “counter-example” to the LP being published in a reputed journal. Jim wrote that he had pretty much stopped refereeing anything about the likelihood principle. He would go through an argument carefully and find a flaw in reasoning, and then the author would make some change and he’d have to do it all over again, so finally he just quit refereeing them.

Sudip

November 19, 2014 at 8:28 pm

Thanks, Sudip! In this Statistical Science discussion paper, I find the picture confusing as no one seems to take the same definition of the likelihood principle. A few years ago, I had tried to write down the difficulties I had with the criticisms. I now am at a stage where I do not think it [the LP] matters that much.

November 20, 2014 at 4:18 am

I asked Jim about this and what he wrote back to me is consistent with what Sudip says.

November 21, 2014 at 11:36 am

Thanks! I read Jim’s e-mail carefully before posting to make sure I didn’t misrepresent what he wrote, but I still hesitated. Anyway, I guess I’m glad that the post was one of the proximate causes of a Jefferys-Berger interaction!

Speaking of which, I attended a Physics seminar talk by Penn State’s Eric Feigelson at GWU, earlier this month. During the talk, he listed a number of topics and problems in astronomy and statistical techniques that could be of use. As a Bayesian, I was pleased to be able to mention Bayesian model selection and the Cepheid distance scale (Jefferys/Barnes/Berger/Mueller).

November 26, 2014 at 11:51 pm

My argument does not fall under “counterexamples to the LP”, but a counterexample to a ‘proof’ that claims to be valid is one that shows the premises to be true and the conclusion false. But I also go much further than this (see my rejoinder to Evans). Texts should want to straighten out their claims to theoremhood, I should think. Were the matter so trivial, I doubt those commentators would have agreed to write.

November 29, 2014 at 10:31 am

First, I did not intend to suggest that Prof. Mayo’s argument was meant as a “counterexample” to the LP.

Second, certainly an example in which the conditions of a “theorem” are satisfied while the theorem’s conclusions do not hold, can be described as a “counterexample” to the validity of the theorem.

Third, without getting into an analysis of what sort of arguments would be too “trivial” for particular commentators to write about, I can say that I am aware of Jim Berger (and other researchers for whom I have high regard) commenting on an article that appeared severely flawed to me. I would venture to speculate that they did so because the article in question was being published in a reputed journal.

November 13, 2014 at 5:31 pm

Jaynes on the likelihood principle:

http://www.mathmarauder.com/archives/441

November 13, 2014 at 2:13 pm

> models are no longer accepted as ideal truth and where approximation is the hard reality

About time – as statisticians we only get rather wide brushes to paint pictures that attempt to represent reality not too misleadingly.

In theory everything can be exact and even (in the abstract) continuous but in practice it is always messy, approximate and discrete. Somehow the clarity/exactness of theory gets projected onto the applications in ways that can be very nonsensical.

November 16, 2014 at 5:58 pm

Kempthorne raised this issue about models never being true back in his response to the 1962 Birnbaum paper. I agree with Birnbaum’s response that this was irrelevant to the issue.

November 17, 2014 at 2:31 pm

Mayo: My comment was just about “approximation is the hard reality” (which I think you would agree with).

It was not meant to distract from “if we take the model as true, what follows?”

One of the difficulties I perceive here (the general issue of how to choose and justify statistical approaches) is that tree ent’s phrase “hard work of understanding and extending full Bayes in practice” is not clearly or widely explicated.

Additionally, in the phrase “But since it often does take work, only those who really understand Bayes and take it seriously will do it” – how are we to know who really understands or learn if that includes us?

Explications of how the SLP misled many should be helpful in this regard.

November 13, 2014 at 3:41 am

+1