Is Bayes posterior [quick and] dirty?!

I have been asked to discuss the forthcoming Statistical Science paper by Don Fraser, “Is Bayes posterior just quick and dirty confidence?” The title was intriguing, if clearly provocative, and so I read through the whole paper… (The following is a draft of my discussion.)

The central point in Don’s paper seems to be a demonstration that Bayes confidence sets are not valid because they do not provide the proper frequentist coverage. While I appreciate the effort made therein to evaluate Bayesian bounds in a frequentist light, and while Don’s paper does shed new light on this evaluation, the main point of the paper seems to be a radical reexamination of the relevance of the whole Bayesian approach to confidence regions. The outcome is rather surprising in that the disagreement between Bayesian and frequentist perspectives is usually quite limited [in contrast with tests], since the coverage statements agree to orders between n^{-1/2} and n^{-1} (order n^{-1/2} for a generic smooth prior, improved to order n^{-1} under the Jeffreys prior for a scalar parameter), following older results by Welch and Peers (1963).
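
To make the order n^{-1/2} statement concrete, here is a minimal sketch under an assumed conjugate normal model (not an example taken from Don’s paper): y_1, …, y_n iid N(θ, 1) with a N(0, prior_sd²) prior. The function name and the model are hypothetical choices for illustration; the closed form follows because the posterior upper bound is increasing in the sample mean.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def coverage_conjugate_bound(theta, n, beta, prior_sd=1.0):
    """Exact frequentist coverage of the Bayesian beta-quantile upper bound.

    Hypothetical model: y_1..y_n iid N(theta, 1), prior theta ~ N(0, prior_sd**2).
    The posterior is N(n * ybar * v, v) with v = 1 / (n + 1 / prior_sd**2), so the
    upper bound n * ybar * v + z_beta * sqrt(v) is increasing in ybar ~ N(theta, 1/n)
    and its coverage has a closed form.
    """
    z = STD_NORMAL.inv_cdf(beta)
    v = 1.0 / (n + 1.0 / prior_sd**2)
    # theta <= n*ybar*v + z*sqrt(v)  <=>  ybar >= (theta - z*sqrt(v)) / (n*v)
    threshold = (theta - z * math.sqrt(v)) / (n * v)
    sampling_dist = NormalDist(theta, 1.0 / math.sqrt(n))  # distribution of ybar
    return 1.0 - sampling_dist.cdf(threshold)


beta = 0.9
for n in (10, 100, 1000, 10000):
    gap = coverage_conjugate_bound(theta=1.0, n=n, beta=beta) - beta
    # sqrt(n) * gap settles down, i.e. the coverage error is of order n^{-1/2}
    print(n, round(gap, 5), round(math.sqrt(n) * gap, 4))
```

With prior_sd = 1 and θ = 1, the scaled gap should approach −φ(z_β), about −0.175 for β = 0.9; letting prior_sd grow recovers the flat-prior bound ȳ + z_β/√n, whose coverage is exactly β.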

First, the paper seems to contain apocryphal deeds attributed to Bayes. My understanding of the 1763 posthumous paper of Thomas Bayes is that it derives the posterior distribution of a probability parameter θ driving a binomial observation. It thus contains nothing about location parameters, in relation with the “translation invariance” mentioned in the Introduction or in Section 7. As noted in Fienberg (2006), Thomas Bayes does not explicitly introduce the constant prior as a rule either, even in his limited perspective: this had to wait for Pierre Simon de Laplace twenty to thirty years later. Thanks to Don’s paper, I however re-read Bayes’ essay (in W. Edwards Deming’s 1940 reprint) and found in RULE 2 (page 400 and further) and RULE 3 (page 403 and further) that Bayes approximated the (posterior) probability that the parameter θ is between x/n - z and x/n + z. However, a closer examination revealed that this part (starting on page 399) had actually been written by Richard Price, even though Price mentions “Mr Bayes’s manuscript”… (More about this in a forthcoming post!)
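
For reference, the quantity that RULE 2 approximates can be computed exactly under the uniform prior of Bayes’ setup, since the posterior is then Beta(x + 1, n - x + 1). The sketch below is an exact evaluation, not Price’s approximation; the function names are hypothetical and it relies on the identity between an integer-parameter Beta CDF and a binomial tail.

```python
from math import comb


def posterior_cdf(t, x, n):
    """P(theta <= t | x successes in n trials) under a uniform prior on theta.

    The posterior is Beta(x + 1, n - x + 1); for integer parameters its CDF
    equals the binomial upper tail P(Bin(n + 1, t) >= x + 1).
    """
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    m = n + 1
    return sum(comb(m, k) * t**k * (1.0 - t) ** (m - k) for k in range(x + 1, m + 1))


def rule2_probability(x, n, z):
    """Exact posterior probability that theta lies between x/n - z and x/n + z."""
    p_hat = x / n
    return posterior_cdf(p_hat + z, x, n) - posterior_cdf(p_hat - z, x, n)


print(rule2_probability(5, 10, 0.1), rule2_probability(50, 100, 0.1))
```

The direct summation is only meant for moderate n: beyond roughly a thousand trials the binomial coefficients overflow a float, and a log-scale or library Beta CDF would be needed.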

My second and more important point of contention is that the Bayesian perspective on confidence (or credible) regions and statements does not claim “coverage” from a frequentist viewpoint, since it is articulated in terms of the parameter(s). Probability calculus remains probability calculus whether it applies to the parameter space or to the observation space, to a proper or to an improper prior, which makes the comment about the term probability [being] less appropriate in the Bayesian weighted likelihood quite debatable. Following Jaynes (2003) [at least verbatim!], I do consider that, mathematically, “there is only one kind of probability”. The title of the paper is thus in direct contradiction with the purpose of Bayesian inference, and it seems to me that the chance identity occurring for location parameters (where, under a flat prior, posterior quantiles coincide with exact confidence bounds) is a mere coincidence on which one should not build sandcastles.
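
The location-parameter coincidence can be checked by simulation in the simplest case, assuming a single observation y ~ N(θ, 1) and a flat prior, so that the posterior is N(y, 1). This is a minimal sketch with hypothetical function names; the coverage of the posterior β-quantile does not depend on θ.

```python
import random
from statistics import NormalDist


def flat_prior_upper_bound(y, beta):
    """beta-quantile of the posterior N(y, 1) obtained from a flat prior on theta."""
    return y + NormalDist().inv_cdf(beta)


def mc_coverage(theta, beta, n_sims=50_000, seed=0):
    """Monte Carlo frequency with which the Bayesian bound exceeds the true theta."""
    rng = random.Random(seed)
    bounds = [flat_prior_upper_bound(rng.gauss(theta, 1.0), beta) for _ in range(n_sims)]
    return sum(theta <= b for b in bounds) / n_sims


for theta in (-5.0, 0.0, 3.7):
    print(theta, mc_coverage(theta, 0.9))
```

Replacing the flat prior, or leaving the location structure, typically breaks this exact agreement, which is why it should not be read as the general rule.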

I find Bayesian analysis neither quick (although it is logical, hence quick to reach a conclusion!) nor, obviously, dirty (on the contrary, it proposes a more complete and more elegant inferential framework!). Assessing whether a probability evaluation on the parameter space is “correct” (Section 3) also sounds strange to me, in that the referential for a Bayesian analysis is the prior-endowed parameter space, rather than the consequences on observable values that have not been observed, to paraphrase Harold Jeffreys. A Bayesian credible interval is therefore correct in terms of the posterior distribution it is derived from and does not address the completely different target of finding a frequency-valid interval. (The distinction made in the Bayesian literature, e.g. in Berger (1985), between confidence and credible intervals reflects precisely those different purposes.) That a β-quantile Bayesian bound does not contain the true value of the parameter in exactly (100β)% of repeated samples is not a cause for worry when considering only the observed y0, and the example of Section 4 illustrates this perspective perfectly. When I see in Figure 4(c) that the Bayesian coverage starts at 1 when θ = θ0, I am indeed quite happy that this coherent procedure accounts for the fact that θ cannot be less than θ0. I thus strongly object to the dire conclusion of Bayes’ approach [being] viewed as a long history of misdirection! I also fail to understand what is meant by “reality” in Section 7. When running Bayesian inference, the parameter θ driving the observed data is fixed but unknown. Having a prior attached to it has nothing to do with “reality”; it is a reference measure that is necessary for making probability statements (or, quoting again from Jaynes, 2003, an extension of logic). Thus the apparently logical concern in Section 7 on how probabilities can reasonably be attached to a constant has no raison d’être.
Neither has the debate about where the prior comes from (Section 9). If the matter is about improper versus proper priors, this has been extensively discussed in the literature and the difference seems to me less important than the difference between Bayes and generalised Bayes estimators.
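
Why a Bayesian coverage curve can start at 1 on the boundary is easy to reproduce in a toy model. Fraser’s Section 4 example is not reproduced here; as a hypothetical stand-in, assume y ~ N(θ, 1) with θ ≥ θ0 and a flat prior on [θ0, ∞), so the posterior is N(y, 1) truncated to [θ0, ∞). Its β-quantile always lies above θ0, hence it always covers θ = θ0. The function names are illustrative only.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def std_normal_cdf(x):
    # erfc keeps precision in the lower tail, where 0.5 * (1 + erf(x)) rounds to 0.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def truncated_upper_bound(y, theta0, beta):
    """beta-quantile of the flat-prior posterior N(y, 1) truncated to [theta0, inf).

    Writing s = Phi(y - theta0) for the untruncated posterior mass above theta0,
    the quantile q solves 1 - Phi(q - y) = (1 - beta) * s, so q > theta0 always.
    """
    s = std_normal_cdf(y - theta0)
    return y - STD_NORMAL.inv_cdf((1.0 - beta) * s)


def mc_coverage_truncated(theta, theta0, beta, n_sims=50_000, seed=1):
    """Monte Carlo frequentist coverage of the truncated-posterior upper bound."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        y = rng.gauss(theta, 1.0)
        hits += theta <= truncated_upper_bound(y, theta0, beta)
    return hits / n_sims


print(mc_coverage_truncated(0.0, 0.0, 0.9), mc_coverage_truncated(10.0, 0.0, 0.9))
```

Evaluating mc_coverage_truncated over a grid of θ values traces a coverage curve equal to 1 at θ0 and close to β far from the boundary, where the truncation no longer matters.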

I also object to the debate about optimality (and the subsequent relevance of Bayes procedures), as I do believe that decision theory brings a useful, if formal, representation of statistical inference. The choice of the criterion (which I understand as the choice of the loss function) is clearly important, and it helps give a meaning to notions like “real”, “true” or “correct” found in the paper. Changing the criterion does change the “optimal” interval, but the underlying relevance of Bayesian procedures does not go away. For instance, we proposed several such losses for evaluating confidence sets in Robert and Casella (1994). The criticism found at the end of Section 9 is inappropriate, in that the posterior quantile is neither derived from a loss function nor evaluated under a specific loss function: the “non-zero” curvature drawback stems from a frequentist perspective. Let me add that, even from a frequentist perspective, strange and counter-intuitive phenomena can occur, like the domination of the classical confidence region by regions that are equal to the empty set with positive probability (Hwang and Chen, 1986).
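
As an illustration of how a loss function pins down a Bayes confidence set, consider the linear loss a·length(C) − 1{θ ∈ C}, a standard textbook choice and not necessarily one of the losses in Robert and Casella (1994). The posterior expected loss a·length(C) − P(θ ∈ C | y) is minimised by keeping exactly the θ values whose posterior density exceeds a. The sketch below assumes a normal posterior; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def bayes_set_linear_loss(post_mean, post_sd, a):
    """Bayes set under the illustrative loss a * length(C) - 1{theta in C}.

    For a N(post_mean, post_sd**2) posterior, {theta: density(theta) > a} is an
    interval centred at post_mean. Returns (lower, upper), or None when no theta
    has density above a, in which case the empty set is (weakly) optimal.
    """
    peak = 1.0 / (post_sd * math.sqrt(2.0 * math.pi))  # posterior density at its mode
    if a >= peak:
        return None
    # density >= a  <=>  (theta - mean)**2 <= 2 * sd**2 * log(peak / a)
    half_width = post_sd * math.sqrt(2.0 * math.log(peak / a))
    return (post_mean - half_width, post_mean + half_width)


print(bayes_set_linear_loss(0.0, 1.0, NormalDist().pdf(1.96)))
print(bayes_set_linear_loss(0.0, 1.0, 0.5))
```

Changing a changes the optimal set, from the usual 95% HPD interval when a equals the density at 1.96 down to the empty set when the posterior is too diffuse, which is one way the criterion gives a meaning to “optimal”.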

In conclusion, I am quite sorry about the negative (and possibly strident) tone of this discussion. However, I do not see a convincing reason for opening Pandora’s box afresh about the (lack of) justifications for the Bayesian approach, the true nature of probability and the philosophical relevance of priors: the last section is a nice and provocative enough collection of aphorisms, although I doubt it will make a dent in the convictions of Bayesian readers. In any case, Bayesian credible intervals are not frequentist confidence intervals and thus do not derive their optimality from providing exact frequentist coverage.

6 Responses to “Is Bayes posterior [quick and] dirty?!”

  1. […] a coincidence, I noticed that Don Fraser’s recent discussion paper `Is Bayes posterior just quick and dirty confidence?’ will be discussed this Friday (18:00 UTC) on the Cross Validated Journal Club. I do not know […]

  2. […] of over-theorising/philosophising! (Referring the interested reader to the above post as well as to my comments on Don Fraser’s “Is Bayes posterior quick and dirty confidence?” for more related […]

  3. […] frequentist properties of Bayesian credible intervals reminded me of the recent discussion paper by Don Fraser on the topic, which follows the same argument that Bayesian credible regions are not necessarily […]

  4. […] Tweedie visited me in Paris and when I visited him in Fort Collins… Coincidentally, my discussion of Don Fraser’s provocative Is Bayes Posterior just Quick and Dirty Confidence? also […]

  5. […] and Minge Xie, Larry Wasserman (who coined the neologism Frasian for the occasion), Tong Zhang, and myself, Don Fraser has written his rejoinder to the discussion (although in Biometrika style it is for […]

  6. I am a devotee of the Bayesian approach, as you know. And yet, I have some misgivings about the lack of frequentist coverage. I imagine two artificial intelligence agents, one of which has a set of posterior predictive probability distributions and the other of which has a set of exact predictive confidence distributions. When the observables of interest become known, we propose to calculate the probability integral transforms relative to their respective distributions and make a histogram of the resulting random deviates. Each agent asserts that the histogram will be uniform, but this is guaranteed only for the exact confidence distributions.

    Exact confidence distributions are not nearly as flexible and useful as Bayesian posterior distributions, being available only in a limited set of circumstances, so this isn’t even close to a knock-down argument against the Bayesian approach. But when I read arguments like Fraser’s, the ideas I related above do niggle at me.
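
The two agents of the last comment can be mimicked in a toy model, assuming (hypothetically) y_1, …, y_n, y_new iid N(θ, 1) with θ held fixed. The exact predictive confidence distribution is N(ȳ, 1 + 1/n), while the Bayesian agent uses the posterior predictive under a N(0, prior_sd²) prior. This is only a sketch of the probability-integral-transform check described there, with illustrative names.

```python
import random
from statistics import NormalDist, fmean


def pit_values(theta, n_obs, prior_sd, n_reps=20_000, seed=2):
    """PIT values of a new observation under two predictive distributions.

    - confidence predictive N(ybar, 1 + 1/n): exact, since y_new - ybar ~ N(0, 1 + 1/n)
    - Bayesian posterior predictive under the prior theta ~ N(0, prior_sd**2)
    """
    rng = random.Random(seed)
    post_var = 1.0 / (n_obs + 1.0 / prior_sd**2)
    conf, bayes = [], []
    for _ in range(n_reps):
        ybar = fmean(rng.gauss(theta, 1.0) for _ in range(n_obs))
        y_new = rng.gauss(theta, 1.0)
        conf.append(NormalDist(ybar, (1.0 + 1.0 / n_obs) ** 0.5).cdf(y_new))
        post_mean = n_obs * ybar * post_var
        bayes.append(NormalDist(post_mean, (1.0 + post_var) ** 0.5).cdf(y_new))
    return conf, bayes


conf, bayes = pit_values(theta=3.0, n_obs=1, prior_sd=1.0)
print(fmean(conf), fmean(bayes))
```

With θ fixed far from the prior mean, the Bayesian PIT values pile up near 1, while the confidence PIT values stay uniform; letting prior_sd grow makes the two predictives coincide, in line with the location-parameter coincidence.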
