## Bayesian brittleness, again

“With the advent of high-performance computing, Bayesian methods are increasingly popular tools for the quantification of uncertainty throughout science and industry. Since these methods impact the making of sometimes critical decisions in increasingly complicated contexts, the sensitivity of their posterior conclusions with respect to the underlying models and prior beliefs is becoming a pressing question.”

**A** second paper by Owhadi, Scovel and Sullivan on Bayesian brittleness has just been arXived. This one has the dramatic title ‘When Bayesian inference shatters’! If you remember (or simply check) my earlier post, the topic of this work is the robustness of Bayesian inference under model misspecification, a robustness that is completely lacking from the authors’ perspective. This paper is much shorter than the earlier one (and sounds like a commentary on it), but it concludes in a similar manner, namely that Bayesian inference suffers from ‘maximal brittleness under local mis-specification’ (p.6), which means that ‘the range of posterior predictions among all admissible priors is as wide as the deterministic range of the quantity of interest’ when the true model is not within the range of the parametric models covered by the prior distribution. The novelty in the paper appears to be in the extension that, even when we consider only the first k moments of the unknown distribution, Bayesian inference is not robust (this is called the Brittleness Theorem, p.9). As stated earlier, while I appreciate this sort of theoretical derivation, I am somewhat dubious as to whether or not this impacts the practice of Bayesian statistics to the extent suggested in the above quote. In particular, I do not see how those results cast *more* doubts on the impact of the prior modelling on the posterior outcome. While we all (?) agree on the fact that “any given prior and model can be slightly perturbed to achieve any desired posterior conclusion”, the repeatability or falsifiability of the Bayesian experiment (change your prior and run the experiment afresh) allows for an assessment of the posterior outcome that prevents under-the-carpet effects.

January 4, 2014 at 4:53 am

[…] preparing a reply to Owhadi, I discovered a comment written by Dave Higdon on Xian’s Og a few days after OSS’s “plain jane” […]

September 11, 2013 at 6:44 pm

I just wrote this as a comment on Mayo’s blog: “As far as I understand from first reading, the two brittleness theorems do not concern the whole of the posterior distribution, but its expected value, and they basically state the consequences of the well known instability of the expected value as a functional on the space of distributions for Bayesian inference. In frequentist statistics, such observations gave rise to robustness concepts such as “qualitative robustness” in the 70s/80s but the Bayesians didn’t bother much.

The Bayesians could defend themselves by saying that they are not so much interested in the expectation of the posterior but rather in functionals such as the posterior mode or median, or in probabilities of certain sets of interest (credibility intervals), which are not affected by these theorems (though they may be non-robust/brittle in some sense, too).”

Does this matter in practical Bayesian inference? It depends. I think that the most important message is that the expected value of the posterior is really not a very good statistic because it depends heavily on the assignment of low probability in the extreme tails of the posterior, which in practice can hardly be done reliably.
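The instability of the expected value mentioned above can be illustrated with a toy contamination model. This is only a sketch, not the construction used by OSS: it assumes a standard normal "posterior" contaminated by a point mass of weight `eps` at a far-away value `far_point` (both names are hypothetical). The two distributions are within `eps` in total variation, yet the mean can be moved anywhere while the median barely changes.

```python
from statistics import NormalDist

def mixture_mean_and_median(eps, far_point):
    """Mean and median of (1 - eps) * N(0, 1) + eps * (point mass at far_point).

    Assumes 0 <= eps < 0.5 and far_point well above 0, so that the median
    lies in the normal part of the mixture.
    """
    base = NormalDist(0.0, 1.0)
    mean = (1.0 - eps) * base.mean + eps * far_point
    # Below far_point the mixture CDF is (1 - eps) * Phi(m); solve for CDF = 1/2.
    median = base.inv_cdf(0.5 / (1.0 - eps))
    return mean, median

eps = 1e-6  # total variation distance between N(0, 1) and the mixture
for far_point in (1e3, 1e6, 1e9):
    mean, median = mixture_mean_and_median(eps, far_point)
    print(f"far_point={far_point:.0e}  mean={mean:.6g}  median={median:.3g}")
```

The mean scales linearly with `far_point` at fixed `eps`, whereas the median depends on `eps` only, which is the sense in which the mode, median, or probabilities of fixed sets are less exposed than the expectation.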

September 14, 2013 at 7:37 am

My own explanation of the discrepancy is that functional spaces are really really big and hence an arbitrarily small distance in such spaces (a) does not mean anything intuitive and (b) strongly depends on the choice of the distance (since they are not equivalent any longer)…

September 15, 2013 at 5:40 am

Isn’t this just the oft-stated fact that there are no non-informative infinite dimensional priors? As the post-9/11 ad campaign said in Australia “be aware but not alarmed”.

September 15, 2013 at 8:42 am

yep, that’s it! Keep calm and carry on…

September 17, 2013 at 12:47 am

I don’t think there is anything deep or fundamental in this paper. OSS give conditions in their Theorem 2 that allow one to do a cute math trick, but I don’t think they say anything fundamental about Bayesian inference. In fact, their inconsistency example isn’t even related to the results of this paper.

Here’s a simple example that can be applied to their brittleness result:

a single observation: x ~ N(theta,1)

prior for theta: theta ~ N(0,1/100)

observe: x=0

Thus the posterior for theta is N(0,.99).

For a new iid observation x* we care about

phi: posterior probability P(x* > 10)?

This is a > 5-sigma event, so the probability is about 0.
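The numbers in this step can be checked with the usual conjugate normal update. The sketch below rests on one assumption that is not stated in the comment: the second argument of the prior N(0,1/100) is read as a precision, that is, a prior variance of 100, because this is the reading that reproduces the quoted posterior variance of .99 (reading 1/100 as a variance would give a posterior variance of 1/101 instead). The function names are illustrative only.

```python
import math

def normal_posterior(x, prior_mean, prior_var, noise_var=1.0):
    """Posterior mean and variance of theta for x ~ N(theta, noise_var)."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + x / noise_var)
    return post_mean, post_var

def upper_tail(threshold, mean, var):
    """P(X > threshold) for X ~ N(mean, var), via erfc for accuracy in the tail."""
    z = (threshold - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# ASSUMPTION: prior variance 100 (precision 1/100), observation x = 0.
post_mean, post_var = normal_posterior(x=0.0, prior_mean=0.0, prior_var=100.0)

# A new observation x* adds unit noise variance to the posterior variance.
pred_var = post_var + 1.0
tail = upper_tail(10.0, post_mean, pred_var)

print(f"posterior: N({post_mean:.2f}, {post_var:.4f})")
print(f"x* = 10 is {10.0 / math.sqrt(pred_var):.2f} predictive sd away, P(x* > 10) = {tail:.2e}")
```

Under this reading the threshold sits about 7 predictive standard deviations from the posterior mean, consistent with the "> 5-sigma" statement.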

For this simple example, OSS’s theorem 2 constructs a “nearby” likelihood for which the resulting posterior P(x* > 10) is 1. The original likelihood is

L(x|theta) = exp{-.5*(x-theta)^2}

conditions (3) and (4) allow us to construct the nearby likelihood function where e is small:

L*(x|theta) = L(x|theta)*I[x is not in (-e,e)] if theta less than 20;

L*(x|theta) = L(x|theta) if theta greater than 20.

This is what conditions (3) and (4) allow. L* is close to L in a TV sort of metric. But now the posterior only admits values of theta >= 20. Hence P(x* > 10) = 1 under this nearby prior (which is really a likelihood). Since this data generation model is in script Q, this shows that the worst case prior in script Q can give a very different result.

So the conditions of theorem 2 seem to say that if you are allowed to make the observed data impossible for most values of theta, while allowing some probability for the data for extreme values of theta, you can get extreme results. It seems more like a bar trick than a result.
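The mechanics of the construction above can be sketched in a few lines. This is an illustration under assumptions, not the formal statement of OSS’s conditions (3) and (4): the mass removed from N(theta,1) on (-e,e) is used as a rough stand-in for the "TV sort of metric", the boundary value theta = 20 is assigned to the unperturbed branch, and all function names are hypothetical.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF written with erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def perturbed_likelihood(x, theta, e, cutoff=20.0):
    """L*(x|theta): the N(theta, 1) kernel with (-e, e) zeroed out when theta < cutoff."""
    if theta < cutoff and -e < x < e:
        return 0.0
    return math.exp(-0.5 * (x - theta) ** 2)

def removed_mass(theta, e):
    """Probability that N(theta, 1) puts on (-e, e), i.e. what L* deletes."""
    return std_normal_cdf(e - theta) - std_normal_cdf(-e - theta)

e = 1e-3

# The deleted mass is largest at theta = 0 and is of order e there.
print("largest removed mass:", removed_mass(0.0, e))

# The observation x = 0 now has zero likelihood for every theta below the cutoff,
# so the posterior can only charge theta >= 20, whatever the prior weight there.
print("L*(0 | theta=5): ", perturbed_likelihood(0.0, 5.0, e))
print("L*(0 | theta=25):", perturbed_likelihood(0.0, 25.0, e))

# For theta >= 20 the model is unchanged, so P(x* > 10 | theta) = Phi(theta - 10),
# which is bounded below by its value at theta = 20.
lower_bound = std_normal_cdf(20.0 - 10.0)
print("lower bound on posterior P(x* > 10):", lower_bound)
```

The perturbation deletes less than e of probability mass for any theta, yet it is placed exactly on the observed value, which is what drives the posterior prediction from about 0 to about 1.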

September 17, 2013 at 7:54 am

Bar trick, I love it!

September 18, 2013 at 7:57 am

That’s a great (counter-)example, Dave, which shows how inadequate the TV metric is… Thanks!

September 11, 2013 at 6:11 am

I got somewhat confused to be honest (although less so than the last time). As I was reading it, I kept thinking of the arguments of Bissiri et al. in http://arxiv.org/abs/1306.6430

The discussion on page 4 seemed to suggest (although I may be over-generalising) that you can specify k “quantities of interest” that are preserved and still get “maximal brittleness”. In this case, why do we care about what the rest of the posterior is doing?

If we can ensure that everything is ok about a (finite set of) utility function(s), then isn’t that enough to perform robust Bayesian inference? Or have I gone sailing past the point?

September 11, 2013 at 3:31 am

Christian: thanks for posting this article, which I hadn’t known about. I will study it. I don’t understand what you mean by “under the carpet” effects. Thanks.