That’s a great (counter-)example, Dave, which shows how inadequate the TV metric is… Thanks!

]]>Bar trick, I love it!

]]>I don’t think there is anything deep or fundamental in this paper. OSS give conditions in their Theorem 2 that allow one to do a cute math trick, but I don’t think they say anything fundamental about Bayesian inference. In fact, their inconsistency example isn’t even related to the results of this paper.

Here’s a simple example that can be applied to their brittleness result:

a single observation: x ~ N(theta,1)

prior for theta: theta ~ N(0, 100), i.e. prior precision 1/100

observe: x=0

Thus the posterior for theta is N(0,.99).

For a new iid observation x* we care about

phi: the posterior predictive probability P(x* > 10)

This is a more-than-5-sigma event, so the probability is essentially 0.
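As a quick sanity check on the numbers above, here is a minimal sketch of the conjugate normal–normal update, reading the prior as having variance 100 (precision 1/100), which is the reading that reproduces the stated posterior N(0, .99). The `normal_tail` helper is just a convenience built on the standard library:

```python
import math

# Conjugate normal-normal update: prior theta ~ N(0, tau2), one observation x ~ N(theta, 1).
# Assumption: the prior "N(0,1/100)" is read as precision 1/100, i.e. variance tau2 = 100,
# which matches the stated posterior N(0, .99).
tau2, x = 100.0, 0.0
post_var = 1.0 / (1.0 / tau2 + 1.0)      # = 100/101 ~ 0.99
post_mean = post_var * (x + 0.0 / tau2)  # = 0

# Posterior predictive for a new iid x*: N(post_mean, post_var + 1)
pred_sd = math.sqrt(post_var + 1.0)

def normal_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

phi = normal_tail((10.0 - post_mean) / pred_sd)  # P(x* > 10), a ~7-sigma tail
print(post_var, phi)
```

Under this reading, `phi` comes out around 1e-12, i.e. "about 0" as claimed.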

For this simple example, OSS’s Theorem 2 constructs a “nearby” likelihood under which the resulting posterior probability P(x* > 10) is essentially 1. The original likelihood is

L(x|theta) = exp{-.5*(x-theta)^2}

Conditions (3) and (4) allow us to construct a nearby likelihood, for small e > 0:

L*(x|theta) = L(x|theta) * I[x not in (-e,e)] if theta < 20;

L*(x|theta) = L(x|theta) if theta >= 20.

This is what conditions (3) and (4) allow: L* is close to L in a TV-type metric, yet under L* the observed value x = 0 is impossible whenever theta < 20. The posterior therefore only admits values of theta >= 20, and hence P(x* > 10) is essentially 1 under this nearby likelihood. Since this data-generating model is in script Q, the worst case over script Q can give a very different result.
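The effect of the perturbation can be checked numerically. This is a sketch, not the paper's construction verbatim: it again assumes prior variance 100 (matching the stated posterior N(0, .99)), puts the posterior under L* on a grid of theta values, and computes the resulting predictive probability. The exact value of e is irrelevant here, since all that matters is that the observed x = 0 falls inside the deleted interval (-e, e):

```python
import math

def normal_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Grid posterior under the perturbed likelihood L*, with x = 0 observed.
# For theta < 20, L*(0|theta) = 0 because 0 lies in the deleted interval (-e, e),
# so only theta >= 20 receives posterior mass.
thetas = [20.0 + 0.001 * i for i in range(5000)]
# log posterior weight = log L(0|theta) + log prior(theta), prior variance 100 (assumption)
log_w = [-0.5 * t * t - t * t / 200.0 for t in thetas]
m = max(log_w)
w = [math.exp(lw - m) for lw in log_w]  # subtract max for numerical stability
Z = sum(w)
w = [wi / Z for wi in w]

# Posterior predictive P(x* > 10): average the N(theta, 1) tail over the posterior
phi_star = sum(wi * normal_tail(10.0 - t) for wi, t in zip(w, thetas))
print(phi_star)
```

Every surviving theta is at least 20, so each conditional tail P(x* > 10 | theta) is a 10-sigma-or-better near-certainty, and `phi_star` comes out indistinguishable from 1.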

So the conditions of Theorem 2 seem to say that if you are allowed to make the observed data impossible for most values of theta, while leaving it possible for extreme values of theta, you can get extreme results. It seems more like a bar trick than a result.

]]>yep, that’s it! Keep calm and carry on…

]]>Isn’t this just the oft-stated fact that there are no non-informative infinite-dimensional priors? As the post-9/11 ad campaign said in Australia, “be aware but not alarmed”.

]]>My own explanation of the discrepancy is that functional spaces are really, really big, and hence an arbitrarily small distance in such spaces (a) does not mean anything intuitive and (b) strongly depends on the choice of the distance (since the distances are no longer equivalent)…

]]>The Bayesians could defend themselves by saying that they are not so much interested in the expectation of the posterior but rather in functionals such as the posterior mode or median, or in probabilities of certain sets of interest (credibility intervals), which are not affected by these theorems (though they may be non-robust/brittle in some sense, too).

Does this matter in practical Bayesian inference? It depends. I think the most important message is that the posterior expectation is really not a very good statistic, because it depends heavily on how low probability is assigned in the extreme tails of the posterior, which in practice can hardly be done reliably.

]]>The discussion on page 4 seemed to suggest (although I may be over-generalising) that you can specify k “quantities of interest” that are preserved and still get “maximal brittleness”. In this case, why do we care about what the rest of the posterior is doing?

If we can ensure that everything is OK about a (finite set of) utility function(s), then isn’t that enough to perform robust Bayesian inference? Or have I gone sailing past the point?
