Bayesian brittleness

Here is the abstract of a recently arXived paper that attracted my attention:

Although it is known that Bayesian estimators may be inconsistent if the model is misspecified, it is also a popular belief that a “good” or “close” enough model should have good convergence properties. This paper shows that, contrary to popular belief, there is no such thing as a “close enough” model in Bayesian inference in the following sense: we derive optimal lower and upper bounds on posterior values obtained from models that exactly capture an arbitrarily large number of finite-dimensional marginals of the data-generating distribution and/or that are arbitrarily close to the data-generating distribution in the Prokhorov or total variation metrics; these bounds show that such models may still make the largest possible prediction error after conditioning on an arbitrarily large number of sample data. Therefore, under model misspecification, and without stronger assumptions than (arbitrary) closeness in Prokhorov or total variation metrics, Bayesian inference offers no better guarantee of accuracy than arbitrarily picking a value between the essential infimum and supremum of the quantity of interest. In particular, an unscrupulous practitioner could slightly perturb a given prior and model to achieve any desired posterior conclusions.

The paper is both too long and too theoretical for me to get deeply into it. The main point, however, is that, within the space of all possible measures, the set of (parametric) Bayes inferences constitutes a tiny finite-dimensional set that may lie far, far away from the true model. I do not find the result unreasonable (far from it!), but the fact that Bayesian (and other) inferences may be inconsistent for most misspecified models is not such a major issue in my opinion. (Witness my post on the Robins-Wasserman paradox.) Nor am I much convinced by this “popular belief that a ‘good’ or ‘close enough’ model should have good convergence properties”, as it is intuitively reasonable that the immensity of the space of all models can induce non-convergent behaviours. The statistical question is rather what can be done about it. Does it matter that the model is misspecified? If it does, is there any meaning in estimating parameters without a model? For a finite sample size, should we bother at all that the model is not “right” or “close enough” if discrepancies cannot be detected at this precision level? I think the answer to all those questions is negative and that we should proceed with our imperfect models and imperfect inference as long as our imperfect simulation tools do not exhibit strong divergences.
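To make the abstract's closing sentence about the “unscrupulous practitioner” concrete, here is a minimal numerical sketch, entirely my own toy construction rather than anything from the paper: a prior perturbation of total-variation size 10⁻⁶ ends up capturing essentially all of the posterior mass.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
data = rng.normal(5.0, 1.0, size=n)     # data-generating distribution: N(5, 1)

# Discretized parameter grid; the base prior is uniform on [-3, 3],
# so the "true" theta = 5 lies outside its support.
grid = np.linspace(-3.0, 5.5, 8501)
loglik = np.array([-0.5 * np.sum((data - t) ** 2) for t in grid])

base = np.where(np.abs(grid) <= 3.0, 1.0, 0.0)
base /= base.sum()

# Perturbed prior: move a tiny mass eps onto a spike at theta = 5.
# The total-variation distance between the two priors is exactly eps.
eps = 1e-6
pert = (1.0 - eps) * base
pert[np.argmin(np.abs(grid - 5.0))] += eps

def post_mean(prior):
    """Posterior mean of theta over the grid, computed in log space."""
    mask = prior > 0
    logw = np.full_like(grid, -np.inf)
    logw[mask] = loglik[mask] + np.log(prior[mask])
    w = np.exp(logw - logw[mask].max())
    return float((grid * w).sum() / w.sum())

print(post_mean(base))   # ~ 3.0: the posterior piles up at the support boundary
print(post_mean(pert))   # ~ 5.0: the 1e-6 spike captures almost all the mass
```

The trick is that the likelihood at the spike exceeds the likelihood anywhere in the base support by a factor of roughly e²⁰⁰ here, which dwarfs the factor of 10⁶ lost in prior mass. It is a deliberately adversarial construction, of course, but very much in the spirit of the abstract.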

5 Responses to “Bayesian brittleness”

  1. If misspecification of the sampling model is permitted, doesn't the result have implications for the MLE as well? Something like: under deliberately contrived misspecification, the MLE can be steered arbitrarily far away?

    • If the model is misspecified, the MLE of the parameter of the misspecified model asymptotically converges to the pseudo-true value, i.e., the parameter value that brings the misspecified model closest to the true model in the Kullback-Leibler divergence sense. In this respect, the divergence can certainly be made arbitrarily large; a quick numerical sketch follows below.
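      As a toy check of this pseudo-true-parameter behaviour (my own example, not one from the paper), consider fitting an exponential model to gamma-distributed data: the MLE of the exponential rate converges to the Kullback-Leibler projection 1/E[X], however poor the fit.

      ```python
      import numpy as np

      rng = np.random.default_rng(1)
      k, theta = 3.0, 2.0                     # true model: Gamma(shape 3, scale 2)
      x = rng.gamma(k, theta, size=100_000)   # E[X] = k * theta = 6

      # Misspecified model: Exponential(rate lam); its MLE is 1 / mean(x).
      mle_rate = 1.0 / x.mean()

      # KL projection: argmax over lam of E[log lam - lam * X] = 1 / E[X]
      pseudo_true_rate = 1.0 / (k * theta)

      print(mle_rate, pseudo_true_rate)       # both ~ 1/6 ~ 0.167
      ```

      The minimized KL divergence itself can then be driven as large as one wishes by making the data-generating distribution less and less exponential-like.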

  2. Thanks for your comments about the paper. Perhaps there is some confusion about what the title means by “there is no ‘good enough’ Bayesian model”.
    What is meant here is that “even if your model is close in TV (total variation) metric to the data-generating distribution, the output posterior value can be as far as possible from the quantity to be estimated”, which is a different statement from “it is difficult for the model to be close (in TV metric, etc.)”.

    Concerning the three questions:

    (a) Does it matter that the model is misspecified?
    The answer is yes, because if your model is misspecified and not close in Kullback-Leibler divergence (a much stronger notion of closeness than TV), then the output posterior value can be anything between the essential min and max of the quantity of interest, i.e., the estimation comes with no guarantee of accuracy whatsoever (see the toy TV-vs-KL contrast sketched below).

    (b) If it does, is there any meaning in estimating parameters without a model?
    Yes; this will be the subject of our sequel work. The idea is to “compute” an optimal formula for the data given the available information. The paper is a bit long and technical because we are also laying down the foundations for such computations.

    (c) For a finite sample size, should we bother at all that the model is not “right” or “close enough” if discrepancies cannot be detected at this precision level?
    Even if the distance between the model and the data-generating distribution is arbitrarily small, there is no improvement in the robustness of the estimation.
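    As a toy illustration of how much stronger KL closeness is than TV closeness (my own two-line example, not a construction from the paper): two distributions can be at TV distance 10⁻⁹ while their KL divergence is infinite, simply because one of them misses an outcome of vanishingly small probability.

    ```python
    import numpy as np

    # Two distributions on {0, 1, 2}: q gives zero mass to an outcome
    # to which p assigns a tiny positive probability.
    p = np.array([0.5, 0.5 - 1e-9, 1e-9])
    q = np.array([0.5, 0.5, 0.0])

    tv = 0.5 * np.abs(p - q).sum()      # = 1e-9: statistically indistinguishable
    with np.errstate(divide="ignore"):
        kl = np.sum(p * np.log(p / q))  # = inf: KL blows up on the missed outcome

    print(tv, kl)
    ```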

    Houman

  3. A recent work related to specification problems in Bayesian estimation is “Risk of Bayesian Inference in Misspecified Models, and the Sandwich Covariance Matrix” by Ulrich K. Müller, Princeton University, forthcoming in Econometrica:
    http://www.princeton.edu/~umueller/sandwich.pdf
    I think it is a great contribution, obtaining a correction for Bayesian procedures analogous to quasi-maximum likelihood.

  4. Dan Simpson Says:

    Are there situations where divergent priors give equivalent (or “nearby”) inferences for a class of utility functions? Because, as far as I can tell, that's a more “useful” question in practice. (But it is almost certainly much harder to quantify.)

    I've got to say, I got lost in this paper almost immediately! And, at 80-odd pages, I wasn't that motivated to find my way again…
