## Bayesian brittleness

Posted in Statistics on May 3, 2013 by xi'an

Here is the abstract of a recently arXived paper that attracted my attention:

Although it is known that Bayesian estimators may be inconsistent if the model is misspecified, it is also a popular belief that a “good” or “close” enough model should have good convergence properties. This paper shows that, contrary to popular belief, there is no such thing as a “close enough” model in Bayesian inference in the following sense: we derive optimal lower and upper bounds on posterior values obtained from models that exactly capture an arbitrarily large number of finite-dimensional marginals of the data-generating distribution and/or that are arbitrarily close to the data-generating distribution in the Prokhorov or total variation metrics; these bounds show that such models may still make the largest possible prediction error after conditioning on an arbitrarily large number of sample data. Therefore, under model misspecification, and without stronger assumptions than (arbitrary) closeness in Prokhorov or total variation metrics, Bayesian inference offers no better guarantee of accuracy than arbitrarily picking a value between the essential infimum and supremum of the quantity of interest. In particular, an unscrupulous practitioner could slightly perturb a given prior and model to achieve any desired posterior conclusions.

The paper is both too long and too theoretical for me to get into it deeply enough. The main point however is that, given the space of all possible measures, the set of (parametric) Bayes inferences constitutes a tiny finite-dimensional subset that may lie far, far away from the true model. I do not find the result unreasonable, far from it!, but the fact that Bayesian (and other) inferences may be inconsistent for most misspecified models is not such a major issue in my opinion. (Witness my post on the Robins-Wasserman paradox.) I am not so much convinced either about this “popular belief that a “good” or “close” enough model should have good convergence properties”, as it is intuitively reasonable that the immensity of the space of all models can induce non-convergent behaviours. The statistical question is rather what can be done about it. Does it matter that the model is misspecified? If it does, is there any meaning in estimating parameters without a model? For a finite sample size, should we at all bother that the model is not “right” or “close enough” if discrepancies cannot be detected at this precision level? I think the answer to all those questions is negative and that we should proceed with our imperfect models and imperfect inference as long as our imperfect simulation tools do not exhibit strong divergences.

## AMIS convergence, at last!

Posted in Statistics, University life on November 19, 2012 by xi'an

This afternoon, Jean-Michel Marin gave his talk at the big’MC seminar. As already posted, it was about a convergence proof for AMIS, which gave me the opportunity to simultaneously read the paper and listen to the author. The core idea for adapting AMIS towards a manageable version is to update the proposal parameter based on the current sample rather than on the whole past. This facilitates the task of establishing convergence to the optimal (pseudo-true) value of the parameter, under an assumption that the optimal value is a known moment of the target. From there, convergence of the weighted mean is somehow natural when the number of simulations grows to infinity. (Note the special asymptotics of AMIS, though, which are that the number of steps goes to infinity while the number of simulations per step grows a wee bit faster than linearly. In this respect, it is the opposite of PMC, where convergence is of a more traditional nature, pushing the number of simulations per step to infinity.) The second part of the convergence proof is more intricate, as it establishes that the multiple mixture estimator based on the “forward-backward” reweighting of all simulations since step zero does converge to the proper posterior moment. This relies on rather complex assumptions, but remains a magnificent tour de force. During the talk, I wondered if, given the Markovian nature of the algorithm (since reweighting only occurs once simulation is over), an alternative estimator based on the optimal value of the simulation parameter would not be better than the original multiple mixture estimator: the proof is based on the equivalence between both versions…
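To fix ideas, here is a minimal one-dimensional sketch of the two ingredients described above: a proposal mean updated from the current sample only, followed by a final reweighting of all simulations against the mixture of every proposal used since step zero. It is an illustration under my own assumptions and not the algorithm of the paper: the Gaussian proposal with fixed scale, the toy target `toy_target`, the function name `modified_amis` and the growing schedule `sizes` are all hypothetical choices.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of the N(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def modified_amis(target, mu0, sd, sizes, rng):
    """Toy one-dimensional adaptive importance sampler (illustrative only).

    target : unnormalised target density
    mu0    : initial proposal mean
    sd     : proposal standard deviation, kept fixed for simplicity
    sizes  : number of simulations at each step
    Returns the final self-normalised estimate of the target mean and the
    list of proposal means used at each step.
    """
    mus, samples = [], []
    mu = mu0
    for n_t in sizes:
        xs = [rng.gauss(mu, sd) for _ in range(n_t)]
        mus.append(mu)
        samples.append(xs)
        # Adaptation step: the next proposal mean is the weighted mean of
        # the current sample only, not of the whole past.
        ws = [target(x) / normal_pdf(x, mu, sd) for x in xs]
        mu = sum(w * x for w, x in zip(ws, xs)) / sum(ws)

    # Final recycling: every simulation is reweighted against the mixture
    # of all proposals used since step zero.
    total = sum(sizes)
    num = den = 0.0
    for xs in samples:
        for x in xs:
            mixture = sum(n * normal_pdf(x, m, sd) for n, m in zip(sizes, mus)) / total
            w = target(x) / mixture
            num += w * x
            den += w
    return num / den, mus


def toy_target(x):
    """Unnormalised N(3, 1) density, so the target mean is 3 (toy choice)."""
    return math.exp(-0.5 * (x - 3.0) ** 2)


if __name__ == "__main__":
    rng = random.Random(2012)
    sizes = [100 * (t + 1) for t in range(10)]  # arbitrary growing schedule
    estimate, mus = modified_amis(toy_target, mu0=0.0, sd=2.0, sizes=sizes, rng=rng)
    print("estimated target mean:", round(estimate, 3))
    print("proposal means:", [round(m, 2) for m in mus])
```

Since the reweighting only takes place once all simulations are over, the adaptation loop and the final estimator are kept as two separate passes in this sketch.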

## ABC as knn…

Posted in Statistics on September 13, 2012 by xi'an

Gérard Biau, Frédéric Cérou, and Arnaud Guyader recently posted an arXiv paper on the foundations of ABC, entitled “New insights into Approximate Bayesian Computation”. They also submitted it to several statistics journals, with no success so far, and I find this rather surprising. Indeed, the paper analyses the ABC algorithm the way it is truly implemented (as in DIYABC for instance), i.e. with a tolerance bound ε that is determined as a quantile of the simulated distances, say the 10% or the 1% quantile. This means in particular that the interpretation of ε as a non-parametric bandwidth, while interesting and prevalent in the literature (see, e.g., Fearnhead and Prangle’s discussion paper), is only an approximation of the actual practice.
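As a concrete illustration of this practice, here is a minimal sketch of ABC rejection where the tolerance is not fixed in advance but read off as a quantile of the simulated distances, which amounts to keeping the k nearest neighbours of the observed statistic. The names `abc_knn`, `toy_prior` and `toy_simulator`, the uniform prior and the Normal mean example are hypothetical choices of mine and are not taken from the paper or from DIYABC.

```python
import random
import statistics


def abc_knn(observed_stat, prior_sampler, simulator, n_sims, quantile, rng):
    """ABC rejection with the tolerance set as a quantile of the distances.

    Keeping the fraction `quantile` of the closest simulations is the same
    as keeping the k = quantile * n_sims nearest neighbours of the observed
    statistic. Returns the accepted parameters and the implied tolerance.
    """
    table = []
    for _ in range(n_sims):
        theta = prior_sampler(rng)
        stat = simulator(theta, rng)
        table.append((abs(stat - observed_stat), theta))
    table.sort(key=lambda row: row[0])
    k = max(1, round(quantile * n_sims))
    accepted = [theta for _, theta in table[:k]]
    epsilon = table[k - 1][0]  # distance to the k-th nearest neighbour
    return accepted, epsilon


def toy_prior(rng):
    """Uniform prior on (-5, 5) for a Normal mean (toy assumption)."""
    return rng.uniform(-5.0, 5.0)


def toy_simulator(theta, rng, n_obs=20):
    """Sample mean of n_obs draws from N(theta, 1), a sufficient statistic."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n_obs))


if __name__ == "__main__":
    rng = random.Random(2012)
    accepted, epsilon = abc_knn(1.5, toy_prior, toy_simulator,
                                n_sims=5000, quantile=0.02, rng=rng)
    print("k =", len(accepted), "implied tolerance =", round(epsilon, 3))
    print("ABC posterior mean:", round(statistics.fmean(accepted), 3))
```

In this form ε is an output of the algorithm rather than an input, which is the point of the knn reading.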

The authors of this new paper focus on the mathematical foundations of this practice, by (re)analysing ABC as a k-nearest neighbour (knn) method. Using generic knn results, they thus derive a consistency property for the ABC algorithm by imposing some constraints upon the rate of decrease of the quantile as a function of n. (The setting is restricted to the use of sufficient statistics or, equivalently, to a distance over the whole sample. The issue of summary statistics is not addressed by the paper.) The paper also contains a perfectly rigorous proof (the first one?) of the convergence of ABC when the tolerance ε goes to zero. The mean integrated square error consistency of the conditional kernel density estimate is established for a generic kernel (under usual assumptions). Further assumptions (on the target and on the kernel) allow the authors to obtain precise convergence rates (as a power of the sample size), derived from classical k-nearest neighbour regression, like

$k_N \approx N^{(p+4)/(m+p+4)}$

in dimensions m larger than 4… The paper is completely theoretical and highly mathematical (with 25 pages of proofs!), which may explain why it did not meet with success with editors and/or referees. However, I definitely think that (an abridged version of) this work deserves publication in a top statistics journal as a reference for the justification of ABC! The authors also mention future work in that direction: I would strongly suggest they consider the case of insufficient summary statistics from this knn perspective.

## potentially relevant

Posted in pictures, Statistics, Travel, University life on March 14, 2012 by xi'an

This week, freshly back from Roma, I got the reviews on our paper “Relevant statistics for Bayesian model choice” from Series B. The comments are detailed and mostly to the point, expressing concern about the relevance of the paper for statistical methodology as the major issue. We are thus asked for a revision making a much better connection with ABC methodology.

This is not an unexpected outcome, from my point of view, because the paper is indeed quite theoretical and the mathematical assumptions required to obtain the convergence theorems are rather overwhelming… Meaning that in practical cases they cannot truly be checked. However, I think we can eventually address those concerns for two distinct reasons: first, the paper comes as a third step in a series of papers where we first identified a sufficiency property, then realised that this property was actually quite a rare occurrence, and finally made a theoretical advance as to when a summary statistic is enough (i.e. “sufficient” in the standard sense of the term!) to conduct model choice, with a clear answer that the mean ranges of the summary statistic under each model could not intersect. Second, my own personal view is that those assumptions needed for convergence are not of the highest importance for statistical practice (even though they are needed in the paper!) and thus that, from a methodological point of view, only the conclusion should be taken into account. It is then rather straightforward to come up with (quick-and-dirty) simulation devices to check whether a summary statistic behaves differently under both models, taking advantage of the reference table already available (instead of having to run Monte Carlo experiments on an ABC basis)…
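One possible quick-and-dirty device of this kind is sketched below. It is a crude proxy and not the procedure of the paper: it compares central ranges of the simulated statistics in a reference table, rather than the ranges of their asymptotic means. The Normal versus Laplace toy models (both with unit variance), the uniform prior on the location, and every function name are hypothetical choices made for the illustration, and the reference table is simulated on the spot rather than taken from an existing ABC run.

```python
import random
import statistics


def mad(xs):
    """Median absolute deviation around the sample median."""
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])


def simulate(model, theta, n_obs, rng):
    """Draw n_obs observations with location theta and unit variance."""
    if model == "normal":
        return [rng.gauss(theta, 1.0) for _ in range(n_obs)]
    # Laplace with scale b = 1/sqrt(2), written as the difference of two
    # exponential variates with mean b.
    rate = 2.0 ** 0.5
    return [theta + rng.expovariate(rate) - rng.expovariate(rate)
            for _ in range(n_obs)]


def reference_table(model, n_rows, n_obs, rng):
    """Toy reference table: one row of summary statistics per prior draw."""
    rows = []
    for _ in range(n_rows):
        theta = rng.uniform(-2.0, 2.0)  # hypothetical prior on the location
        data = simulate(model, theta, n_obs, rng)
        rows.append({"mean": statistics.fmean(data), "mad": mad(data)})
    return rows


def central_range(values, lower=0.05, upper=0.95):
    """Empirical (lower, upper) quantile range of a list of values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


def statistic_overlaps(table_a, table_b, name):
    """True if the central ranges of statistic `name` intersect."""
    lo_a, hi_a = central_range([row[name] for row in table_a])
    lo_b, hi_b = central_range([row[name] for row in table_b])
    return lo_a <= hi_b and lo_b <= hi_a


if __name__ == "__main__":
    rng = random.Random(2012)
    normal_table = reference_table("normal", 200, 500, rng)
    laplace_table = reference_table("laplace", 200, 500, rng)
    for name in ("mean", "mad"):
        print(name, "ranges overlap:",
              statistic_overlaps(normal_table, laplace_table, name))
```

In this toy setting the sample mean has the same range under both models and is thus of no use for model choice, while the median absolute deviation concentrates on distinct values, a difference that the crude range comparison is able to pick up.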

One of the comments was that maybe Bayes factors were not appropriate for conducting model choice, thus making the whole derivation irrelevant. This is a possible perspective but it can be objected that Bayes factors and posterior probabilities are used in conjunction with ABC in dozens of genetic papers. Further arguments are provided in the various replies to both of Templeton’s radical criticisms. That more empirical and model-based assessments also are available is quite correct, as demonstrated in the multicriterion approach of Olli Ratmann and co-authors. This is simply another approach, not followed by most geneticists so far…

## updated slides for ABC PhD course

Posted in pictures, R, Statistics, Travel, University life on February 8, 2012 by xi'an

Over the weekend, I have added a few slides referring to recent papers mentioning the convergence of ABC algorithms, in particular the very relevant paper by Dean et al. I had already discussed in an earlier post. (This course is taking a much larger chunk of my time than I expected! I am thus relieved I will use the same slides in Roma next month and presumably in Melbourne next summer…)