Archive for all models are wrong

over-confident about mis-specified models?

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , on April 30, 2019 by xi'an

Ziheng Yang and Tianqui Zhu published a paper in PNAS last year that criticises Bayesian posterior probabilities used in the comparison of models under misspecification as “overconfident”. The paper is written from a phylogeneticist point of view, rather than from a statistician’s perspective, as shown by the Editor in charge of the paper [although I thought that, after Steve Fienberg‘s intervention!, a statistician had to be involved in a submission relying on statistics!] a paper , but the analysis is rather problematic, at least seen through my own lenses… With no statistical novelty, apart from looking at the distribution of posterior probabilities in toy examples. The starting argument is that Bayesian model comparison is often reporting posterior probabilities in favour of a particular model that are close or even equal to 1.

“The Bayesian method is widely used to estimate species phylogenies using molecular sequence data. While it has long been noted to produce spuriously high posterior probabilities for trees or clades, the precise reasons for this over confidence are unknown. Here we characterize the behavior of Bayesian model selection when the compared models are misspecified and demonstrate that when the models are nearly equally wrong, the method exhibits unpleasant polarized behaviors,supporting one model with high confidence while rejecting others. This provides an explanation for the empirical observation of spuriously high posterior probabilities in molecular phylogenetics.”

The paper focus on the behaviour of posterior probabilities to strongly support a model against others when the sample size is large enough, “even when” all models are wrong, the argument being apparently that the correct output should be one of equal probability between models, or maybe a uniform distribution of these model probabilities over the probability simplex. Why should it be so?! The construction of the posterior probabilities is based on a meta-model that assumes the generating model to be part of a list of mutually exclusive models. It does not account for cases where “all models are wrong” or cases where “all models are right”. The reported probability is furthermore epistemic, in that it is relative to the measure defined by the prior modelling, not to a promise of a frequentist stabilisation in a ill-defined asymptotia. By which I mean that a 99.3% probability of model M¹ being “true”does not have a universal and objective meaning. (Moderation note: the high polarisation of posterior probabilities was instrumental in our investigation of model choice with ABC tools and in proposing instead error rates in ABC random forests.)

The notion that two models are equally wrong because they are both exactly at the same Kullback-Leibler distance from the generating process (when optimised over the parameter) is such a formal [or cartoonesque] notion that it does not make much sense. There is always one model that is slightly closer and eventually takes over. It is also bizarre that the argument does not account for the complexity of each model and the resulting (Occam’s razor) penalty. Even two models with a single parameter are not necessarily of intrinsic dimension one, as shown by DIC. And thus it is not a surprise if the posterior probability mostly favours one versus the other. In any case, an healthily sceptic approach to Bayesian model choice means looking at the behaviour of the procedure (Bayes factor, posterior probability, posterior predictive, mixture weight, &tc.) under various assumptions (model M¹, M², &tc.) to calibrate the numerical value, rather than taking it at face value. By which I do not mean a frequentist evaluation of this procedure. Actually, it is rather surprising that the authors of the PNAS paper do not jump on the case when the posterior probability of model M¹ say is uniformly distributed, since this would be a perfect setting when the posterior probability is a p-value. (This is also what happens to the bootstrapped version, see the last paragraph of the paper on p.1859, the year Darwin published his Origin of Species.)

model misspecification in ABC

Posted in Statistics with tags , , , , , , , , on August 21, 2017 by xi'an

With David Frazier and Judith Rousseau, we just arXived a paper studying the impact of a misspecified model on the outcome of an ABC run. This is a question that naturally arises when using ABC, but that has been not directly covered in the literature apart from a recently arXived paper by James Ridgway [that was earlier this month commented on the ‘Og]. On the one hand, ABC can be seen as a robust method in that it focus on the aspects of the assumed model that are translated by the [insufficient] summary statistics and their expectation. And nothing else. It is thus tolerant of departures from the hypothetical model that [almost] preserve those moments. On the other hand, ABC involves a degree of non-parametric estimation of the intractable likelihood, which may sound even more robust, except that the likelihood is estimated from pseudo-data simulated from the “wrong” model in case of misspecification.

In the paper, we examine how the pseudo-true value of the parameter [that is, the value of the parameter of the misspecified model that comes closest to the generating model in terms of Kullback-Leibler divergence] is asymptotically reached by some ABC algorithms like the ABC accept/reject approach and not by others like the popular linear regression [post-simulation] adjustment. Which suprisingly concentrates posterior mass on a completely different pseudo-true value. Exploiting our recent assessment of ABC convergence for well-specified models, we show the above convergence result for a tolerance sequence that decreases to the minimum possible distance [between the true expectation and the misspecified expectation] at a slow enough rate. Or that the sequence of acceptance probabilities goes to zero at the proper speed. In the case of the regression correction, the pseudo-true value is shifted by a quantity that does not converge to zero, because of the misspecification in the expectation of the summary statistics. This is not immensely surprising but we hence get a very different picture when compared with the well-specified case, when regression corrections bring improvement to the asymptotic behaviour of the ABC estimators. This discrepancy between two versions of ABC can be exploited to seek misspecification diagnoses, e.g. through the acceptance rate versus the tolerance level, or via a comparison of the ABC approximations to the posterior expectations of quantities of interest which should diverge at rate Vn. In both cases, ABC reference tables/learning bases can be exploited to draw and calibrate a comparison with the well-specified case.

La déraisonnable efficacité des mathématiques

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , on May 11, 2017 by xi'an

Although it went completely out of my mind, thanks to a rather heavy travel schedule, I gave last week a short interview about the notion of mathematical models, which got broadcast this week on France Culture, one of the French public radio channels. Within the daily La Méthode Scientifique show, which is a one-hour emission on scientific issues, always a [rare] pleasure to listen to. (Including the day they invited Claire Voisin.) The theme of the show that day was about the unreasonable effectiveness of mathematics, with the [classical] questioning of whether it is an efficient tool towards solving scientific (and inference?) problems because the mathematical objects pre-existed their use or we are (pre-)conditioned to use mathematics to solve problems. I somewhat sounded like a dog in a game of skittles, but it was interesting to listen to the philosopher discussing my relativistic perspective [provided you understand French!]. And I appreciated very much the way Céline Loozen the journalist who interviewed me sorted the chaff from the wheat in the original interview to make me sound mostly coherent! (A coincidence: Jean-Michel Marin got interviewed this morning on France Inter, the major public radio, about the Grothendieck papers.)

Why should I be Bayesian when my model is wrong?

Posted in Books, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , on May 9, 2017 by xi'an

Guillaume Dehaene posted the above question on X validated last Friday. Here is an except from it:

However, as everybody knows, assuming that my model is correct is fairly arrogant: why should Nature fall neatly inside the box of the models which I have considered? It is much more realistic to assume that the real model of the data p(x) differs from p(x|θ) for all values of θ. This is usually called a “misspecified” model.

My problem is that, in this more realistic misspecified case, I don’t have any good arguments for being Bayesian (i.e: computing the posterior distribution) versus simply computing the Maximum Likelihood Estimator.

Indeed, according to Kleijn, v.d Vaart (2012), in the misspecified case, the posterior distribution converges as nto a Dirac distribution centred at the MLE but does not have the correct variance (unless two values just happen to be same) in order to ensure that credible intervals of the posterior match confidence intervals for θ.

Which is a very interesting question…that may not have an answer (but that does not make it less interesting!)

A few thoughts about that meme that all models are wrong: (resonating from last week discussion):

  1. While the hypothetical model is indeed almost invariably and irremediably wrong, it still makes sense to act in an efficient or coherent manner with respect to this model if this is the best one can do. The resulting inference produces an evaluation of the formal model that is the “closest” to the actual data generating model (if any);
  2. There exist Bayesian approaches that can do without the model, a most recent example being the papers by Bissiri et al. (with my comments) and by Watson and Holmes (which I discussed with Judith Rousseau);
  3. In a connected way, there exists a whole branch of Bayesian statistics dealing with M-open inference;
  4. And yet another direction I like a lot is the SafeBayes approach of Peter Grünwald, who takes into account model misspecification to replace the likelihood with a down-graded version expressed as a power of the original likelihood.
  5. The very recent Read Paper by Gelman and Hennig addresses this issue, albeit in a circumvoluted manner (and I added some comments on my blog).
  6. In a sense, Bayesians should be the least concerned among statisticians and modellers about this aspect since the sampling model is to be taken as one of several prior assumptions and the outcome is conditional or relative to all those prior assumptions.

principles or unprincipled?!

Posted in Books, Kids, pictures, Statistics, Travel with tags , , , , , , , on May 2, 2017 by xi'an

A lively and wide-ranging discussion during the Bayes, Fiducial, Frequentist conference was about whether or not we should look for principles. Someone mentioned Terry Speed (2016) claim that it does not help statistics in being principled. Against being efficient. Which gets quite close in my opinion to arguing in favour of a no-U-turn move to machine learning—which requires a significant amount of data to reach this efficiency, as Xiao-Li Meng mentioned—. The debate brought me back to my current running or droning argument on the need to accommodate [more] the difference between models and reality. Not throwing away statistics and models altogether, but developing assessments that are not fully chained to those models. While keeping probabilistic models to handle uncertainty. One pessimistic conclusion I drew from the discussion is that while we [as academic statisticians] may set principles and even teach our students how to run principled and ethical statistical analyses, there is not much we can do about the daily practice of users of statistics…

RSS Read Paper

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , on April 17, 2017 by xi'an

I had not attended a Read Paper session at the Royal Statistical Society in Errol Street for quite a while and hence it was quite a treat to be back there, especially as a seconder of the vote of thanks for the paper of Andrew Gelman and Christian Hennig. (I realised at this occasion that I had always been invited as a seconder, who in the tradition of the Read Papers is expected to be more critical of the paper. When I mentioned that to a friend, he replied they knew me well!) Listening to Andrew (with no slide) and Christian made me think further about the foundations of statistics and the reasons why we proceed as we do. In particular about the meaning and usages of a statistical model. Which is only useful (in the all models are wrong meme) if the purpose of the statistical analysis is completely defined. Searching for the truth does not sound good enough. And this brings us back full circle to decision theory in my opinion, which should be part of the whole picture and the virtues of openness, transparency and communication.

During his talk, Christian mentioned outliers as a delicate issue in modelling and I found this was a great example of a notion with no objective meaning, in that it is only defined in terms of or against a model, in that it addresses the case of observations not fitting a model instead of a model not fitting some observations, hence as much a case of incomplete (lazy?) modelling as an issue of difficult inference. And a discussant (whose Flemish name I alas do not remember) came with the slide below of an etymological reminder that originally (as in Aristotle) the meaning of objectivity and subjectivity were inverted, in that the later meant about the intrinsic nature of the object, while the former was about the perception of this object. It is only in the modern (?) era that Immanuel Kant reverted the meanings…Last thing, I plan to arXiv my discussions, so feel free to send me yours to add to the arXiv document. And make sure to spread the word about this discussion paper to all O-Bayesians as they should feel concerned about this debate!

Bayes, reproducibility, and the quest for truth

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , on September 2, 2016 by xi'an

“Avoid opinion priors, you could be held legally or otherwise responsible.”

Don Fraser, Mylène Bedard, Augustine Wong, Wei Lin, and Ailana Fraser wrote a paper to appear in Statistical Science, with the above title. This paper is a continuation of Don’s assessment of Bayes procedures in earlier Statistical Science [which I discussed] and Science 2013 papers, which I would qualify with all due respect of a demolition enterprise [of the Bayesian approach to statistics]…  The argument therein is similar in that “reproducibility” is to be understood therein as providing frequentist confidence assessment. The authors also use “accuracy” in this sense. (As far as I know, there is no definition of reproducibility to be found in the paper.) Some priors are matching priors, in the (restricted) sense that they give second-order accurate frequentist coverage. Most are not matching and none is third-order accurate, a level that may be attained by alternative approaches. As far as the abstract goes, this seems to be the crux of the paper. Which is fine, but does not qualify in my opinion as a criticism of the Bayesian paradigm, given that (a) it makes no claim at frequentist coverage and (b) I see no reason in proper coverage being connected with “truth” or “accuracy”. It truly makes no sense to me to attempt either to put a frequentist hat on posterior distributions or to check whether or not the posterior is “valid”, “true” or “actual”. I similarly consider that Efron‘s “genuine priors” do not belong to the Bayesian paradigm but are on the opposite anti-Bayesian in that they suggest all priors should stem from frequency modelling, to borrow the terms from the current paper. (This is also the position of the authors, who consider they have “no Bayes content”.)

Among their arguments, the authors refer to two tragic real cases: the earthquake at L’Aquila, where seismologists were charged (and then discharged) with manslaughter for asserting there was little risk of a major earthquake, and the indictment of the pharmaceutical company Merck for the deadly side-effects of their drug Vioxx. The paper however never return to those cases and fails to explain in which sense this is connected with the lack of reproducibility or of truth(fullness) of Bayesian procedures. If anything, the morale of the Aquila story is that statisticians should not draw definitive conclusions like there is no risk of a major earthquake or that it was improbable. There is a strange if human tendency for experts to reach definitive conclusions and to omit the many layers of uncertainty in their models and analyses. In the earthquake case, seismologists do not know how to predict major quakes from the previous activity and that should have been the [non-]conclusion of the experts. Which could possibly have been reached by a Bayesian modelling that always includes uncertainty. But the current paper is not at all operating at this (epistemic?) level, as it never ever questions the impact of the choice of a likelihood function or of a statistical model in the reproducibility framework. First, third or 47th order accuracy nonetheless operates strictly within the referential of the chosen model and providing the data to another group of scientists, experts or statisticians will invariably produce a different statistical modelling. So much for reproducibility or truth.