Archive for model posterior probabilities

over-confident about mis-specified models?

Posted in Books, pictures, Statistics, University life on April 30, 2019 by xi'an

Ziheng Yang and Tianqi Zhu published a paper in PNAS last year that criticises Bayesian posterior probabilities used in the comparison of models under misspecification as “overconfident”. The paper is written from a phylogeneticist’s point of view, rather than from a statistician’s perspective, as shown by the Editor in charge of the paper [although I thought that, after Steve Fienberg‘s intervention!, a statistician had to be involved in a submission relying on statistics!], but the analysis is rather problematic, at least seen through my own lenses… It offers no statistical novelty, apart from looking at the distribution of posterior probabilities in toy examples. The starting argument is that Bayesian model comparison often reports posterior probabilities in favour of a particular model that are close or even equal to 1.

“The Bayesian method is widely used to estimate species phylogenies using molecular sequence data. While it has long been noted to produce spuriously high posterior probabilities for trees or clades, the precise reasons for this overconfidence are unknown. Here we characterize the behavior of Bayesian model selection when the compared models are misspecified and demonstrate that when the models are nearly equally wrong, the method exhibits unpleasant polarized behaviors, supporting one model with high confidence while rejecting others. This provides an explanation for the empirical observation of spuriously high posterior probabilities in molecular phylogenetics.”

The paper focuses on the tendency of posterior probabilities to strongly support a model against others when the sample size is large enough, “even when” all models are wrong, the argument being apparently that the correct output should be one of equal probability between models, or maybe a uniform distribution of these model probabilities over the probability simplex. Why should it be so?! The construction of the posterior probabilities is based on a meta-model that assumes the generating model to be part of a list of mutually exclusive models. It does not account for cases where “all models are wrong” or cases where “all models are right”. The reported probability is furthermore epistemic, in that it is relative to the measure defined by the prior modelling, not to a promise of a frequentist stabilisation in an ill-defined asymptotia. By which I mean that a 99.3% probability of model M¹ being “true” does not have a universal and objective meaning. (Side note: the high polarisation of posterior probabilities was instrumental in our investigation of model choice with ABC tools and in proposing instead error rates in ABC random forests.)
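
As a reminder of this meta-model construction (generic notation of mine, not a formula lifted from the PNAS paper), the posterior probability of model Mⁱ within a closed list of k mutually exclusive models M¹,…,Mᵏ writes as

\[
\mathbb{P}(M^i \mid x) \;=\; \frac{p_i\, m_i(x)}{\sum_{j=1}^{k} p_j\, m_j(x)},
\qquad
m_i(x) \;=\; \int_{\Theta_i} f_i(x \mid \theta_i)\,\pi_i(\theta_i)\,\mathrm{d}\theta_i,
\]

a quantity only defined relative to the prior model probabilities p₁,…,p_k and to the within-model prior modelling, hence carrying no meaning outside this closed list.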

The notion that two models are equally wrong because they are both exactly at the same Kullback-Leibler distance from the generating process (when optimised over the parameter) is such a formal [or cartoonesque] notion that it does not make much sense. There is always one model that is slightly closer and eventually takes over. It is also bizarre that the argument does not account for the complexity of each model and the resulting (Occam’s razor) penalty. Even two models with a single parameter are not necessarily of intrinsic dimension one, as shown by DIC. And thus it is not a surprise if the posterior probability mostly favours one versus the other. In any case, a healthily sceptical approach to Bayesian model choice means looking at the behaviour of the procedure (Bayes factor, posterior probability, posterior predictive, mixture weight, &tc.) under various assumptions (model M¹, M², &tc.) to calibrate the numerical value, rather than taking it at face value. By which I do not mean a frequentist evaluation of this procedure. Actually, it is rather surprising that the authors of the PNAS paper do not jump on the case when the posterior probability of model M¹, say, is uniformly distributed, since this would be a perfect setting when the posterior probability is a p-value. (This is also what happens to the bootstrapped version; see the last paragraph of the paper on p.1859, the year Darwin published his Origin of Species.)
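
This take-over effect is easy to reproduce by simulation. Here is a toy R sketch of my own, not a reproduction of the paper’s examples: the t₅ generating process and the two fully specified Gaussian candidates N(0,1.20²) and N(0,1.35²) are arbitrary choices, both wrong, one marginally closer in Kullback-Leibler divergence.

```r
set.seed(42)

log_post_prob <- function(x, prior = c(0.5, 0.5)) {
  # (marginal) log-likelihoods under two fully specified models, both wrong for t5 data
  l1 <- sum(dnorm(x, mean = 0, sd = 1.20, log = TRUE))   # model M1
  l2 <- sum(dnorm(x, mean = 0, sd = 1.35, log = TRUE))   # model M2
  l  <- c(l1, l2) + log(prior)
  exp(l - max(l)) / sum(exp(l - max(l)))                  # posterior model probabilities
}

# posterior probability of M1 over replications, for increasing sample sizes
for (n in c(100, 1000, 10000)) {
  p1 <- replicate(100, log_post_prob(rt(n, df = 5))[1])
  cat("n =", n,
      " mean P(M1|x) =", round(mean(p1), 3),
      " polarised runs (>0.99 for either model):", mean(pmax(p1, 1 - p1) > 0.99), "\n")
}
```

With these (arbitrary) standard deviations the N(0,1.35²) model is the marginally closer one, so the replications polarise, and increasingly often towards that model as n grows, rather than settling anywhere near ½.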

ghost [parameters] in the [Bayesian] shell

Posted in Books, Kids, Statistics on August 3, 2017 by xi'an

This question appeared on Stack Exchange (X Validated) two days ago. And the equalities indeed seem to suffer from several mathematical inconsistencies, as I pointed out in my Answer. However, what I find most crucial in this question is that the quantity on the left hand side is meaningless. Parameters for different models only make sense within their own model. Hence, when comparing models, parameters cannot co-exist across models. What I suspect [without direct access to Kruschke’s Doing Bayesian Data Analysis book and as was later confirmed by John] is that he is using pseudo-priors in order to apply Carlin and Chib’s (1995) resolution [by saturation of the parameter space] for simulating over a trans-dimensional space…
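
For readers who have not met the Carlin and Chib (1995) construction, here is a minimal R sketch of the saturation idea on a toy problem of my own devising (two conjugate priors for a normal mean, arbitrary pseudo-priors), not a transcription of Kruschke’s example: the parameter of the model that is not currently selected is refreshed from its pseudo-prior, and the model indicator is updated conditional on all parameters at once.

```r
set.seed(1)
y <- rnorm(50, mean = 0.2)            # toy data; neither candidate prior is centred on the truth
n <- length(y); ybar <- mean(y)

# two models for a N(mu, 1) sample, differing only through the prior on mu:
#   M1: mu ~ N(0, 0.1^2)   vs   M2: mu ~ N(1, 0.1^2)
prior_mean <- c(0, 1); prior_sd <- c(0.1, 0.1)
p_model    <- c(0.5, 0.5)             # prior model probabilities

# pseudo-priors (arbitrary here; ideally rough approximations of the within-model posteriors)
pseudo_mean <- prior_mean; pseudo_sd <- c(0.2, 0.2)

draw_post <- function(k) {            # conjugate draw of mu_k given y under model k
  v <- 1 / (1 / prior_sd[k]^2 + n)
  rnorm(1, v * (prior_mean[k] / prior_sd[k]^2 + n * ybar), sqrt(v))
}

niter <- 5000; Mstore <- integer(niter)
M <- 1; mu <- prior_mean              # initial model index and parameter values
for (t in 1:niter) {
  # parameters: true conditional for the live model, pseudo-prior for the other one
  for (k in 1:2)
    mu[k] <- if (k == M) draw_post(k) else rnorm(1, pseudo_mean[k], pseudo_sd[k])
  # model indicator: full conditional over the saturated parameter space (log scale)
  logw <- sapply(1:2, function(k) {
    other <- 3 - k
    log(p_model[k]) +
      sum(dnorm(y, mu[k], 1, log = TRUE)) +                        # likelihood under model k
      dnorm(mu[k], prior_mean[k], prior_sd[k], log = TRUE) +       # genuine prior of model k
      dnorm(mu[other], pseudo_mean[other], pseudo_sd[other], log = TRUE)  # pseudo-prior of the other
  })
  M <- sample(1:2, 1, prob = exp(logw - max(logw)))
  Mstore[t] <- M
}
mean(Mstore == 1)                     # Monte Carlo estimate of P(M1 | y) [no burn-in removed]
```

The efficiency of the scheme hinges on the pseudo-priors being close to the within-model posteriors, and the point remains that the left-hand-side quantity in the X validated question only acquires a meaning through such an artificial completion of the parameter space.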

relativity is the keyword

Posted in Books, Statistics, University life on February 1, 2017 by xi'an

[St John’s College, Oxford, Feb. 23, 2012]

As I was teaching my introduction to Bayesian Statistics this morning, ending up with the chapter on tests of hypotheses, I found myself reflecting [out loud] on the relative nature of posterior quantities. Just like when I introduced the role of priors in Bayesian analysis the day before, I stressed the relativity of quantities coming out of the BBB [Big Bayesian Black Box], namely that whatever comes out of a Bayesian procedure is to be understood, scaled, and relativised against the prior equivalent, i.e., that the reference measure or gauge is the prior. This is sort of obvious, clearly, but bringing the argument forward from the start avoids all sorts of misunderstanding and disagreement, in that it excludes the claims of absoluteness and certainty that may come with the production of a posterior distribution. It also removes the endless debate about the determination of the prior, by making each prior a reference on its own. With an additional possibility of calibration by simulation under the assumed model. Or an alternative. Again nothing new there, but I got rather excited by this presentation choice, as it seems to clarify the path to Bayesian modelling and avoid misapprehensions.

Further, the curious case of the Bayes factor (or of the posterior probability) could possibly be resolved most satisfactorily in this framework, as the [dreaded] dependence on the model prior probabilities then becomes a matter of relativity! Those posterior probabilities depend directly and almost linearly on the prior probabilities, but they should not be interpreted in an absolute sense as the ultimate and unique probability of the hypothesis (which anyway does not mean anything in terms of the observed experiment). In other words, this posterior probability does not need to be scaled against a U(0,1) distribution. Or against the p-value, if anyone wishes to do so. By the end of the lecture, I was even wondering [not so loudly] whether or not this perspective allowed for a resolution of the Lindley-Jeffreys paradox, as the resulting number could be set relative to the choice of the [arbitrary] normalising constant.
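
To spell out this (standard) relativity, writing ρ for the prior probability of M¹ and B₁₂ for the Bayes factor of M¹ against M², the posterior probability writes as

\[
\mathbb{P}(M^1 \mid x) \;=\; \frac{\rho\, B_{12}(x)}{\rho\, B_{12}(x) + 1 - \rho}
\;=\; \left\{ 1 + \frac{1-\rho}{\rho\, B_{12}(x)} \right\}^{-1},
\]

a monotone transform of ρ for a fixed Bayes factor, hence a number that only makes sense relative to this prior allocation (and to the within-model priors hidden inside B₁₂).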

a typo that went under the radar

Posted in Books, R, Statistics, University life on January 25, 2017 by xi'an

A chance occurrence on X validated: a question about an incomprehensible formula for Bayesian model choice which, most unfortunately!, appeared in Bayesian Essentials with R! Eeech! It looks like one line in our LaTeX file got erased and the likelihood part in the denominator altogether vanished. Apologies to all readers confused by this nonsensical formula!

Brexit as hypothesis testing

Posted in Kids, pictures, Statistics on June 26, 2016 by xi'an

[last run on Clifton and Durdham Downs, Bristol, Jan. 27, 2012]

While I have no idea of how the results of the Brexit referendum of last Thursday will be interpreted, I am definitely worried by the possibility (and consequences) of an exit and wonder why those results should inevitably lead to Britain leaving the EU. Indeed, referenda are not legally binding in the UK and Parliament could choose to ignore the majority opinion expressed by this vote. For instance, because of the negative consequences of a withdrawal. Or because the differential is too small to justify such a dramatic change. In this, it relates to hypothesis testing in that only an overwhelming score can lead to the rejection of a natural null hypothesis corresponding to the status quo, rather than the posterior probability being above a mere ½. Which is the decision associated with a 0-1 loss function. Of course, the analogy can be attacked from many sides, from a denial of democracy (simple majority being determined by a single extra vote) to a lack of randomness in the outcome of the referendum (since everyone in the population is supposed to have voted). But I still see some value in requiring major societal changes to be backed by more than a simple majority. All this musing is presumably wishful thinking since every side seems eager to move further (away from one another), but it would be great if it could take place.

ABC model choice via random forests [and no fire]

Posted in Books, pictures, R, Statistics, University life on September 4, 2015 by xi'an

While my arXiv newspage today had a puzzling entry about modelling UFO sightings in France, it also broadcast our revision of Reliable ABC model choice via random forests, a version that we resubmitted today to Bioinformatics after a quite thorough upgrade, the most dramatic one being the realisation that we could also approximate the posterior probability of the selected model via another random forest. (With no connection with the recent post on forest fires!) As discussed a little while ago on the ‘Og. And also in conjunction with our creating the abcrf R package for running ABC model choice out of a reference table. While it has been an excruciatingly slow process (the initial version of the arXived document dates from June 2014, the PNAS submission was rejected for not being Bayesian enough, and the latest revision took the whole summer), the slow maturation of our thoughts on the model choice issues led us to modify the role of random forests in the ABC approach to model choice, in that we reverted our earlier assessment that they could only be trusted for selecting the most likely model, by realising this summer that the corresponding posterior probability could be expressed as a posterior loss and estimated by a secondary forest, as first considered in Stoehr et al. (2014). (In retrospect, this brings an answer to one of the earlier referee’s comments.) The next goal is to incorporate those changes in DIYABC (and wait for the next version of the software to appear). Another best-selling innovation due to Arnaud: we added a practical implementation section in the format of an FAQ for issues related to the calibration of the algorithms.
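
In code terms, and with the caveat that this is only a schematic rendering of the two-forest idea with a made-up reference table and the generic randomForest package rather than abcrf itself: a first classification forest selects the (MAP) model from the summary statistics, and a second, regression forest trained on the out-of-bag misclassification indicators of the first estimates the posterior error rate at the observed summaries, whose complement estimates the posterior probability of the selected model.

```r
library(randomForest)

## toy stand-in reference table: one row per simulated dataset,
## 'model' is the model index, the remaining columns are summary statistics
set.seed(7)
m <- factor(sample(1:2, 2000, replace = TRUE))
reftable <- data.frame(model = m,
                       s1 = rnorm(2000, mean = c(0, 1)[m]),
                       s2 = rnorm(2000, sd = c(1, 2)[m]))
obs <- data.frame(s1 = 0.9, s2 = 0.1)        # observed summaries (hypothetical)

## forest 1: classification of the model index from the summaries
rf1 <- randomForest(model ~ ., data = reftable, ntree = 500)
map <- predict(rf1, obs)                     # selected (MAP) model

## forest 2: regression of the out-of-bag misclassification indicator on the
## summaries; its prediction at the observed summaries estimates the posterior
## error rate, i.e. 1 - P(selected model | observed summaries)
err <- as.numeric(rf1$predicted != reftable$model)
rf2 <- randomForest(x = reftable[, c("s1", "s2")], y = err, ntree = 500)
post_prob_map <- 1 - predict(rf2, obs)
map; post_prob_map
```

The abcrf package implements this two-forest approach (with further refinements), so the snippet above is only meant to convey the construction, not to reproduce its output.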

SPA 2015 Oxford

Posted in pictures, Statistics, Travel, University life on July 14, 2015 by xi'an

Today I gave a talk on Approximate Bayesian model choice via random forests at the yearly SPA (Stochastic Processes and their Applications) 2015 conference, taking place in Oxford (a nice town near Warwick) this year. In Keble College more precisely. The slides are below and while they are mostly repetitions of earlier slides, there is a not inconsequential novelty in the presentation, namely that I included our most recent and current perspective on ABC model choice. Indeed, when travelling to Montpellier two weeks ago, we realised that there was a way to solve our posterior probability conundrum!

Despite the heat wave that rolled all over France that week, we indeed figured out a way to estimate the posterior probability of the selected (MAP) model, a way that we had deemed beyond our reach in previous versions of the talk and of the paper. The fact that we could not provide an estimate of this posterior probability and had to rely instead on a posterior expected loss was one of the arguments used by the PNAS reviewers in rejecting the paper. While the posterior expected loss remains a quantity worth approximating and reporting, the idea that stemmed from meeting together in Montpellier is that (i) the posterior probability of the MAP is actually related to another posterior loss, when conditioning on the observed summary statistics, and (ii) this loss can itself be estimated via a random forest, since it is another function of the summary statistics. A posteriori, this sounds trivial but we had to have a new look at the problem to realise that using ABC samples was not the only way to produce an estimate of the posterior probability! (We are now working on the revision of the paper for resubmission within a few weeks… Hopefully before JSM!)
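
In equation form (my shorthand, with m̂(s) denoting the model selected by the first forest at summary statistic value s), points (i) and (ii) amount to

\[
\mathbb{P}\left\{ \mathcal{M} = \hat{m}(s^{\text{obs}}) \,\middle|\, S = s^{\text{obs}} \right\}
\;=\; 1 - \mathbb{E}\left[\, \mathbb{I}\{\mathcal{M} \neq \hat{m}(S)\} \,\middle|\, S = s^{\text{obs}} \,\right],
\]

and the right-hand-side conditional expectation, being a function of the summary statistics alone, can be learned by a second (regression) random forest trained on the misclassification indicators of the first one.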