Archive for PNAS

over-confident about mis-specified models?

Posted in Books, pictures, Statistics, University life on April 30, 2019 by xi'an

Ziheng Yang and Tianqi Zhu published a paper in PNAS last year that criticises Bayesian posterior probabilities used in the comparison of models under misspecification as “overconfident”. The paper is written from a phylogeneticist point of view, rather than from a statistician’s perspective, as shown by the Editor in charge of the paper [although I thought that, after Steve Fienberg’s intervention!, a statistician had to be involved in a submission relying on statistics!], but the analysis is rather problematic, at least seen through my own lenses… With no statistical novelty, apart from looking at the distribution of posterior probabilities in toy examples. The starting argument is that Bayesian model comparison is often reporting posterior probabilities in favour of a particular model that are close or even equal to 1.

“The Bayesian method is widely used to estimate species phylogenies using molecular sequence data. While it has long been noted to produce spuriously high posterior probabilities for trees or clades, the precise reasons for this over confidence are unknown. Here we characterize the behavior of Bayesian model selection when the compared models are misspecified and demonstrate that when the models are nearly equally wrong, the method exhibits unpleasant polarized behaviors, supporting one model with high confidence while rejecting others. This provides an explanation for the empirical observation of spuriously high posterior probabilities in molecular phylogenetics.”

The paper focuses on the tendency of posterior probabilities to strongly support one model against the others when the sample size is large enough, “even when” all models are wrong, the argument being apparently that the correct output should be one of equal probability between models, or maybe a uniform distribution of these model probabilities over the probability simplex. Why should it be so?! The construction of the posterior probabilities is based on a meta-model that assumes the generating model to be part of a list of mutually exclusive models. It does not account for cases where “all models are wrong” or cases where “all models are right”. The reported probability is furthermore epistemic, in that it is relative to the measure defined by the prior modelling, not to a promise of a frequentist stabilisation in an ill-defined asymptotia. By which I mean that a 99.3% probability of model M¹ being “true” does not have a universal and objective meaning. (Moderation note: the high polarisation of posterior probabilities was instrumental in our investigation of model choice with ABC tools and in proposing instead error rates in ABC random forests.)

The notion that two models are equally wrong because they are both exactly at the same Kullback-Leibler distance from the generating process (when optimised over the parameter) is such a formal [or cartoonesque] notion that it does not make much sense. There is always one model that is slightly closer and eventually takes over. It is also bizarre that the argument does not account for the complexity of each model and the resulting (Occam’s razor) penalty. Even two models with a single parameter are not necessarily of intrinsic dimension one, as shown by DIC. And thus it is not a surprise if the posterior probability mostly favours one versus the other. In any case, a healthily sceptical approach to Bayesian model choice means looking at the behaviour of the procedure (Bayes factor, posterior probability, posterior predictive, mixture weight, &tc.) under various assumptions (model M¹, M², &tc.) to calibrate the numerical value, rather than taking it at face value. By which I do not mean a frequentist evaluation of this procedure. Actually, it is rather surprising that the authors of the PNAS paper do not jump on the case when the posterior probability of model M¹, say, is uniformly distributed, since this would be a perfect setting where the posterior probability is a p-value. (This is also what happens to the bootstrapped version, see the last paragraph of the paper on p.1859, the year Darwin published his Origin of Species.)
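As a minimal sketch of the polarisation at stake, here is a toy simulation that is not taken from the PNAS paper: data are generated from N(0,1) while the two competing models are the fixed distributions N(-μ,1) and N(+μ,1), hence exactly at the same Kullback-Leibler distance from the truth. The value μ = 0.5, the equal prior weights, the 0.99 threshold and the function names are all assumptions made for illustration. In this setting the log Bayes factor is -2μ times the sum of the observations, hence distributed as N(0, 4μ²n), so its spread grows with n and the posterior probability drifts towards 0 or 1.

```python
import math
import random


def posterior_prob_m1(xs, mu=0.5):
    """Posterior probability of M1: N(-mu, 1) against M2: N(+mu, 1),
    with equal prior weights and no free parameter (illustrative setup).

    The log Bayes factor of M1 against M2 reduces to -2 * mu * sum(xs)."""
    log_bf = -2.0 * mu * sum(xs)
    # Numerically stable logistic transform of the log Bayes factor.
    if log_bf >= 0:
        return 1.0 / (1.0 + math.exp(-log_bf))
    e = math.exp(log_bf)
    return e / (1.0 + e)


def polarised_fraction(n, reps=500, threshold=0.99, seed=1):
    """Share of replications in which one of the two models receives a
    posterior probability above `threshold`, for data from N(0, 1)."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        p1 = posterior_prob_m1(xs)
        if max(p1, 1.0 - p1) > threshold:
            extreme += 1
    return extreme / reps


# The fraction of "overconfident" outcomes increases with the sample size.
for n in (10, 100, 1000):
    print(n, polarised_fraction(n))
```

The share of replications with a posterior probability beyond 0.99 should increase with n, even though neither model is closer to the generating process. This only illustrates the mechanism in the most cartoonesque case, without parameters to optimise over and hence without any Occam penalty.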

noninformative Bayesian prior with a finite support

Posted in Statistics, University life on December 4, 2018 by xi'an

A few days ago, Pierre Jacob pointed me to a PNAS paper published earlier this year on a form of noninformative Bayesian analysis by Henry Mattingly and coauthors. They consider a prior that “maximizes the mutual information between parameters and predictions”, which sounds very much like José Bernardo’s notion of reference priors. With the rather strange twist of having the prior depend on the data size m even though they work under an iid assumption. Here information is defined as the difference between the entropy of the prior and the conditional entropy, which is not precisely defined in the paper but looks like the expected [in the data x] Kullback-Leibler divergence between prior and posterior. (I have general issues with the paper in that I often find it hard to read for a lack of precision and of definition of the main notions.)

One highly specific (and puzzling to me) feature of the proposed priors is that they are supported by a finite number of atoms, which reminds me very much of the (minimax) least favourable priors over compact parameter spaces, as for instance in the iconic paper by Casella and Strawderman (1984). For the same mathematical reason that non-constant analytic functions must have separated maxima. This is conducted under the assumption and restriction of a compact parameter space, which must be chosen in most cases somewhat arbitrarily and not without consequences. I can somehow relate to the notion that a finite support prior translates the limited precision in the estimation brought by a finite sample. In other words, given a sample size of m, there is a maximal precision one can hope for, producing further decimals being silly. Still, the fact that the support of the prior is fixed a priori, completely independently of the data, is both unavoidable (for the prior to be prior!) and very dependent on the choice of the compact set. I would certainly prefer to see a maximal degree of precision expressed a posteriori, meaning that the support would then depend on the data. And handling finite support posteriors is rather awkward in that many notions like confidence intervals do not make much sense in that setup. (Similarly, one could argue that Bayesian non-parametric procedures lead to estimates with a finite number of support points but these are determined based on the data, not a priori.)

Interestingly, the derivation of the “optimal” prior is operated by iterations where the next prior is the renormalised version of the current prior times the exponentiated Kullback-Leibler divergence, which is “guaranteed to converge to the global maximum” for a discretised parameter space. The authors acknowledge that the resolution is poorly suited to multidimensional settings and hence to complex models, and indeed the paper only covers a few toy examples of moderate and even humble dimensions.
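The iteration described above has the same form as the Blahut-Arimoto update used for channel capacity. A minimal sketch of it follows, under assumptions that are mine rather than the paper's: a Binomial(m, θ) model, a regular grid of 51 values of θ over [0,1], a uniform starting prior, and 200 iterations. All function names are hypothetical. Each step multiplies the current prior weight at θ by the exponential of the Kullback-Leibler divergence between p(x|θ) and the current prior predictive, then renormalises.

```python
import math


def binomial_likelihood(m, thetas):
    """One row per theta, holding p(x | theta) for x = 0..m."""
    return [[math.comb(m, x) * t**x * (1.0 - t)**(m - x) for x in range(m + 1)]
            for t in thetas]


def kl_to_marginal(lik, prior):
    """KL divergence between each p(x | theta) and the prior predictive p(x)."""
    n_x = len(lik[0])
    marginal = [sum(w * row[x] for w, row in zip(prior, lik)) for x in range(n_x)]
    return [sum(p * math.log(p / q) for p, q in zip(row, marginal) if p > 0.0)
            for row in lik]


def mutual_information(lik, prior):
    """Mutual information between theta and x, as the prior-averaged KL."""
    return sum(w * k for w, k in zip(prior, kl_to_marginal(lik, prior)))


def optimal_prior(m, grid_size=51, iterations=200):
    """Iterate: new prior proportional to old prior times exp(KL)."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    lik = binomial_likelihood(m, thetas)
    prior = [1.0 / grid_size] * grid_size  # uniform start on the grid
    for _ in range(iterations):
        kls = kl_to_marginal(lik, prior)
        weights = [w * math.exp(k) for w, k in zip(prior, kls)]
        total = sum(weights)
        prior = [w / total for w in weights]
    return thetas, prior


# Grid points that still carry noticeable mass after the iterations, for m = 10.
thetas, prior = optimal_prior(10)
for t, w in zip(thetas, prior):
    if w > 0.01:
        print(f"theta = {t:.2f}, weight = {w:.3f}")
```

In the Bernoulli case m = 1 the iteration pushes essentially all the mass onto the two endpoints 0 and 1, with a mutual information close to log 2. For larger m the mass should pile up on a few more grid points, although convergence on a fine grid can be slow and the retained points depend on the chosen grid, which echoes the dependence on the compact set discussed above.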

Another difficulty with the paper is the absence of temporal consistency: since the prior depends on the sample size, the posterior for n i.i.d. observations is no longer the prior for the (n+1)th observation.

“Because it weights the irrelevant parameter volume, the Jeffreys prior has strong dependence on microscopic effects invisible to experiment”

I simply do not understand the above sentence that apparently counts as a criticism of Jeffreys (1939). And would appreciate anyone enlightening me! The paper goes into comparing priors through Bayes factors, which ignores the main difficulty of an automated solution such as Jeffreys priors in its inability to handle infinite parameter spaces by being almost invariably improper.

Bayes for good

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life on November 27, 2018 by xi'an

A very special weekend workshop on Bayesian techniques used for social good in many different senses (and talks) that we organised with Kerrie Mengersen and Pierre Pudlo at CiRM, Luminy, Marseilles. It started with Rebecca (Beka) Steorts (Duke) explaining [by video from Duke] how the Syrian war deaths were processed to eliminate duplicates, to be continued on Monday at the “Big” conference, followed by Alex Volfovsky (Duke) on a Twitter experiment on the impact of being exposed to adverse opinions as depolarising (not!) or further polarising (yes), turning into network causal analysis. And then Kerrie Mengersen (QUT) on the use of Bayesian networks in ecology, through observational studies she conducted. And the role of neutral statisticians in case of adversarial experts!

Next day, the first talk was by David Corliss (Peace-Work), who writes the Stats for Good column in CHANCE and here gave a recruiting spiel for volunteering in good initiatives. Quoting Florence Nightingale as the “first” volunteer. And presenting a broad collection of projects as supports to his recommendations for “doing good”. We then heard [by video] Julien Cornebise from Element AI in London telling of his move out of DeepMind towards investing in social impacting projects through this new startup. Including working with Amnesty International on Darfur village destructions, building evidence from satellite imaging. And crowdsourcing. With an incoming report on the year’s activities (still under embargo). A most exciting and enthusiastic talk!


how many academics does it take to change… a p-value threshold?

Posted in Books, pictures, Running, Statistics, Travel on August 22, 2017 by xi'an

“…a critical mass of researchers now endorse this change.”

The answer to the lightbulb question seems to be 72: Andrew sent me a short paper recently PsyarXived and to appear in Nature Human Behaviour following on the .005 not .05 tune we criticised in PNAS a while ago. (Actually a very short paper once the names and affiliations of all authors are taken away.) With indeed 72 authors, many of them my Bayesian friends! I figure the mass signature is aimed at convincing users of p-values of a consensus among statisticians. Or a “critical mass” as stated in the note. The following week, Nature had an entry on this proposal. (With a survey on whether the p-value threshold should change!)

The argument therein [and hence my reservations] is about the same as in Val Johnson’s original PNAS paper, namely that .005 should become the reference cutoff when using p-values for discovering new effects. The tone of the note is mostly Bayesian in that it defends the Bayes factor as a better alternative, which I would call the b-value. And produces graphs that relate p-values to some minimax Bayes factors. In the simplest possible case of testing for the nullity of a normal mean. Which I do not think is particularly convincing when considering more realistic settings with (many) nuisance parameters and possible latent variables where numerical answers diverge between p-values and [an infinity of] b-values. And of course the unsolved issue of scaling the Bayes factor. (This without embarking anew upon a full-fledged criticism of the Bayes factor.) As usual, I am also skeptical of mentions of power, since I never truly understood the point of power, which depends on the alternative model, increasingly so with the complexity of this alternative. As argued in our letter to PNAS, the central issue that this proposal fails to address is the urgency in abandoning the notion [indoctrinated in generations of students] that a single quantity and a single bound are the answers to testing issues. Changing the bound sounds like suggesting to paint afresh a building on the verge of collapsing.

contemporary issues in hypothesis testing

Posted in Statistics on September 26, 2016 by xi'an

This week [at Warwick], among other things, I attended the CRiSM workshop on hypothesis testing, giving the same talk as at ISBA last June. There was a most interesting and unusual talk by Nick Chater (from Warwick) about the psychological aspects of hypothesis testing, namely about the unnatural features of a hypothesis in everyday life, i.e., how far this formalism stands from human psychological functioning. Or what we know about it. And then my Warwick colleague Tom Nichols explained how his recent work on permutation tests for fMRIs, published in PNAS, testing hypotheses on real data that should be null and getting a high rate of false positives, got the medical imaging community all up in arms due to over-simplified reports in the media questioning the validity of 15 years of research on fMRI and the related 40,000 papers! For instance, some of the headlines questioned the entire research in the area. Or transformed a software bug missing the boundary effects into a major flaw. (See this podcast on Not So Standard Deviations for a thoughtful discussion on the issue.) One conclusion of this story is to be wary of assertions when submitting a hot story to journals with a substantial non-scientific readership! The afternoon talks were equally exciting, with Andrew explaining to us live from New York why he hates hypothesis testing and prefers model building. With the birthday model as an example. And David Draper gave an encompassing talk about the distinctions between inference and decision, proposing a Jaynes information criterion and illustrating it on Mendel’s historical [and massaged!] pea dataset. The next morning, Jim Berger gave an overview on the frequentist properties of the Bayes factor, with in particular a novel [to me] upper bound on the Bayes factor associated with a p-value (Sellke, Bayarri and Berger, 2001)

B¹⁰(p) ≤ 1 / (−e p log p)

with the specificity that B¹⁰(p) is not testing the original hypothesis [problem] but a substitute where the null is the hypothesis that p is uniformly distributed, versus a non-parametric alternative that p is more concentrated near zero. This reminded me of our PNAS paper on the impact of summary statistics upon Bayes factors. And of some forgotten reference studying Bayesian inference based solely on the p-value… It is too bad I had to rush back to Paris, as this made me miss the last talks of this fantastic workshop centred on maybe the most important aspect of statistics!
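To put numbers on the bound above, a short sketch that simply evaluates 1 / (−e p log p) at a few conventional thresholds. The function name is hypothetical, and the restriction to 0 < p < 1/e is the usual range of validity for this bound rather than something stated in the talk.

```python
import math


def bayes_factor_bound(p):
    """Upper bound 1 / (-e p log p) on the Bayes factor against the null,
    evaluated at a p-value p; only meaningful for 0 < p < 1/e."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the bound requires 0 < p < 1/e")
    return 1.0 / (-math.e * p * math.log(p))


# Conventional thresholds, including the .05 and .005 cutoffs.
for p in (0.05, 0.01, 0.005):
    print(f"p = {p}: B10 <= {bayes_factor_bound(p):.2f}")
```

The bound is about 2.5 at p = .05 and about 14 at p = .005, which gives a sense of how weak the evidence attached to the traditional cutoff can be, at least within the substitute testing problem described above.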

ABC model choice via random forests accepted!

Posted in Books, pictures, Statistics, University life on October 21, 2015 by xi'an

“This revision represents a very nice response to the earlier round of reviews, including a significant extension in which the posterior probability of the selected model is now estimated (whereas previously this was not included). The extension is a very nice one, and I am happy to see it included.” Anonymous

Great news [at least for us], our paper on ABC model choice has been accepted by Bioinformatics! With the pleasant comment above from one anonymous referee. This occurs after quite a prolonged gestation, which actually contributed to a shift in our understanding and our implementation of the method. I am still a wee bit unhappy at the rejection by PNAS, but it paradoxically led to a more elaborate article. So all is well that ends well! Except the story is not finished and we are still exploring the multiple usages of random forests in ABC.

ABC model choice via random forests [and no fire]

Posted in Books, pictures, R, Statistics, University life on September 4, 2015 by xi'an

While my arXiv newspage today had a puzzling entry about modelling UFO sightings in France, it also broadcast our revision of Reliable ABC model choice via random forests, a version that we resubmitted today to Bioinformatics after a quite thorough upgrade, the most dramatic one being the realisation we could also approximate the posterior probability of the selected model via another random forest. (With no connection with the recent post on forest fires!) As discussed a little while ago on the ‘Og. And also in conjunction with our creating the abcrf R package for running ABC model choice out of a reference table. While it has been an excruciatingly slow process (the initial version of the arXived document dates from June 2014, the PNAS submission was rejected for not being Bayesian enough, and the latest revision took the whole summer), the slow maturation of our thoughts on the model choice issues led us to modify the role of random forests in the ABC approach to model choice, in that we reversed our earlier assessment that they could only be trusted for selecting the most likely model, by realising this summer the corresponding posterior could be expressed as a posterior loss and estimated by a secondary forest. As first considered in Stoehr et al. (2014). (In retrospect, this brings an answer to one of the earlier referee’s comments.) Next goal is to incorporate those changes in DIYABC (and wait for the next version of the software to appear). Another best-selling innovation due to Arnaud: we added a practical implementation section in the format of FAQ for issues related to the calibration of the algorithms.