Archive for National Academy of Science

over-confident about mis-specified models?

Posted in Books, pictures, Statistics, University life on April 30, 2019 by xi'an

Ziheng Yang and Tianqi Zhu published a paper in PNAS last year that criticises Bayesian posterior probabilities used in the comparison of models under misspecification as “overconfident”. The paper is written from a phylogeneticist's point of view rather than from a statistician’s perspective, as shown by the Editor in charge of the paper [although I thought that, after Steve Fienberg’s intervention!, a statistician had to be involved in a submission relying on statistics!], and the analysis is rather problematic, at least seen through my own lenses, with no statistical novelty apart from looking at the distribution of posterior probabilities in toy examples. The starting argument is that Bayesian model comparison often reports posterior probabilities in favour of a particular model that are close or even equal to 1.

“The Bayesian method is widely used to estimate species phylogenies using molecular sequence data. While it has long been noted to produce spuriously high posterior probabilities for trees or clades, the precise reasons for this overconfidence are unknown. Here we characterize the behavior of Bayesian model selection when the compared models are misspecified and demonstrate that when the models are nearly equally wrong, the method exhibits unpleasant polarized behaviors, supporting one model with high confidence while rejecting others. This provides an explanation for the empirical observation of spuriously high posterior probabilities in molecular phylogenetics.”

The paper focuses on the tendency of posterior probabilities to strongly support one model against the others when the sample size is large enough, “even when” all models are wrong, the argument apparently being that the correct output should be one of equal probability between models, or maybe a uniform distribution of these model probabilities over the probability simplex. Why should it be so?! The construction of the posterior probabilities is based on a meta-model that assumes the generating model to be part of a list of mutually exclusive models. It does not account for cases where “all models are wrong” or cases where “all models are right”. The reported probability is furthermore epistemic, in that it is relative to the measure defined by the prior modelling, not to a promise of a frequentist stabilisation in an ill-defined asymptotia. By which I mean that a 99.3% probability of model M¹ being “true” does not have a universal and objective meaning. (Moderation note: the high polarisation of posterior probabilities was instrumental in our investigation of model choice with ABC tools and in our proposal of error rates, via ABC random forests, as a substitute.)
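To make the polarisation concrete, here is a minimal Python sketch in a toy setting of my own choosing (it is not the example of the PNAS paper): the data are N(0,1), while M¹ fixes the mean at +1 and M² at -1, so both models sit at the same Kullback-Leibler distance (1/2) from the generating process. The log Bayes factor then reduces to 2Σyᵢ, whose standard deviation grows like 2√n, so the posterior probability of M¹ (with prior weights 1/2 each) drifts towards 0 or 1 rather than settling at 1/2. The function names and the 0.99 threshold are hypothetical choices made for illustration only.

```python
import math
import random


def log_bayes_factor(ys):
    """Log Bayes factor of M1: N(1, 1) against M2: N(-1, 1), both with fixed mean.

    Each observation contributes log N(y; 1, 1) - log N(y; -1, 1)
    = ((y + 1) ** 2 - (y - 1) ** 2) / 2 = 2 * y.
    """
    return 2.0 * sum(ys)


def posterior_prob_m1(ys):
    """Posterior probability of M1 with prior weights 1/2 on each model."""
    lbf = log_bayes_factor(ys)
    # Logistic transform of the log Bayes factor, written to avoid overflow.
    if lbf >= 0:
        return 1.0 / (1.0 + math.exp(-lbf))
    e = math.exp(lbf)
    return e / (1.0 + e)


def fraction_polarised(n, reps=1000, threshold=0.99, seed=1):
    """Share of simulated samples of size n, drawn from N(0, 1), for which
    P(M1 | y) exceeds threshold or falls below 1 - threshold.

    N(0, 1) is at the same Kullback-Leibler distance (1/2) from M1 and M2,
    so both models are "equally wrong".
    """
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        p = posterior_prob_m1(ys)
        if p > threshold or p < 1.0 - threshold:
            count += 1
    return count / reps


for n in (1, 10, 100, 1000):
    print(n, fraction_polarised(n))
```

The share of replicates with a posterior probability above 0.99 or below 0.01 should increase with n, which is the behaviour the PNAS paper deems overconfident, even though neither model is closer to the generating process.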

The notion that two models are equally wrong because they are both exactly at the same Kullback-Leibler distance from the generating process (when optimised over the parameter) is such a formal [or cartoonesque] notion that it does not make much sense. There is always one model that is slightly closer and eventually takes over. It is also bizarre that the argument does not account for the complexity of each model and the resulting (Occam’s razor) penalty. Even two models with a single parameter are not necessarily of intrinsic dimension one, as shown by DIC. And thus it is not a surprise if the posterior probability mostly favours one versus the other. In any case, a healthily sceptical approach to Bayesian model choice means looking at the behaviour of the procedure (Bayes factor, posterior probability, posterior predictive, mixture weight, &tc.) under various assumptions (model M¹, M², &tc.) to calibrate the numerical value, rather than taking it at face value. By which I do not mean a frequentist evaluation of this procedure. Actually, it is rather surprising that the authors of the PNAS paper do not jump on the case where the posterior probability of, say, model M¹ is uniformly distributed, since this would be a perfect setting in which the posterior probability behaves like a p-value. (This is also what happens with the bootstrapped version; see the last paragraph of the paper on p. 1859, the year Darwin published his Origin of Species.)
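As one possible illustration of looking at the behaviour of the posterior probability under several assumptions rather than at face value, here is a minimal Python sketch in a hypothetical toy setting: M¹ fixes the mean of a unit-variance Normal at +δ and M² at -δ, so that the log Bayes factor is 2δΣyᵢ. The sketch reports the deciles of P(M¹|y) over samples simulated under M¹, under M², and under N(0,1), which is equally far from both models. The values n=20 and δ=0.2, as well as the function names, are arbitrary assumptions, not taken from the paper.

```python
import math
import random
import statistics


def posterior_prob_m1(ys, delta):
    """P(M1 | y) for M1: N(delta, 1) against M2: N(-delta, 1), equal prior weights.

    The log Bayes factor reduces to 2 * delta * sum(ys).
    """
    lbf = 2.0 * delta * sum(ys)
    # Logistic transform of the log Bayes factor, written to avoid overflow.
    if lbf >= 0:
        return 1.0 / (1.0 + math.exp(-lbf))
    e = math.exp(lbf)
    return e / (1.0 + e)


def simulate_posterior_probs(mu, n, delta, reps, rng):
    """Posterior probabilities of M1 over reps samples of size n drawn from N(mu, 1)."""
    return [
        posterior_prob_m1([rng.gauss(mu, 1.0) for _ in range(n)], delta)
        for _ in range(reps)
    ]


def calibration_table(n=20, delta=0.2, reps=2000, seed=2):
    """Deciles of P(M1 | y) when data are generated under M1, under M2,
    and under N(0, 1), which is equally far from both models."""
    rng = random.Random(seed)
    scenarios = {"M1": delta, "M2": -delta, "N(0,1)": 0.0}
    table = {}
    for name, mu in scenarios.items():
        probs = simulate_posterior_probs(mu, n, delta, reps, rng)
        # statistics.quantiles with n=10 returns the 9 decile cut points.
        table[name] = statistics.quantiles(probs, n=10)
    return table


for name, deciles in calibration_table().items():
    print(name, [round(q, 3) for q in deciles])
```

An observed posterior probability can then be read against these three distributions. With such weakly separated models, a value of 0.8 for M¹ should not be unusual under the equidistant N(0,1), which the raw number alone does not convey.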

scientific societies start to address sexual harassment

Posted in Books, University life on July 10, 2018 by xi'an

As ISBA releases a letter from its president to the members about the decision by the ISBA Board [taken in Edinburgh] to exclude three of its members following multiple complaints of harassment, the ASA publishes an update on the activities of the task force it created last November to address this issue. And Nature covers the report published by the US National Academies of Sciences, Engineering, and Medicine, which points out the limited impact of the policies and mechanisms currently at play in US institutions.

“The analysis concludes that policies to fight the problem are ineffective because they are set up to protect institutions, not victims.” Nature, June 12, 2018

A feature common to the ASA and Academies approaches is their reliance on a survey of their respective members, soon to come for ASA members. Another feature of major relevance is the issue of anonymous reporting and counselling, so that victims and witnesses of harassment can trust the procedure enough to report a case without fearing that their identity will become known to a large number of people. In my opinion, having identified individuals who represent the diversity of a scientific society such as ISBA, rather than an anonymous email account or a web form, is more likely to induce testimonies or complaints.

distributions for parameters [seminar]

Posted in Books, Statistics, University life on January 22, 2018 by xi'an

Next Thursday, January 25, Nancy Reid will give a seminar at Paris-Dauphine on distributions for parameters, which covers different statistical paradigms and sheds new light on the foundations of statistics. (Coffee is at 10am in the Maths department common room and the talk is at 10:15 in room A, second floor.)

Nancy Reid is University Professor of Statistical Sciences and Canada Research Chair in Statistical Theory and Applications at the University of Toronto, an internationally acclaimed statistician, and a 2014 Fellow of the Royal Society of Canada. She received the Order of Canada in 2015, was elected a foreign associate of the National Academy of Sciences in 2016, and has been awarded many other prestigious statistical and science honours, including the Committee of Presidents of Statistical Societies (COPSS) Award in 1992.

Nancy Reid’s research focuses on finding more accurate and efficient methods for drawing conclusions from complex data sets, ultimately to help scientists find specific solutions to specific problems.

There is currently some renewed interest in developing distributions for parameters, often without relying on prior probability measures. Several approaches have been proposed and discussed in the literature and in a series of “Bayes, fiducial, and frequentist” workshops and meeting sessions. Confidence distributions, generalized fiducial inference, inferential models, and belief functions are some of the terms associated with these approaches. I will survey some of this work, with particular emphasis on common elements and calibration properties. I will try to situate the discussion in the context of the current explosion of interest in big data and data science.

and it only gets worse…

Posted in Kids, pictures on October 6, 2017 by xi'an

“An internal Interior Department memo has proposed lifting restrictions on exploratory seismic studies in the Arctic National Wildlife Refuge, a possible first step toward opening the pristine wilderness area to oil and gas drilling.” NYT, Sept 17, 2017

“The Trump administration opened the door to allowing more firearms on federal lands. It scrubbed references to “L.G.B.T.Q. youth” from the description of a federal program for victims of sex trafficking. And, on the advice of religious leaders, it eliminated funding to international groups that provide abortion.” NYT, Sept 11, 2017

“On Aug. 18, the National Academies of Sciences, Engineering and Medicine received an order from the Interior Department that it stop work on what seemed a useful and overdue study of the health risks of mountaintop-removal coal mining.” NYT, Sept 9, 2017

“Last month the National Oceanic and Atmospheric Administration dissolved its 15-member climate science advisory committee, a panel set up to help translate the findings of the National Climate Assessment into concrete guidance for businesses, governments and the public.” NYT, Sept 9, 2017

“Climate contrarians, like Trump’s EPA administrator Scott Pruitt and Energy Secretary Rick Perry, don’t understand how scientific research works. They are basically asking for a government handout to scientists to do what scientists should already be doing. They are also requesting handouts for scientists who have been less successful in research and publications – a move antithetical to the survival of the fittest approach that has formed the scientific community for decades.” The Guardian, Aug 31, 2017

Steve Fienberg’s obituary in Nature

Posted in Statistics on March 10, 2017 by xi'an

“Stephen Fienberg was the ultimate public statistician.”

Robin Mejia from CMU published an obituary of Steve Fienberg in the 23 Feb issue of Nature that beautifully sums up Steve’s contributions to science and academia. I like the above quote very much, as Steve was indeed deeply involved in public policy, towards making it more rational and fair. I remember the time he came to Paris-Dauphine to give a seminar on his assessment of the polygraph within an NAS committee (and my surprise at it being used at all in the US, let alone in judicial matters). Similarly, I remember his involvement in basing the US Census on surveys rather than on an illusory exhaustive coverage of the entire US population, including a paper in Nature about the importance of surveys. I also remember his massive contributions to preserving privacy in surveys and databases, an issue on which he was a precursor (even though my colleagues at the French Census Bureau did not seize the opportunity when he spent a sabbatical in Paris in 2004). While it is a sad circumstance that led to statistics getting a rare entry in Nature, I am glad that Steve can also be remembered that way.