Archive for Steve Fienberg

over-confident about mis-specified models?

Posted in Books, pictures, Statistics, University life on April 30, 2019 by xi'an

Ziheng Yang and Tianqi Zhu published a paper in PNAS last year that criticises Bayesian posterior probabilities used in the comparison of models under misspecification as “overconfident”. The paper is written from a phylogeneticist’s point of view, rather than from a statistician’s perspective, as shown by the Editor in charge of the paper [although I thought that, after Steve Fienberg’s intervention!, a statistician had to be involved in a submission relying on statistics!], but the analysis is rather problematic, at least seen through my own lenses… With no statistical novelty, apart from looking at the distribution of posterior probabilities in toy examples. The starting argument is that Bayesian model comparison often reports posterior probabilities in favour of a particular model that are close or even equal to 1.

“The Bayesian method is widely used to estimate species phylogenies using molecular sequence data. While it has long been noted to produce spuriously high posterior probabilities for trees or clades, the precise reasons for this over confidence are unknown. Here we characterize the behavior of Bayesian model selection when the compared models are misspecified and demonstrate that when the models are nearly equally wrong, the method exhibits unpleasant polarized behaviors, supporting one model with high confidence while rejecting others. This provides an explanation for the empirical observation of spuriously high posterior probabilities in molecular phylogenetics.”

The paper focuses on the tendency of posterior probabilities to strongly support one model against the others when the sample size is large enough, “even when” all models are wrong, the argument being apparently that the correct output should be one of equal probability between models, or maybe a uniform distribution of these model probabilities over the probability simplex. Why should it be so?! The construction of the posterior probabilities is based on a meta-model that assumes the generating model to be part of a list of mutually exclusive models. It does not account for cases where “all models are wrong” or cases where “all models are right”. The reported probability is furthermore epistemic, in that it is relative to the measure defined by the prior modelling, not to a promise of a frequentist stabilisation in an ill-defined asymptotia. By which I mean that a 99.3% probability of model M¹ being “true” does not have a universal and objective meaning. (Moderation note: the high polarisation of posterior probabilities was instrumental in our investigation of model choice with ABC tools and in proposing instead error rates in ABC random forests.)
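To make the polarisation concrete, here is a minimal simulation sketch of my own, not taken from the PNAS paper. It assumes data generated from N(0,1) and two parameter-free candidate models, N(-0.1,1) and N(0.1,1), which are equally wrong by symmetry, with prior weights 1/2 each. The function names, the 0.99 cutoff and the number of replications are illustrative choices.

```python
import math
import random


def posterior_prob_m1(xs, mu1=-0.1, mu2=0.1):
    """Posterior probability of M1 = N(mu1, 1) against M2 = N(mu2, 1),
    with prior weights 1/2 each and no free parameter in either model."""
    # log-likelihood ratio log L1 - log L2 (the normalising constants cancel)
    log_ratio = sum((x - mu2) ** 2 / 2 - (x - mu1) ** 2 / 2 for x in xs)
    # numerically stable logistic transform of the log Bayes factor
    if log_ratio >= 0:
        return 1.0 / (1.0 + math.exp(-log_ratio))
    e = math.exp(log_ratio)
    return e / (1.0 + e)


def polarised_fraction(n, reps=200, cutoff=0.99, seed=1):
    """Fraction of simulated N(0, 1) samples of size n for which the
    posterior probability of M1 falls above cutoff or below 1 - cutoff."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        p = posterior_prob_m1(xs)
        if p > cutoff or p < 1.0 - cutoff:
            extreme += 1
    return extreme / reps


if __name__ == "__main__":
    # the share of near-0 or near-1 posterior probabilities grows with n
    for n in (10, 100, 2500):
        print(n, polarised_fraction(n))
```

In this toy setting the log Bayes factor is -0.2 times the sum of the observations, hence centred at zero with variance 0.04n, so its spread keeps growing with n and the posterior probability is pushed towards 0 or 1 even though neither model is closer to the truth.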

The notion that two models are equally wrong because they are both exactly at the same Kullback-Leibler distance from the generating process (when optimised over the parameter) is such a formal [or cartoonesque] notion that it does not make much sense. There is always one model that is slightly closer and eventually takes over. It is also bizarre that the argument does not account for the complexity of each model and the resulting (Occam’s razor) penalty. Even two models with a single parameter are not necessarily of intrinsic dimension one, as shown by DIC. And thus it is not a surprise if the posterior probability mostly favours one versus the other. In any case, a healthily sceptical approach to Bayesian model choice means looking at the behaviour of the procedure (Bayes factor, posterior probability, posterior predictive, mixture weight, &tc.) under various assumptions (model M¹, M², &tc.) to calibrate the numerical value, rather than taking it at face value. By which I do not mean a frequentist evaluation of this procedure. Actually, it is rather surprising that the authors of the PNAS paper do not jump on the case where the posterior probability of model M¹, say, is uniformly distributed, since this would be a perfect setting in which the posterior probability is a p-value. (This is also what happens to the bootstrapped version, see the last paragraph of the paper on p.1859, the year Darwin published his Origin of Species.)
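The “slightly closer model eventually takes over” argument can be written down in a couple of lines. This is a sketch under assumptions of my own (i.i.d. observations from a density g, two fixed candidate densities f_1 and f_2 without free parameters, equal prior weights, finite moments), and the notation B_{12}, Δ and σ² is not the paper’s. The last display is the usual Laplace (BIC-type) approximation for a regular model with d_k free parameters, which is where the Occam penalty enters.

```latex
% Log Bayes factor of M^1 against M^2 and its almost sure limit
\[
\log B_{12}(n) = \sum_{i=1}^{n} \log \frac{f_1(x_i)}{f_2(x_i)},
\qquad
\frac{1}{n} \log B_{12}(n) \xrightarrow{\text{a.s.}}
\Delta = \mathrm{KL}(g \,\|\, f_2) - \mathrm{KL}(g \,\|\, f_1).
\]
% Posterior probability of M^1 under equal prior weights
\[
\Pr(M^1 \mid x_{1:n}) = \frac{1}{1 + \exp\{-\log B_{12}(n)\}}.
\]
% Exact tie (\Delta = 0): central limit theorem for the log Bayes factor
\[
n^{-1/2} \log B_{12}(n) \xrightarrow{d} \mathcal{N}(0, \sigma^2),
\qquad
\sigma^2 = \operatorname{Var}_g\!\left[ \log \frac{f_1(X)}{f_2(X)} \right].
\]
% Models with d_k free parameters: Laplace approximation of the evidence
\[
\log m_k(x_{1:n}) \approx \log f_k(x_{1:n} \mid \hat\theta_k) - \frac{d_k}{2} \log n .
\]
```

When Δ differs from zero, however slightly, log B_{12}(n) grows like nΔ and the posterior probability of the closer model goes to one. In the exact tie the log Bayes factor is of order √n with a random sign, so the posterior probability still ends up near 0 or 1 rather than settling at 1/2.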

RSS tribute

Posted in Statistics, University life on November 4, 2018 by xi'an

remembering Joyce Fienberg through Steve’s words

Posted in Statistics on October 28, 2018 by xi'an

I just learned the horrific news that Joyce Fienberg was one of the eleven people murdered yesterday morning at the Tree of Life synagogue. I had been vaguely afraid this could be the case since hearing about the shooting there, just because it was not far from the University of Pittsburgh, and CMU, but then a friend emailed me she indeed was one of the victims. When her husband Steve was on sabbatical in Paris, we met a few times for memorable dinners. I think the last time I saw her was a few years ago in a Paris hotel where Joyce, Steve and I had breakfast together to take advantage of one of their short trips to Paris. In remembrance of this wonderful woman who got assassinated by an anti-Semitic extremist, here is how Steve described their encounter in his Statistical Science interview:

I had met my wife Joyce at the University of Toronto when we were both undergraduates. I was actually working in the fall of 1963 in the registrar’s office, and on the first day the office opened to enroll people, Joyce came through. And one of the benefits about working in the registrar’s office, besides earning some spending money, was meeting all these beautiful women students passing through. That first day I made a note to ask Joyce out on a date. The next day she came through again, this time bringing through another young woman who turned out to be the daughter of friends of her parents. And I thought this was a little suspicious, but auspicious in the sense that maybe I would succeed in getting a date when I asked her. And the next day, she came through again! This time with her cousin! Then I knew that this was really going to work out. And it did. We got engaged at the end of the summer of 1964 after I graduated, but we weren’t married when I went away to graduate school. In fact, yesterday I was talking to one of the students at the University of Connecticut who was a little concerned about graduate school; it was wearing her down, and I told her I almost left after the first semester because I wasn’t sure if I was going to make a go of it, in part because I was lonely. But I did survive, and Joyce came at the end of the first year; we got married right after classes ended, and we’ve been together ever since.

Steve Fienberg’s obituary in Nature

Posted in Statistics on March 10, 2017 by xi'an

“Stephen Fienberg was the ultimate public statistician.”

Robin Mejia from CMU published in the 23 Feb issue of Nature an obituary of Steve Fienberg that sums up beautifully Steve’s contributions to science and academia. I like the above quote very much, as indeed Steve was definitely involved in public policies, towards making those more rational and fair. I remember the time he came to Paris-Dauphine to give a seminar and talk about his assessment, as part of a NAS committee, of the polygraph (and my surprise at it being used at all in the US and even worse in judiciary issues). Similarly, I remember his involvement in making the US Census based on surveys rather than on an illusory exhaustive coverage of the entire US population. Including a paper in Nature about the importance of surveys. And his massive contributions to preserving privacy in surveys and databases, an issue in which he was a precursor (even though my colleagues at the French Census Bureau did not catch the opportunity when he spent a sabbatical in Paris in 2004). While it is such a sad circumstance that led to statistics getting a rare entry in Nature, I am glad that Steve can also be remembered that way.

Stephen Fienberg (1942-2016)

Posted in Statistics, University life on December 14, 2016 by xi'an

I am very very sad to have to announce that our dear friend Steve Fienberg passed away last night, after a long and admirable battle against cancer. He was a wonderful person, a brilliant statistician, a deep thinker, and a fantastic mentor to so many of us. He has strongly impacted the field of Statistics over his prolific career and continued to do so till the last day. It is just so hard to realise he is no longer with us. But his contagious laughter will continue to resonate in our memories, while his vision of Statistics will keep driving us. Au revoir, Steve, et merci.