Archive for Bayesian Analysis

ISBA 2016 [logo]

Posted in pictures, Statistics, Travel, University life, Wines on April 22, 2015 by xi'an

Things are starting to fall into place for the next ISBA World meeting, ISBA 2016, at the Forte Village Resort Convention Center, Sardinia, Italy, June 13-17, 2016. And not only the logo, inspired by the nuraghe below. I am sure the program will be terrific and will make this new occurrence of a “Valencia meeting” worth attending, just like the previous occurrences, e.g., Cancún last summer and Kyoto in 2012.

However, and not for the first time, I wonder at the sustainability of such meetings when faced with ever-increasing—or, more accurately, sky-rocketing!—registration fees… We have now reached €500 per participant for the early registration fees alone, excluding lodging, food, and transportation. If we bet on 500 participants, the registration fees alone would thus amount to €250,000 for the four or five days of the meeting. This sounds enormous, even accounting for the processing costs of the congress organiser. (By comparison, renting the convention centre for MCMSki in Chamonix for three days cost less than €20,000.) Given the likely high costs of staying at the resort, it is very unlikely I will be able to support my PhD students’ attendance. As I know very well the difficulty of finding dedicated volunteers willing to offer a large fraction of their time towards the success of behemoth meetings, this comment is by no means aimed at my friends from Cagliari who kindly agreed to organise this meeting, but rather at the general state of academic meetings, whose costs make them out of reach for a large part of the scientific community.

Thus, this makes me wonder anew whether we should move to a novel conference model, given that the fantastic growth of the Bayesian community makes the ideal of gathering together in a single beach hotel for a week of discussions, talks, posters, and more discussions unattainable. If truly physical meetings are to perdure—and this notion is as debatable as the one about the survival of paper versions of journals—a new approach would be to find a few universities or sponsors able to provide one or several amphitheatres around the World and to connect all those places by teleconference. Reducing the audience size at each location would greatly reduce the pressure to find a few huge and pricey convention centres, while dispersing the units all around would diminish travel costs as well. There could be more parallel sessions, and ways could be found to share virtual poster sessions, e.g., by having avatars present someone else’s poster. Time could be reserved for local discussions of presented papers, to be summarised later for the other locations. And so on… Obviously, something would be lost of the old camaraderie: sharing research questions and side stories, as well as gossip and wine, with friends from all over the World. And discovering new parts of the World. But the cost of meetings is already preventing some of those friends from showing up. I thus think it is time we reinvent the Valencia meetings into the next generation, and move to the Valenci-e-meetings.

Bayesian propaganda?

Posted in Books, Kids, pictures, Statistics, University life on April 20, 2015 by xi'an

“The question is about frequentist approach. Bayesian is admissable [sic] only by wrong definition as it starts with the assumption that the prior is the correct pre-information. James-Stein beats OLS without assumptions. If there is an admissable [sic] frequentist estimator then it will correspond to a true objective prior.”

I had a wee bit of a (minor, very minor!) communication problem on X validated, about a question on the existence of admissible estimators of the linear regression coefficient in multiple dimensions, under squared error loss. When I first replied that all Bayes estimators with finite risk were de facto admissible, I got the above reply, which clearly misses the point, and as I had edited the OP question to include more tags, the edited version was reverted with a comment about Bayesian propaganda! This is rather funny, if not hilarious, as (a) Bayes estimators are indeed admissible in the classical or frequentist sense—I actually fail to see a definition of admissibility in the Bayesian sense—and (b) the complete class theorems of Wald, Stein, and others (like Jack Kiefer, Larry Brown, and Jim Berger) come from the frequentist quest for best estimators. To make my point clearer, I also reproduced in my answer Stein’s necessary and sufficient condition for admissibility from my book, but it did not help, as the theorem was “too complex for [the OP] to understand”, which shows in fine the point of reading textbooks!
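For readers without the book at hand, here is a minimal sketch of the classical argument behind my reply, stated under simplifying assumptions (continuous risk function and a prior with full support) rather than reproducing the exact condition from the book. Write the frequentist and Bayes risks of an estimator $\delta$ under prior $\pi$ as
\[
R(\theta,\delta)=\mathbb{E}_\theta\big[\|\delta(X)-\theta\|^2\big],
\qquad
r(\pi,\delta)=\int_\Theta R(\theta,\delta)\,\pi(\mathrm{d}\theta).
\]
If $\delta^\pi$ is Bayes for $\pi$ with $r(\pi,\delta^\pi)<\infty$ and some $\delta$ dominates it, i.e., $R(\theta,\delta)\le R(\theta,\delta^\pi)$ for all $\theta$ with strict inequality at some $\theta_0$, then continuity extends the strict inequality to a neighbourhood of $\theta_0$ with positive prior mass, hence
\[
r(\pi,\delta)<r(\pi,\delta^\pi),
\]
contradicting the Bayes optimality of $\delta^\pi$. So $\delta^\pi$ is admissible—in the frequentist sense, since no other sense is available.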

Bayesian computation: fore and aft

Posted in Books, Statistics, University life on February 6, 2015 by xi'an

With my friends Peter Green (Bristol), Krzysztof Łatuszyński (Warwick) and Marcello Pereyra (Bristol), we just arXived the first version of “Bayesian computation: a perspective on the current state, and sampling backwards and forwards”, whose first title was the title of this post. This is a survey of our own perspective on Bayesian computation, from what occurred in the last 25 years [a lot!] to what could occur in the near future [a lot as well!]. It was submitted to Statistics and Computing for the special 25th anniversary issue, as announced in an earlier post. Pulling strength and breadth from each other’s opinions, we have certainly attained more than the sum of our initial respective contributions, but we welcome comments about bits and pieces of importance that we missed, and even more about promising new directions that are not covered in this survey. (A warning that should go with most of my surveys is that my input in this paper does not differ by a large margin from ideas expressed here or in previous surveys.)

not Bayesian enough?!

Posted in Books, Statistics, University life on January 23, 2015 by xi'an

[Elm tree in the park, Parc de Sceaux, Nov. 22, 2011]

Our random forest paper was alas rejected last week. Alas because I think the approach is a significant advance in ABC methodology when implemented for model choice, avoiding the delicate selection of summary statistics and the report of shaky posterior probability approximations. Alas also because the referees somewhat missed the point, apparently perceiving random forests as a way to project a large collection of summary statistics onto a vector of limited dimension, as in the Read Paper of Paul Fearnhead and Dennis Prangle, while the central point in using random forests is the avoidance of a selection or projection of summary statistics. They also dismissed our approach based on the argument that the reduction in error rate brought by random forests over LDA or standard (k-nn) ABC is “marginal”, which indicates a degree of misunderstanding of what the classification error stands for in machine learning: the maximum possible gain in supervised learning with a large number of classes cannot be brought arbitrarily close to zero. Last but not least, the referees did not appreciate why we mostly cannot trust posterior probabilities produced by ABC model choice and hence why the posterior error loss is a valuable and almost inevitable machine learning alternative, dismissing the posterior expected loss as being not Bayesian enough (or at all), for “averaging over hypothetical datasets” (which is a replicate of Jeffreys‘ famous criticism of p-values)! Certainly a first time for me to be rejected based on this argument!
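To make the pipeline concrete, here is a minimal Python sketch of random-forest-based ABC model choice; the two simulators, the prior, and the collection of summaries below are hypothetical stand-ins chosen purely for illustration, not the models or implementation of the paper.

# a minimal sketch of ABC model choice via random forests;
# simulators, prior, and summaries are hypothetical stand-ins
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def summaries(x):
    # a large, unselected collection of summary statistics: the forest,
    # not the user, decides which ones matter
    return np.array([x.mean(), x.std(), np.median(x),
                     np.abs(x - np.median(x)).mean(), x.min(), x.max()])

def simulate(model, n=50):
    # draw a parameter from its prior, then data from the chosen model
    mu = rng.normal(0.0, 2.0)
    x = rng.normal(mu, 1.0, n) if model == 0 else rng.laplace(mu, 1.0, n)
    return summaries(x)

# reference table: equal numbers of simulations from each model
N = 5000
X = np.vstack([simulate(m) for m in (0, 1) for _ in range(N)])
y = np.repeat([0, 1], N)

rf = RandomForestClassifier(n_estimators=500).fit(X, y)

x_obs = summaries(rng.normal(0.3, 1.0, 50))  # pseudo-observed dataset
print("selected model:", rf.predict([x_obs])[0])

The point illustrated is that the forest is handed the whole collection of summaries, with no selection or projection step, and that its classification error can be estimated from the reference table itself (e.g., by out-of-bag error) instead of reporting a shaky posterior probability approximation.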

On the Savage Award, advice to Ph.D. candidates [guest post]

Posted in Kids, Statistics, University life on January 22, 2015 by xi'an

This blog post was contributed by my friend Julien Cornebise, as a reprint of a column he wrote for the latest ISBA Bulletin.

This article is an occasion to pay forward, ever so slightly, the support ISBA gave me, by encouraging current Ph.D. candidates on their path. Four years ago, I was honored and humbled to receive the ISBA 2010 Savage Award, category Theory and Methods, for my Ph.D. dissertation defended in 2009. Looking back, I can now testify how much this brought to me both inside and outside of Academia.

Inside Academia: confirming and mitigating the widely-shared post-graduate’s impostor syndrome

Upon hearing the great news, a brilliant multi-awarded senior researcher in my lab very kindly wrote to me that such awards meant never having to prove one’s worth again. Although genuinely touched by her congratulations, being far less accomplished and more junior than her, I felt all the more responsible for proving myself worthy of this show of confidence from ISBA. It would be rather awkward to receive such an award only to fail miserably shortly after.

This resonated deeply with the shared secret of recent Ph.D.s, discovered during my year at SAMSI, a vibrant institution where half a dozen new postdocs arrive each year: each and every one of us, fresh Ph.D.s from some of the best institutions (Cambridge, Duke, Waterloo, Paris…), secretly suffered the very same impostor syndrome. We were looking at each other’s CVs and websites and thinking “jeez! this guy/girl across the hall is an expert in his/her field, look at all he/she has done, whereas I just barely scrape by on my own research!” – all the while putting up a convincing façade of self-assurance in front of audiences and whiteboards, to the point of apparent cockiness. Only after candid exchanges in SAMSI’s very open environment did we all discover we were in the very same mindset.

In hindsight the explanation is simple: each young researcher in his/her own domain has the very expertise to measure how much he/she still does not know and has yet to learn, while he/she hears other young researchers, experts in their own other field, present results not as familiar to him/her, thus sounding so much more advanced. This take-away from SAMSI was perfectly confirmed by the Savage Award: yes, maybe indeed, I, just like my other colleagues, might actually know something relatively valuable, and my scraping by might just be not so bad – as is also the case of so many of my young colleagues.

Of course, impostor syndrome is a clingy beast and, healthily, I hope to never get entirely over it – merely overcoming it enough to say “Do not worry, thee young candidate, thy doubts pave a path well trodden”.

A similar message is also part of a little-known gem of a guide, “How to do Research at MIT AI Lab”, whose section on Emotional Factors is relevant far beyond its original lab. I recommend it to any Ph.D. student; the feedback from readers is unanimous.

Outside Academia: incredibly increased readability

After two post-docs, and curious to see what was out there in atypical paths, I took a turn out of purely academic research, first as an independent consultant, then recruited out of the blue by a start-up’s recruiter, and eventually doing my small share to help convince investors. I discovered there another facet of ISBA’s Savage Award: tremendous readability.

In Academia, the dominating metric of quality is the length of the publication list – a debate for another day. Outside of Academia, however, not all interlocutors know how remarkable a JRSSB Read Paper, an oral presentation at NIPS, or a publication in Nature really is.

This is where international learned societies, like ISBA, come into play: the awards they bestow can serve as headline-grabbing material in a biography, easily spotted. The interlocutors do not need to be familiar with the subtleties of Bayesian Analysis. All they see is a stamp of approval from an official association of this researcher’s peers. That, in itself, is enough of a quality metric to pass the first round of contact, raise interest, and get the chance to further the conversation.

First concrete example: the recruiter who contacted me for the start-up I joined in 2011 was tasked with finding profiles for an Applied position. The Savage Award on the CV grabbed his attention, even though he had no inkling what Adaptive Sequential Monte Carlo Methods were, nor whether they were immediately relevant to the start-up. When he passed my profile to the start-up’s managers, they immediately changed focus and interviewed me for their Research track instead: a profile that was not what they were looking for originally, yet stood out enough to interest them for a position they had not thought of filling via a recruiter – and indeed a unique position that I would never have thought to find this way either!

Second concrete example, years later, hard at work in this start-up’s amazing team: investors were coming for a round of technical due diligence. Venture capital firms sent their best scientists-in-residence to dive deeply into the technical details of our research. Of course what matters in the end is, and forever will be, the work that is done and presented. Yet the Savage Award was mentioned in the first line of the biography that was sent ahead of time, as a salient point to give a strong first impression of our research team.

Advice to Ph.D. candidates: apply, you are the world’s best expert on your topic

That may sound trivial, but the first piece of advice is: apply. Discuss with your advisor the possibility of putting your dissertation up for consideration. This might sound obvious to North-American students, whose educational system is rife with awards for high-performing students. Not so much in France, where those would be at odds with the sometimes over-present culture of égalité in the younger-age public education system. As a cultural consequence, few French Ph.D. students, even the most brilliant, would consider submitting their dissertation. I have been very lucky in that regard to benefit from the advice of a long-term Bayesian, who offered to send it for me – thanks again, Xi’an! Not all students, regardless of how brilliant their work is, are made aware of this possibility.

The second piece of advice, closely linked to the first: do not underestimate the quality of your work. You are the foremost expert in the entire world on your Ph.D. topic. As discussed above, it is all too easy to see how advanced the maths wielded by your office-mate are, yet overlook the equally advanced maths you are juggling on a day-to-day basis, more familiar to you, and whose limitations you know better than anyone else. Actually, knowing these very limitations is what proves you are an expert.

A word of thanks and final advice

Finally, a word of thanks. I have been incredibly lucky, throughout my career so far, to meet great people. My dissertation already had four pages of acknowledgements: I doubt the Bulletin’s editor would appreciate me renewing (and extending!) them here. They are just as heartfelt today as they were then. I must, of course, add ISBA and the Savage Award committee for their support, as well as all those who, by their generous donations, allow the Savage Fund to stay alive throughout the years.

Of interest to Ph.D. candidates, though, is one special mention of a dual tutelage system that I have seen successfully at work many times. The most senior, a professor with the deep knowledge necessary to steer the project, brings endless founts of knowledge collected over decades, wrapped in hardened tough love. The youngest, a postdoc or fresh assistant professor, brings virtuosity, emulation, and day-to-day patience. In my case they were Pr. Éric Moulines and Dr. Jimmy Olsson. That might be the final advice to a student: if you ever stumble, as many do, as I most surely did, because Ph.D. studies can be a hell of a roller-coaster to go through, reach out to the people around you and the joint set of skills they want to offer you. In combination, they can be amazing, and help you open doors that, in retrospect, can be worth all the effort.

Julien Cornebise, Ph.D.
www.cornebise.com/julien


full Bayesian significance test

Posted in Books, Statistics on December 18, 2014 by xi'an

Among the many comments (thanks!) I received when posting our Testing via mixture estimation paper came the suggestion to relate this approach to the notion of the full Bayesian significance test (FBST) developed by (Julio, not Hal) Stern and Pereira, from São Paulo, Brazil. I thus had a look at this alternative and read the Bayesian Analysis paper they published in 2008, as well as a paper recently published in the Logic Journal of IGPL. (I could not find what IGPL stands for.) The central notion in these papers is the e-value, which provides the posterior probability that the posterior density is larger than the largest posterior density over the null set. This definition bothers me, first because the null set has measure zero under an absolutely continuous prior (BA, p.82). Hence the posterior density is defined in an arbitrary manner over the null set and the maximum is itself arbitrary. (An issue that invalidates my 1993 version of the Lindley-Jeffreys paradox!) And second because it considers the posterior probability of an event that does not exist a priori, being conditional on the data. This sounds in fact quite similar to Murray Aitkin’s (2009) book Statistical Inference, which uses a posterior distribution of the likelihood function, with the same drawback of using the data twice, plus the other issues discussed in our commentary on the book. (As a side-much-on-the-side remark, the authors incidentally forgot me when citing our 1992 Annals of Statistics paper about decision theory for accuracy estimators!)
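In symbols, and merely transcribing the definition recalled above (with $\pi(\cdot\mid x)$ the posterior density and $\Theta_0$ the null set), the quantity at stake is
\[
\operatorname{ev}(\Theta_0\mid x)\;=\;\Pr\!\Big(\pi(\theta\mid x)\,>\,\sup_{\theta_0\in\Theta_0}\pi(\theta_0\mid x)\;\Big|\;x\Big),
\]
which makes both objections visible at once: the supremum is taken over a set of prior (and posterior) measure zero, on which the density can be modified arbitrarily, and the event whose probability is computed only exists once $x$ has been observed.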

about the strong likelihood principle

Posted in Books, Statistics, University life on November 13, 2014 by xi'an

Deborah Mayo arXived a Statistical Science paper a few days ago, along with discussions by Jan Bjørnstad, Phil Dawid, Don Fraser, Michael Evans, Jan Hannig, R. Martin and C. Liu. I am very glad that this discussion paper came out, and that it came out in Statistical Science, although I am rather surprised to find no discussion by Jim Berger or Robert Wolpert, and even though I still cannot entirely follow the deductive argument in the rejection of Birnbaum’s proof, just as in the earlier version in Error & Inference. But I somehow do not feel like going again into a new debate about this critique of Birnbaum’s derivation. (Even though statements like the claim that the SLP “would preclude the use of sampling distributions” (p.227) call for contradiction.)

“It is the imprecision in Birnbaum’s formulation that leads to a faulty impression of exactly what  is proved.” M. Evans

Indeed, at this stage, I fear that [for me] a more relevant issue is whether or not the debate matters at all… At a logical cum foundational [and maybe cum historical] level, it makes perfect sense to uncover which, if any, of the myriad versions of Birnbaum’s likelihood Principle holds. [Although trying to uncover Birnbaum’s motives and positions over time may not be so relevant.] I think the paper and the discussions acknowledge that some version of the weak conditionality Principle does not imply some version of the strong likelihood Principle, while other logical implications remain true. At a methodological level, I am much less sure it matters. Each time I taught this notion, I got blank stares and incomprehension from my students, to the point that I have now stopped teaching the likelihood Principle in class altogether. And most of my co-authors do not seem to care very much about it. At a purely mathematical level, I wonder if there even is ground for a debate, since the notions involved can be defined in various imprecise ways, as pointed out by Michael Evans above and in his discussion. At a statistical level, sufficiency eventually is a strange notion in that it seems to make plenty of sense until one realises there is no interesting sufficiency outside exponential families. Just as there are very few parameter transforms for which unbiased estimators can be found. So I also spend very little time teaching and even less time worrying about sufficiency. (As it happens, I taught the notion this morning!) At another and presumably more significant statistical level, what matters is information, e.g., conditioning means adding information (i.e., about which experiment has been used). While complex settings may prohibit the use of the entire information provided by the data, at a formal level there is no argument for not using the entire information, i.e., conditioning upon the entire data. (At a computational level, this is no longer true, witness ABC and similar limited-information techniques. By the way, ABC demonstrates, if needed, why sampling distributions matter so much to Bayesian analysis.)
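For reference, one usual formulation of the SLP, among the many versions whose imprecision is precisely at stake here: if two experiments $E_1$ and $E_2$ share the same parameter $\theta$ and produce outcomes $x_1^*$ and $x_2^*$ whose likelihood functions are proportional,
\[
L_{E_1}(\theta\mid x_1^*)\;=\;c\,L_{E_2}(\theta\mid x_2^*)\qquad\text{for all }\theta\text{ and some }c>0,
\]
then the evidential import of $(E_1,x_1^*)$ and $(E_2,x_2^*)$ about $\theta$ should be the same.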

“Non-subjective Bayesians who (…) have to live with some violations of the likelihood principle (…) since their prior probability distributions are influenced by the sampling distribution.” D. Mayo (p.229)

In the end, the fact that the prior may depend on the form of the sampling distribution, and hence does violate the likelihood Principle, does not worry me so much. In most models I consider, the parameters are endogenous to those sampling distributions and do not live an ethereal existence independently from the model: they are substantiated and calibrated by the model itself, which makes the discussion about the LP rather vacuous. See, e.g., the coefficients of a linear model. In complex models, or with large datasets, it is even impossible to handle the whole data or the whole model, and proxies have to be used instead, making worries about the structure of the (original) likelihood vacuous. I think we have now reached a stage of statistical inference where models are no longer accepted as ideal truth and where approximation is the hard reality, imposed by the massive amounts of data relentlessly calling for immediate processing. Hence a stage where the self-validation or invalidation of such approximations in terms of predictive performance is the relevant issue. Provided we can at all face the challenge…
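The classic textbook illustration of the dependence Mayo points to, recalled here as my own aside: the Jeffreys prior $\pi(\theta)\propto\sqrt{I(\theta)}$ is built from the Fisher information of the sampling model. For binomial sampling with $n$ trials, $I(p)=n/\{p(1-p)\}$ gives
\[
\pi_J(p)\;\propto\;p^{-1/2}(1-p)^{-1/2},
\]
while for negative binomial sampling, stopping at the $r$-th success, $I(p)=r/\{p^2(1-p)\}$ gives
\[
\pi_J(p)\;\propto\;p^{-1}(1-p)^{-1/2},
\]
so two experiments yielding proportional likelihoods end up with different posteriors, in violation of the SLP.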
