## Bayes’ Theorem in the 21st Century, really?!

“In place of past experience, frequentism considers future behavior: an optimal estimator is one that performs best in hypothetical repetitions of the current experiment. The resulting gain in scientific objectivity has carried the day…”

**J**ulien Cornebise sent me this Science column by Brad Efron about Bayes’ theorem. I am a tad surprised that it got published in the journal, given that it does not really contain any new item of information. However, being unfamiliar with Science, I cannot rule out that it also publishes major scientists’ opinions or warnings, a label that can fit this column. (It is quite a proper coincidence that the post appears during Bayes 250.)

**E**fron’s piece centres upon the use of objective Bayes approaches in Bayesian statistics, for which Laplace was “the prime violator”. He argues through examples that noninformative “Bayesian calculations cannot be uncritically accepted, and should be checked by other methods, which usually means ‘frequentistically’”. First, having to write “frequentistically” once is already more than I can stand! Second, using the Bayesian framework to build frequentist procedures is like buying top technical outdoor gear to climb the stairs at the Sacré-Coeur on Butte Montmartre! The naïve reader is then left clueless as to why one should use a Bayesian approach in the first place. And perfectly confused about the meaning of objectivity. Especially given the above quote! I find it rather surprising that this old saw of a claim of frequentism to objectivity resurfaces there. There is an infinite range of frequentist procedures and, while some are more optimal than others, none is “the” optimal one (except for the most baked-out examples like, say, the estimation of the mean of a normal observation).

“A Bayesian FDA (there isn’t one) would be more forgiving. The Bayesian posterior probability of drug A’s superiority depends only on its final evaluation, not whether there might have been earlier decisions.”

**T**he second criticism of Bayesianism therein is the counter-intuitive irrelevance of stopping rules. Once again, the presentation is fairly biased, because a Bayesian approach compares competing scenarios rather than evaluating the probability of a tail event under the null and only the null. And also because, as shown by Jim Berger and co-authors, the Bayesian approach is generally much more favorable to the null than the p-value.
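To make the stopping-rule point concrete, here is a minimal sketch based on the textbook coin-tossing illustration (not an example from Efron’s column): 9 heads and 3 tails are observed, either with 12 tosses fixed in advance or by tossing until the third tail. The fair-coin null, the uniform Beta(1, 1) prior and all function names are assumptions made for this illustration.

```python
from math import comb


def binomial_p_value(heads, n):
    """One-sided p-value P(X >= heads) for a fair coin when n tosses are fixed in advance."""
    return sum(comb(n, k) for k in range(heads, n + 1)) / 2 ** n


def negative_binomial_p_value(heads, tails):
    """One-sided p-value P(X >= heads) for a fair coin when tossing until the tails-th tail.

    At least `heads` heads precede the tails-th tail exactly when the first
    heads + tails - 1 tosses contain at most tails - 1 tails.
    """
    m = heads + tails - 1
    return sum(comb(m, k) for k in range(tails)) / 2 ** m


def posterior_beta_params(heads, tails, a=1.0, b=1.0):
    """Beta(a, b) prior on the head probability gives a Beta(a + heads, b + tails) posterior.

    Both designs share the likelihood p^heads (1 - p)^tails up to a constant,
    so the posterior does not depend on the stopping rule.
    """
    return a + heads, b + tails


heads, tails = 9, 3
print(binomial_p_value(heads, heads + tails))   # 299/4096, about 0.073
print(negative_binomial_p_value(heads, tails))  # 67/2048, about 0.033
print(posterior_beta_params(heads, tails))      # (10.0, 4.0) under either design
```

The same data fall on different sides of the 5% threshold depending on the sampling design, while the posterior is unchanged. That is precisely the "forgiving" behaviour the quote objects to.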

“Bayes’ Theorem is an algorithm for combining prior experience with current evidence. Followers of Nate Silver’s FiveThirtyEight column got to see it in spectacular form during the presidential campaign: the algorithm updated prior poll results with new data on a daily basis, nailing the actual vote in all 50 states.”

**I**t is only fair that Nate Silver’s book and column are mentioned in Efron’s column, because they are a highly valuable and definitely convincing illustration of Bayesian principles. What I object to is the criticism “that most cutting-edge science doesn’t enjoy FiveThirtyEight-level background information”. In my understanding, the poll model of FiveThirtyEight built up in a sequential manner a weight system over the different polling companies, hence learning from the data, if in a Bayesian manner, about their reliability (rather than forgetting the past). This is what caused Larry Wasserman to consider that Silver’s approach was actually more frequentist than Bayesian…

“Empirical Bayes is an exciting new statistical idea, well-suited to modern scientific technology, saying that experiments involving large numbers of parallel situations carry within them their own prior distribution.”

**M**y last point of contention is about the (unsurprising) defence of the empirical Bayes approach in the Science column. Once again, the presentation is biased towards frequentism: in the FDR gene example, the empirical Bayes procedure is motivated by being the frequentist solution. The logical contradiction in “estimat[ing] the relevant prior from the data itself” is not discussed and the conclusion that Brad Efron uses “empirical Bayes methods in the parallel case [in the absence of prior information]”, seemingly without being cautious and “uncritically”, does not strike me as the proper last argument in the matter! Nor does it give a 21st Century vision of what nouveau Bayesianism should be, faced with the challenges of Big Data and the like…
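For readers who have not met the idea, the following sketch shows what “estimating the prior from the data” can look like in the simplest parallel setting. It is not Efron’s FDR analysis: it assumes a normal-means model, z_i ~ N(mu_i, 1) with mu_i ~ N(0, A), simulated data and a moment estimate of A, all of which are choices made here for illustration only.

```python
import random


def empirical_bayes_shrinkage(z):
    """Parametric empirical Bayes for z_i ~ N(mu_i, 1), mu_i ~ N(0, A) (assumed model).

    Marginally z_i ~ N(0, 1 + A), so A is estimated by the moment estimate
    mean(z_i^2) - 1, truncated at zero, and then plugged into the posterior
    mean A / (1 + A) * z_i as if it had been known beforehand.
    """
    marginal_var = sum(x * x for x in z) / len(z)
    a_hat = max(0.0, marginal_var - 1.0)
    shrink = a_hat / (1.0 + a_hat)
    return a_hat, [shrink * x for x in z]


rng = random.Random(0)   # fixed seed so the run is reproducible
true_a = 2.0             # hypothetical prior variance used to simulate the data
n = 6033                 # same count as the column's gene example; the data are simulated
mu = [rng.gauss(0.0, true_a ** 0.5) for _ in range(n)]
z = [m + rng.gauss(0.0, 1.0) for m in mu]

a_hat, estimates = empirical_bayes_shrinkage(z)
mse_raw = sum((x - m) ** 2 for x, m in zip(z, mu)) / n
mse_eb = sum((e - m) ** 2 for e, m in zip(estimates, mu)) / n
print(a_hat, mse_raw, mse_eb)
```

The shrinkage does reduce the average squared error in this simulation, but the data enter twice (once to estimate A, once to update), and the zero-mean normal form of the prior is still picked by the statistician rather than learned.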

December 8, 2013 at 1:29 pm

I don’t think frequentism is objective in the sense of providing an optimal-for-everyone procedure, but it does consider objective properties of any given procedure. To the extent that a Bayesian can interpret sampling-type probability at all, looking at, say, the coverage properties of a credible interval (given a particular prior) seems perfectly valid, objective, and unobjectionable to me.
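As a small sketch of what checking the coverage of a credible interval given a particular prior can mean, take the conjugate normal case, where the coverage is available in closed form. The model x ~ N(theta, 1), the prior theta ~ N(0, tau2) and the function names are assumptions chosen for this illustration.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def credible_interval_coverage(theta, tau2, z=1.96):
    """Frequentist coverage at a fixed theta of the central posterior interval.

    Assumed model: x ~ N(theta, 1) with prior theta ~ N(0, tau2).
    The posterior is N(w * x, w) with w = tau2 / (1 + tau2), so the interval
    w * x +/- z * sqrt(w) contains theta exactly when x lies in [lo, hi] below.
    """
    w = tau2 / (1.0 + tau2)
    half_width = z * sqrt(w)
    lo = (theta - half_width) / w
    hi = (theta + half_width) / w
    # Under repeated sampling, x - theta is standard normal.
    return normal_cdf(hi - theta) - normal_cdf(lo - theta)


for theta in (0.0, 1.0, 2.0, 4.0):
    print(theta, round(credible_interval_coverage(theta, tau2=1.0), 3))
```

With tau2 = 1 the nominal 95% interval over-covers near the prior mean and under-covers badly far from it, while a very large tau2 recovers the usual 95% coverage. Either way, the coverage is an objective property of the procedure that anyone can compute.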

December 8, 2013 at 1:48 pm

Thanks! I have difficulties in putting a mathematical definition upon “objective” (the ultimate paradox for the president-elect of the Objective Bayes session of ISBA, isn’t it?!)

December 8, 2013 at 2:09 pm

Fair point! I guess I’m also saying that frequentist properties of Bayesian inferences could (justifiably) be interesting to the Bayesian.

June 25, 2013 at 9:09 pm

I normally like Brad’s writing but I found this a bit odd:

“What prior evidence are we using? None, as it turns out! With 6033 parallel situations at hand, we can effectively estimate the relevant prior from the data itself”

He then goes on to pooh-pooh uninformative priors when the empirical Bayes approach is implicitly using an uninformative hyperprior. I do think that multilevel modeling / empirical Bayes is often the right approach in these high-dimensional estimation scenarios, but at the end of the day, it’s still just a Bayesian model. One is still making a decision about the characteristics of the hyperprior, not to mention the functional form of the prior distribution and other aspects of the model.

I’m also perplexed that he doesn’t criticize this idea of “optimal estimators”, since he did so much work with Stein estimation. One of my takeaways from Stein’s paradox is that one form of optimality (e.g. unbiasedness) comes at the cost of other forms of optimality. In applied scenarios the price one pays for “optimal” estimators (e.g., overestimation of statistically significant effects) often makes them highly suspect to use in practice.

June 20, 2013 at 1:21 pm

The quip about Sacré-Coeur made me laugh….

I cannot understand how this is interesting enough for a science column. Anyone who’s done statistics for more than a short time can probably do a “one man play” version of the argument at the drop of a hat. It’s always the same!

When it comes down to it, both work when the data has information, neither work (or both work depending on your point of view) when it doesn’t. It always comes down to make assumptions, ride assumptions, check assumptions, repeat.

Incidentally, when I was at UC Santa Cruz, David Draper showed me an old book on inference that recommended using the unbiased, but often negative, estimator of a variance (I’m failing to remember the context, but it was fairly classical) because unbiasedness was judged to be more important than positivity.
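The context is not recalled above, so the following is only a guess at the kind of estimator meant: one classical setting where an unbiased variance estimator can turn negative is the between-group variance component in a balanced one-way random-effects ANOVA. The model, the toy data and the function name are assumptions for illustration, not necessarily the book’s example.

```python
def between_group_variance_estimate(groups):
    """ANOVA (method-of-moments) estimate of the between-group variance.

    Assumed model: y_ij = mu + a_i + e_ij with Var(a_i) = sigma_a^2 and
    Var(e_ij) = sigma^2, for k groups of n observations each. Since
    E[MSB] = sigma^2 + n * sigma_a^2 and E[MSW] = sigma^2, the estimate
    (MSB - MSW) / n is unbiased for sigma_a^2 but is not constrained to be >= 0.
    """
    k = len(groups)
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    grand_mean = sum(means) / k
    msb = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    return (msb - msw) / n


# Toy data: identical group means but noisy observations within each group.
groups = [[1.0, 5.0], [2.0, 4.0], [3.0, 3.0]]
print(between_group_variance_estimate(groups))  # -5/3, a negative "variance"
```

Truncating the estimate at zero restores positivity at the price of unbiasedness, which is exactly the trade-off described in the anecdote.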