Archive for frequentist inference

Bayes’ Theorem in the 21st Century, really?!

Posted in Books, Statistics on June 20, 2013 by xi'an

“In place of past experience, frequentism considers future behavior: an optimal estimator is one that performs best in hypothetical repetitions of the current experiment. The resulting gain in scientific objectivity has carried the day…”

Julien Cornebise sent me this Science column by Brad Efron about Bayes’ theorem. I am a tad surprised that it got published in the journal, given that it does not really contain any new item of information. However, being unfamiliar with Science, I cannot rule out that it also publishes major scientists’ opinions or warnings, a label that can fit this column. (It is quite a proper coincidence that the post appears during Bayes 250.)

Efron’s piece centres upon the use of objective Bayes approaches in Bayesian statistics, for which Laplace was “the prime violator”. He argues through examples that noninformative “Bayesian calculations cannot be uncritically accepted, and should be checked by other methods, which usually means ‘frequentistically’”. First, having to write “frequentistically” once is already more than I can stand! Second, using the Bayesian framework to build frequentist procedures is like buying top technical outdoor gear to climb the stairs at the Sacré-Coeur on Butte Montmartre! The naïve reader is then left clueless as to why one should use a Bayesian approach in the first place. And perfectly confused about the meaning of objectivity. Especially given the above quote! I find it rather surprising that this old saw of a claim of frequentism to objectivity resurfaces there. There is an infinite range of frequentist procedures and, while some perform better than others, none is “the” optimal one (except for the most baked-out examples like, say, the estimation of the mean of a normal observation).

“A Bayesian FDA (there isn’t one) would be more forgiving. The Bayesian posterior probability of drug A’s superiority depends only on its final evaluation, not whether there might have been earlier decisions.”

The second criticism of Bayesianism therein is the counter-intuitive irrelevance of stopping rules. Once again, the presentation is fairly biased, because a Bayesian approach opposes scenarios rather than evaluating the likelihood of a tail event under the null and only the null, and also because, as shown by Jim Berger and co-authors, the Bayesian approach is generally much more favorable to the null than the p-value.
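To make the stopping-rule point concrete, here is a minimal sketch on a textbook-style example that is not taken from Efron's column: 9 successes and 3 failures, recorded either with the number of trials fixed at 12 or by stopping at the third failure. The counts, the null value 0.5, the flat prior on a grid, and all function names are assumptions made for illustration. The two designs lead to different one-sided p-values, while their likelihoods are proportional and the resulting posteriors therefore coincide.

```python
from math import comb

def binomial_tail(successes, n, p=0.5):
    """P(at least `successes` successes in n trials) when n is fixed in advance."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(successes, n + 1))

def negbin_tail(successes, failures, p=0.5):
    """P(at least `successes` successes before the `failures`-th failure).

    Equivalent to seeing at most failures-1 failures among the first
    successes+failures-1 trials.
    """
    m = successes + failures - 1
    return sum(comb(m, j) * (1 - p)**j * p**(m - j) for j in range(failures))

def grid_posterior(likelihood, grid):
    """Posterior weights on a grid under a flat prior (assumed for illustration)."""
    weights = [likelihood(t) for t in grid]
    total = sum(weights)
    return [w / total for w in weights]

s, f = 9, 3  # hypothetical data: 9 successes, 3 failures
grid = [(i + 0.5) / 1000 for i in range(1000)]

# The two likelihoods differ only by a constant that does not involve theta.
lik_binom = lambda t: comb(s + f, s) * t**s * (1 - t)**f
lik_negbin = lambda t: comb(s + f - 1, s) * t**s * (1 - t)**f

p_binom = binomial_tail(s, s + f)   # design 1: n = 12 fixed
p_negbin = negbin_tail(s, f)        # design 2: stop at the 3rd failure
post_binom = grid_posterior(lik_binom, grid)
post_negbin = grid_posterior(lik_negbin, grid)
max_gap = max(abs(a - b) for a, b in zip(post_binom, post_negbin))

print(p_binom, p_negbin, max_gap)
```

With these assumed counts, the p-value is 299/4096 under the fixed-size design and 67/2048 under the stop-at-third-failure design, hence on either side of the 5% threshold, while the posterior is the same whichever design produced the data.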

“Bayes’ Theorem is an algorithm for combining prior experience with current evidence. Followers of Nate Silver’s FiveThirtyEight column got to see it in spectacular form during the presidential campaign: the algorithm updated prior poll results with new data on a daily basis, nailing the actual vote in all 50 states.”

It is only fair that Nate Silver’s book and column are mentioned in Efron’s column, because they are a highly valuable and definitely convincing illustration of Bayesian principles. What I object to is the criticism “that most cutting-edge science doesn’t enjoy FiveThirtyEight-level background information”. In my understanding, the poll model of FiveThirtyEight built up in a sequential manner a weight system over the different polling companies, hence learning from the data, albeit in a Bayesian manner, about their reliability (rather than forgetting the past). This is what caused Larry Wasserman to consider that Silver’s approach was actually more frequentist than Bayesian…
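As a rough illustration of what "updating prior poll results with new data" can mean, the sketch below applies the standard normal-normal conjugate update one poll at a time. It is not FiveThirtyEight's actual model: the prior, the poll figures, the poll variances (standing in for pollster reliability), and the function name `update` are all hypothetical.

```python
def update(prior_mean, prior_var, poll_mean, poll_var):
    """One conjugate normal update: precisions add, means are precision-weighted."""
    precision = 1.0 / prior_var + 1.0 / poll_var
    post_mean = (prior_mean / prior_var + poll_mean / poll_var) / precision
    return post_mean, 1.0 / precision

# Hypothetical polls: (reported vote share, variance reflecting assumed reliability).
polls = [(0.52, 0.03**2), (0.49, 0.02**2), (0.51, 0.04**2)]

mean, var = 0.50, 0.05**2  # hypothetical prior on the vote share
for poll_mean, poll_var in polls:
    mean, var = update(mean, var, poll_mean, poll_var)

print(mean, var)
```

Each posterior becomes the prior for the next poll, so a more reliable poll (smaller variance) moves the estimate more, and the final answer does not depend on the order in which the polls arrive.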

“Empirical Bayes is an exciting new statistical idea, well-suited to modern scientific technology, saying that experiments involving large numbers of parallel situations carry within them their own prior distribution.”

My last point of contention is about the (unsurprising) defence of the empirical Bayes approach in the Science column. Once again, the presentation is biased towards frequentism: in the FDR gene example, the empirical Bayes procedure is motivated by being the frequentist solution. The logical contradiction in “estimat[ing] the relevant prior from the data itself” is not discussed, and the conclusion that Brad Efron uses “empirical Bayes methods in the parallel case [in the absence of prior information]”, seemingly without being cautious and “uncritically”, does not strike me as the proper last argument in the matter! Nor does it give a 21st Century vision of what nouveau Bayesianism should be, faced with the challenges of Big Data and the like…
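For readers unfamiliar with "estimating the prior from the data itself", here is a minimal sketch in the simplest parallel setting, which is an assumption of mine and not the FDR gene example of the column: many cases z_i ~ N(μ_i, 1) with μ_i ~ N(0, A), where the prior variance A is unknown and estimated from the marginal distribution of the z_i. The simulated data, the value of A, and all variable names are hypothetical.

```python
import random

rng = random.Random(0)
n_cases = 2000
true_prior_var = 1.0  # assumed for the simulation; unknown to the analyst

# Simulate the parallel situations: one mean and one noisy observation per case.
mu = [rng.gauss(0.0, true_prior_var ** 0.5) for _ in range(n_cases)]
z = [rng.gauss(m, 1.0) for m in mu]

# Marginally z_i ~ N(0, 1 + A), so A is estimated by mean(z^2) - 1 (floored at 0).
a_hat = max(sum(v * v for v in z) / n_cases - 1.0, 0.0)

# Plug the estimated prior into the posterior mean: shrink each z_i towards 0.
shrink = a_hat / (1.0 + a_hat)
eb = [shrink * v for v in z]

sse_raw = sum((v - m) ** 2 for v, m in zip(z, mu))
sse_eb = sum((e - m) ** 2 for e, m in zip(eb, mu))
print(a_hat, sse_raw, sse_eb)
```

The same data are used twice here, once to build the "prior" and once to update it, which is exactly the logical contradiction mentioned above, and the procedure is justified by its (frequentist) squared-error performance over the cases.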

the anti-Bayesian moment and its passing commented

Posted in Books, Statistics, University life on March 12, 2013 by xi'an

Here is a comment from Deborah Mayo on our rejoinder “the anti-Bayesian moment and its passing”, written with Andrew Gelman, a comment that could not be posted as a regular comment:

You assume that I am interested in long-term average properties of procedures, even though I have so often argued that they are at most necessary (as consequences of good procedures), but scarcely sufficient for a severity assessment. The error statistical account I have developed is a statistical philosophy. It is not one to be found in Neyman and Pearson, jointly or separately, except in occasional glimpses here and there (unfortunately). It is certainly not about well-defined accept-reject rules. If N-P had only been clearer, and Fisher better behaved, we would not have had decades of wrangling. However, I have argued, the error statistical philosophy explicates, and directs the interpretation of, frequentist sampling theory methods in scientific, as opposed to behavioural, contexts. It is not a complete philosophy…but I think Gelmanian Bayesians could find in it a source of “standard setting”.

You say “the prior is both a probabilistic object, standard from this perspective, and a subjective construct, translating qualitative personal assessments into a probability distribution. The extension of this dual nature to the so-called “conventional” priors (a very good semantic finding!) is to set a reference … against which to test the impact of one’s prior choices and the variability of the resulting inference. …they simply set a standard against which to gauge our answers.”

I think there are standards for even an approximate meaning of “standard-setting” in science, and I still do not see how an object whose meaning and rationale may fluctuate wildly, even in a given example, can serve as a standard or reference. For what?

Perhaps the idea is that one can gauge how different priors change the posteriors, because, after all, the likelihood is well-defined. That is why the prior and not the likelihood is the camel. But it isn’t obvious why I should want the camel. (camel/gnat references in the paper and response).

rise of the B word

Posted in Statistics on February 26, 2013 by xi'an

[Figure: comparison of the uses of the words Bayesian, maximum likelihood, and frequentist, using Google Ngram]

While preparing a book chapter, I checked on Google Ngram viewer the comparative uses of the words Bayesian (blue), maximum likelihood (red) and frequentist (yellow), producing the above (screen-copy quality, I am afraid!). It shows an increase of the use of the B word from the early 1980s and not the sudden rise in the 1990s I was expecting. The inclusion of “frequentist” is definitely in the joking mode, as this is not a qualification used by frequentists to describe their methods. In other words (!), “frequentist” does not occur very often in frequentist papers (and not as often as in Bayesian papers!)…

Robins and Wasserman

Posted in Statistics, Travel, University life on January 17, 2013 by xi'an

[Figure: entrance to Banaras Hindu University, with the huge ISBA poster, Varanasi, Jan. 10, 2013]

As I attended Jamie Robins’ session in Varanasi and did not have a clear enough idea of the Robins and Wasserman paradox to discuss it viva voce, here are my thoughts after reading Larry’s summary. My first reaction was to question whether or not this was a Bayesian statistical problem (meaning why should I be concerned with the problem). Just as the normalising constant problem was not a statistical problem. We are estimating an integral given some censored realisations of a binomial depending on a covariate through an unknown function θ(x). There is not much of a parameter. However, the way Jamie presented it through clinical trials made the problem sound definitely statistical. So end of the silly objection.

My second step is to consider the very point of estimating the entire function (or infinite dimensional parameter) θ(x) when only the integral ψ is of interest. This is presumably the reason why the Bayesian approach fails, as it sounds difficult to consistently estimate θ(x) under censored binomial observations, while ψ can be. Of course, if we want to estimate the probability of a success like ψ, going through functional estimation sounds like overshooting. But the Bayesian modelling of the problem appears to require considering all unknowns at once, including the function θ(x), and cannot forget about it. We encountered a somewhat similar problem with Jean-Michel Marin when working on the k-nearest neighbour classification problem. Considering all the points in the testing sample altogether as unknowns would dwarf the training sample and its information content to produce very poor inference. And so we ended up dealing with one point at a time after harsh and intense discussions!

Now, back to the Robins and Wasserman paradox, I see no problem in acknowledging that a classical Bayesian approach cannot produce a convergent estimate of the integral ψ, simply because the classical Bayesian approach is a holistic system that cannot remove information to process a subset of the original problem. Call it the curse of marginalisation. Now, on a practical basis, would there be ways of running simulations of the missing Y’s when π(x) is known in order to achieve estimates of ψ? Presumably, but they would end up with a frequentist validation…
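For reference, the frequentist benchmark usually put forward in this problem is the Horvitz-Thompson estimator, which weights each observed Y by the inverse of the known π(x) and never estimates θ(x). The sketch below is a toy version under loud assumptions: x is one-dimensional and uniform (the paradox itself concerns a high-dimensional covariate), and the functions `theta` and `pi`, the sample size, and the seed are hypothetical choices made so that the true ψ equals 0.5.

```python
import random

def theta(x):
    """Hypothetical success probability; its integral over [0, 1] is psi = 0.5."""
    return 0.2 + 0.6 * x

def pi(x):
    """Hypothetical known probability that Y is observed (bounded away from 0)."""
    return 0.1 + 0.8 * x

rng = random.Random(1)
n = 20000
ht_sum = 0.0     # running sum of R * Y / pi(X)
obs_sum = 0.0    # running sum of the observed Y's
obs_count = 0    # number of cases where Y is observed

for _ in range(n):
    x = rng.random()
    y = 1 if rng.random() < theta(x) else 0
    r = 1 if rng.random() < pi(x) else 0   # r = 0 means Y is censored
    if r:
        ht_sum += y / pi(x)
        obs_sum += y
        obs_count += 1

true_psi = 0.5
ht_estimate = ht_sum / n                 # Horvitz-Thompson: inverse-probability weighting
naive_estimate = obs_sum / obs_count     # complete-case mean, ignores the censoring mechanism
print(ht_estimate, naive_estimate)
```

In this toy setting the complete-case mean is biased because the censoring probability increases with θ(x), while the inverse-probability weighting corrects for it using π(x) alone, which is the part of the information a likelihood-based Bayesian analysis does not use.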

Bayes on the radio

Posted in Books, pictures, Statistics, University life on November 10, 2012 by xi'an

In connection with the special issue of Science & Vie on Bayes’ formula, the French national radio (France Culture) organised a round table with Pierre Bessière, senior researcher in physiology at Collège de France, Dirk Zerwas, senior researcher in particle physics in Orsay, and Hervé Poirier, editor of Science & Vie. And myself (as I was quoted in the original paper). While I am not particularly fluent in oral debates, I was interested in participating in this radio experiment, if only to bring some moderation to the hyperbolic tone found in the special issue. (As the theme was “Is there a universal mathematical formula?”, I was for a while confused about the debate, thinking that maybe the previous blog posts on Stewart’s 17 Equations and Mackenzie’s Universe in Zero Words had prompted this invitation…)

As it happened [podcast link], the debate was quite moderate and reasonable: we discussed the genesis, the dark ages, and the risorgimento of Bayesian statistics within statistics, the lack of Bayesian perspectives in the Higgs boson analysis (bemoaned by Tony O’Hagan and Dennis Lindley), and the Bayesian nature of learning in psychology. Although I managed to mention Poincaré’s Bayesian defence of Dreyfus (thanks to the Theory that would not die!), Nate Silver’s Bayesian combination of survey results, and the role of the MRC in the MCMC revolution, I found that the information content of a one-hour show was in the end quite limited, as I would have liked to mention as well the role of Bayesian techniques in population genetic advances, like the Asian beetle invasion mentioned two weeks ago… Overall, an interesting experience, maybe not with a huge impact on the population of listeners, and a confirmation I’d better stick to the written world!
