Archive for Brad Efron

revisiting marginalisation paradoxes [Bayesian reads #1]

Posted in Books, Kids, pictures, Statistics, Travel, University life on February 8, 2019 by xi'an

As a reading suggestion for my (last) OxWaSP Bayesian course at Oxford, I included the classic 1973 Marginalisation paradoxes by Phil Dawid, Mervyn Stone [whom I met when visiting UCL in 1992, since he was sharing an office with my friend Costas Goutis], and Jim Zidek. A paper that also appears in my (recent) slides as an exercise. And that has been discussed many times on this ‘Og.

Reading the paper in the train to Oxford was quite pleasant, with a few discoveries, like an interesting dig at Fraser's structural (crypto-fiducial?!) distributions that “do not need Bayesian improper priors to fall into the same paradoxes”. And a most fascinating if surprising inclusion of the Box-Muller random generator in an argument, something of a precursor to perfect sampling (?). And a clear declaration that (right-Haar) invariant priors are at the source of the resolution of the paradox. With a much less clear notion of “un-Bayesian priors” as those leading to a paradox. Especially when the authors exhibit a red herring where the paradox cannot disappear, no matter what the prior is. Rich discussion (with none of the current 400-word length constraint), including the suggestion of neutral points, namely those that do identify a posterior, whatever that means. Funny conclusion, as well:

“In Stone and Dawid’s Biometrika paper, B1 promised never to use improper priors again. That resolution was short-lived and let us hope that these two blinkered Bayesians will find a way out of their present confusion and make another comeback.” D.J. Bartholomew (LSE)

and another

“An eminent Oxford statistician with decidedly mathematical inclinations once remarked to me that he was in favour of Bayesian theory because it made statisticians learn about Haar measure.” A.D. McLaren (Glasgow)

and yet another

“The fundamentals of statistical inference lie beneath a sea of mathematics and scientific opinion that is polluted with red herrings, not all spawned by Bayesians of course.” G.N. Wilkinson (Rothamsted Station)

Lindley’s discussion is more serious if not unkind. Dennis Lindley essentially follows the lead of the authors to conclude that “improper priors must go”. To the point of retracting what was written in his book! Although he also draws the consequences for standard statistics, since they allow for admissible procedures that are associated with improper priors: if the latter must go, the former must go as well!!! (A bit of sophistry involved in this argument…) Efron’s point is more constructive in this regard, since he recalls the dangers of using proper priors with huge variance. And the little hope one can hold of having a prior that is uninformative in every dimension. (A point much more blatantly expressed by Dickey mocking “magic unique prior distributions”.) And Dempster points out even more clearly that the fundamental difficulty with these paradoxes is that the prior marginal does not exist. Don Fraser may be the most brutal discussant of all, stating that the paradoxes are not new and that “the conclusions are erroneous or unfounded”. Also complaining about Lindley’s review of his book [suggesting prior integration could save the day] in Biometrika, where he was not allowed a rejoinder. It reflects the then intense opposition between Bayesians and fiducialist Fisherians. (Funny enough, given the place of these marginalisation paradoxes in his book, I was mistakenly convinced that Jaynes was one of the discussants of this historical paper. He is mentioned in the reply by the authors.)

bootstrap in Nature

Posted in Statistics on December 29, 2018 by xi'an

A news item in the latest issue of Nature I received reports on Brad Efron winning the “Nobel Prize of Statistics” this year. The bootstrap is certainly an invention worth the recognition, not to mention Efron’s contribution to empirical Bayes analysis, even though I remain overall reserved about the very notion of a Nobel prize in any field… With an appropriate XXL quote calling the bootstrap method the ‘best statistical pain reliever ever produced’!

double yolk priors [a reply from the authors]

Posted in Books, Statistics, University life on March 14, 2018 by xi'an

[Here is an email I received from Subhadeep Mukhopadhyay, one of the authors of the paper I discussed yesterday.]
Thanks for discussing our work. Let me clarify the technical points that you raised:
– The difference between Leg_j(u) and T_j = Leg_j(G(θ)): one is an orthonormal polynomial basis of L²[0,1] and the other one of L²(G). The second one is a polynomial in the rank transform G(θ).
– As you correctly pointed out, there is a danger in directly approximating the ratio. We work on it after taking the quantile transform: evaluate the ratio at the quantile transform G⁻¹(u), which gives d(u;G,F) over the unit interval. Now, this new transformed function is a proper density.
– Thus the ratio becomes d(G(θ)), which can be expanded (NOT in the Legendre basis of L²[0,1], but) in the T_j, as in eq. (2.2), since it lives in the Hilbert space L²(G).
– For your last point on Step 2 of our algorithm, we can also use the simple integrate command.
– Unlike traditional prior-data conflict checks, here we attempted to answer three questions in one shot: (i) How compatible is the pre-selected g with the given data? (ii) In the event of a conflict, can we also inform the user on the nature of the misfit, the finer structure that was a priori unanticipated? (iii) Finally, we would like to provide a simple, yet formal, guideline for upgrading (repairing) the starting g.
Hopefully, this will clear the air. But thanks for reading the paper so carefully. Appreciate it.
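As a quick numerical check of the second point in this reply (with hypothetical Normal choices for the two densities, not those of the paper), the ratio indeed becomes a proper density on the unit interval once evaluated at the quantile transform:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# hypothetical choices, not those of the paper: g is the pre-selected prior guess,
# f is another proper density standing in for the "true" prior
g = norm(0, 1)
f = norm(0.5, 0.8)

def d(u):
    # comparison density d(u; G, F) = f(G^{-1}(u)) / g(G^{-1}(u)) on the unit interval
    theta = g.ppf(u)
    return f.pdf(theta) / g.pdf(theta)

# after the quantile (rank) transform, the ratio integrates to one over [0,1]
mass, _ = quad(d, 0, 1)
print(mass)  # ≈ 1
```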

double yolk priors

Posted in Statistics on March 13, 2018 by xi'an

“To develop a “defendable and defensible” Bayesian learning model, we have to go beyond blindly ‘turning the crank’ based on a “go-as-you-like” [approximate guess] prior. A lackluster attitude towards prior modeling could lead to disastrous inference, impacting various fields from clinical drug development to presidential election forecasts. The real questions are: How can we uncover the blind spots of the conventional wisdom-based prior? How can we develop the science of prior model-building that combines both data and science [DS-prior] in a testable manner – a double-yolk Bayesian egg?”

I came across, via R bloggers, this presentation of a paper by Subhadeep Mukhopadhyay and Douglas Fletcher, Bayesian modelling via goodness of fit, that aims at solving all existing problems with classical Bayesian solutions, apparently! (With also, apparently, no awareness of David Spiegelhalter’s take on the matter.) As illustrated by both quotes, above and below:

“The two key issues of modern Bayesian statistics are: (i) establishing principled approach for distilling statistical prior that is consistent with the given data from an initial believable scientific prior; and (ii) development of a Bayes-frequentist consolidated data analysis workflow that is more effective than either of the two separately.”

(I wonder who else in this Universe would characterise “modern Bayesian statistics” in such a non-Bayesian way! And love the notion of distillation applied to priors!) The setup is actually one of empirical Bayes inference, where repeated values of the parameter θ drawn from the prior are behind independent observations. Which is not the usual framework for a statistical analysis, where a single value of the parameter is supposed to hide behind the data, but most convenient for the frequency-based arguments behind empirical Bayes methods (which is the case here). The paper adopts a far-from-modern discourse on the “truth” of “the” prior… (Which is always conjugate in that Universe!) Instead of recognising the relativity of a statistical analysis based on a given prior.

When I tried to read the paper any further, I hit a wall as I could not understand the principle described therein. And how it “consolidates Bayes and frequentist, parametric and nonparametric, subjective and objective, quantile and information-theoretic philosophies”. Presumably the lack of oxygen at the altitude of Chamonix… Given an “initial guess” at the prior, g, a conjugate prior (in dimension one, with an invertible cdf), a family of priors is created in what first looks like a form of non-parametric exponential tilting of g. But a closer look [at (2.1)] exposes the “family” as the tautological π(θ) = g(θ) × π(θ)/g(θ). The ratio is expanded into a Legendre polynomial series, whose use in Bayesian statistics dates a wee bit further back than indicated in the paper (see, e.g., Friedman, 1985; Diaconis, 1986). With the side issue that the resulting approximation does not integrate to one. Another side issue is that the coefficients of the truncated Legendre series are approximated by simulations from the prior [Step 3 of the Type II algorithm], rarely an efficient approach to the posterior.
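To make the mechanism concrete, here is a minimal sketch of the generic tilting-by-Legendre-expansion idea, not the authors' Type II algorithm: the Normal choices for g and π are hypothetical, and approximating the coefficients by draws from the initial guess g is my own reading of the simulation step.

```python
import numpy as np
from scipy.special import eval_legendre
from scipy.stats import norm

rng = np.random.default_rng(0)

# hypothetical choices: conjugate-style initial guess g and a "true" prior pi
g = norm(0, 1)
pi = norm(0.5, 0.8)

def leg(j, u):
    # orthonormal (shifted) Legendre polynomial of degree j on [0,1]
    return np.sqrt(2 * j + 1) * eval_legendre(j, 2 * u - 1)

# Monte Carlo approximation of the expansion coefficients of the ratio pi/g,
# using draws from g:  c_j = E_g[ (pi/g)(theta) Leg_j(G(theta)) ]
theta = g.rvs(size=50_000, random_state=rng)
w = pi.pdf(theta) / g.pdf(theta)
u = g.cdf(theta)
m = 4  # truncation level
c = np.array([np.mean(w * leg(j, u)) for j in range(1, m + 1)])

def tilted_prior(t):
    # truncated "corrected" prior  g(t) * [1 + sum_j c_j Leg_j(G(t))]
    return g.pdf(t) * (1 + sum(cj * leg(j, g.cdf(t)) for j, cj in enumerate(c, start=1)))

grid = np.linspace(-2, 2, 5)
print(np.round(tilted_prior(grid), 3))  # to be compared with pi.pdf(grid)
print(np.round(pi.pdf(grid), 3))
```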

inferential models: reasoning with uncertainty [book review]

Posted in Books, Statistics, University life on October 6, 2016 by xi'an

“the field of statistics (…) is still surprisingly underdeveloped (…) the subject lacks a solid theory for reasoning with uncertainty [and] there has been very little progress on the foundations of statistical inference” (p.xvi)

A book that starts with such massive assertions is certainly hoping to attract some degree of attention from the field, and likely to induce strong reactions to this dismissal of the not inconsiderable amount of research dedicated so far to statistical inference, and in particular to its foundations. Or even to attract flak for not accounting (in this introduction) for the past work of major statisticians, like Fisher, Kiefer, Lindley, Cox, Berger, Efron, Fraser and many, many others… Judging from the references and the tone of this 254-page book, it seems like the two authors, Ryan Martin and Chuanhai Liu, truly aim at single-handedly resetting the foundations of statistics to their own tune, which sounds like a new kind of fiducial inference augmented with calibrated belief functions. Be warned that five chapters of this book are built on as many papers written by the authors in the past three years. Which makes me question, if I may, the relevance of publishing a book on a brand-new approach to statistics without further backup from a wider community.

“…it is possible to calibrate our belief probabilities for a common interpretation by intelligent minds.” (p.14)

Chapter 1 contains a description of the new perspective in Section 1.4.2, which I find useful to detail here. When given an observation x from a Normal N(θ,1) model, the authors rewrite X as θ+Z, with Z~N(0,1), as in fiducial inference, and then want to find a “meaningful prediction of Z independently of X”. This seems difficult to accept given that, once X=x is observed, Z=x−θ⁰, θ⁰ being the true value of θ, which belies the independence assumption. The next step is to replace Z~N(0,1) by a random set S(Z) containing Z and to define a belief function bel() on the parameter space Θ by

bel(A|X) = P(X-S(Z)⊆A)

which induces a pseudo-measure on Θ derived from the distribution of an independent Z, since X is already observed. When Z~N(0,1), this distribution does not depend on θ⁰, the true value of θ… The next step is to choose the belief function so as to achieve proper frequentist coverage, in the approximate sense that the probability that bel(A|X) exceeds 1-α is less than α when the [arbitrary] parameter θ is not in A. And conversely. This property (satisfied when bel(A|X) is uniform) is called validity or exact inference by the authors: in my opinion, restricted frequentist calibration would certainly sound more adequate.
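For concreteness, here is a minimal Monte Carlo sketch of these belief and plausibility computations in the Normal mean example, assuming the default symmetric random set S(Z)=[−|Z|,|Z|] (a standard choice in this literature, not necessarily the authors' exact construction), together with a crude check of the validity property:

```python
import numpy as np

rng = np.random.default_rng(1)

def belief_and_plausibility(x, a_lo, a_hi, n_draws=100_000):
    # bel(A|x) and pl(A|x) for the interval assertion A = [a_lo, a_hi], with
    # focal sets x - S(Z) = [x - |Z|, x + |Z|] and Z ~ N(0,1)
    z = np.abs(rng.standard_normal(n_draws))
    lo, hi = x - z, x + z
    bel = np.mean((lo >= a_lo) & (hi <= a_hi))  # focal interval entirely inside A
    pl = np.mean((hi >= a_lo) & (lo <= a_hi))   # focal interval intersects A
    return bel, pl

print(belief_and_plausibility(x=0.3, a_lo=-2.0, a_hi=2.0))

# crude check of "validity" for a theta outside A: P(bel(A|X) >= 1 - alpha) <= alpha
theta0, alpha = 2.5, 0.05
xs = theta0 + rng.standard_normal(2_000)
bels = np.array([belief_and_plausibility(x, -2.0, 2.0, n_draws=20_000)[0] for x in xs])
print(np.mean(bels >= 1 - alpha))  # stays below alpha
```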

“When there is no prior information available, [the philosophical justifications for Bayesian analysis] are less than fully convincing.” (p.30)

“Is it logical that an improper “ignorance” prior turns into a proper “non-ignorance” prior when combined with some incomplete information on the whereabouts of θ?” (p.44)


Bayes’ Theorem in the 21st Century, really?!

Posted in Books, Statistics on June 20, 2013 by xi'an

“In place of past experience, frequentism considers future behavior: an optimal estimator is one that performs best in hypothetical repetitions of the current experiment. The resulting gain in scientific objectivity has carried the day…”

Julien Cornebise sent me this Science column by Brad Efron about Bayes’ theorem. I am a tad surprised that it got published in the journal, given that it does not really contain any new item of information. However, being unfamiliar with Science, it may be that it also publishes major scientists’ opinions or warnings, a label that fits this column. (It is quite a proper coincidence that the post appears during Bayes 250.)

Efron’s piece centres upon the use of objective Bayes approaches in Bayesian statistics, for which Laplace was “the prime violator”. He argues through examples that noninformative “Bayesian calculations cannot be uncritically accepted, and should be checked by other methods, which usually means frequentistically”. First, having to write “frequentistically” once is already more than I can stand! Second, using the Bayesian framework to build frequentist procedures is like buying top technical outdoor gear to climb the stairs at the Sacré-Coeur on Butte Montmartre! The naïve reader is then left clueless as to why one should use a Bayesian approach in the first place. And perfectly confused about the meaning of objectivity. Esp. given the above quote! I find it rather surprising that this old saw of frequentism’s claim to objectivity resurfaces there. There is an infinite range of frequentist procedures and, while some are more optimal than others, none is “the” optimal one (except for the most baked-out examples, like, say, the estimation of the mean of a normal observation).

“A Bayesian FDA (there isn’t one) would be more forgiving. The Bayesian posterior probability of drug A’s superiority depends only on its final evaluation, not whether there might have been earlier decisions.”

The second criticism of Bayesianism therein is the counter-intuitive irrelevance of stopping rules. Once again, the presentation is fairly biased, because a Bayesian approach compares competing scenarios rather than evaluating the likelihood of a tail event under the null and only the null. And also because, as shown by Jim Berger and co-authors, the Bayesian approach is generally much more favorable to the null than the p-value.
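To fix ideas on that last point, here is a minimal numerical comparison on a toy setup of my own (x ~ N(θ,1), H₀: θ=0 against a hypothetical N(0,1) prior on θ under the alternative, equal prior weights), in the spirit of Berger and Sellke: the posterior probability of the null sits far above the p-value at the 5% boundary.

```python
import numpy as np
from scipy.stats import norm

def posterior_prob_null(x, tau=1.0):
    # x ~ N(theta, 1); H0: theta = 0 versus H1: theta ~ N(0, tau^2), equal prior weights
    m0 = norm(0, 1).pdf(x)                    # marginal density of x under H0
    m1 = norm(0, np.sqrt(1 + tau**2)).pdf(x)  # marginal density of x under H1
    bf01 = m0 / m1                            # Bayes factor in favour of H0
    return bf01 / (1 + bf01)

x = 1.96                                      # "just significant" at the 5% level
pval = 2 * norm.sf(abs(x))
print(round(pval, 3), round(posterior_prob_null(x), 3))  # ~0.05 versus ~0.35
```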

“Bayes’ Theorem is an algorithm for combining prior experience with current evidence. Followers of Nate Silver’s FiveThirtyEight column got to see it in spectacular form during the presidential campaign: the algorithm updated prior poll results with new data on a daily basis, nailing the actual vote in all 50 states.”

It is only fair that Nate Silver’s book and column are mentioned in Efron’s column. Because they are a highly valuable and definitely convincing illustration of Bayesian principles. What I object to is the criticism “that most cutting-edge science doesn’t enjoy FiveThirtyEight-level background information”. In my understanding, the poll model of FiveThirtyEight built up, in a sequential manner, a weight system over the different polling companies, hence learning from the data, if in a Bayesian manner, about their reliability (rather than forgetting the past). This is actually what caused Larry Wasserman to consider that Silver’s approach was more frequentist than Bayesian…

“Empirical Bayes is an exciting new statistical idea, well-suited to modern scientific technology, saying that experiments involving large numbers of parallel situations carry within them their own prior distribution.”

My last point of contention is about the (unsurprising) defence of the empirical Bayes approach in the Science column. Once again, the presentation is biased towards frequentism: in the FDR gene example, the empirical Bayes procedure is motivated by being the frequentist solution. The logical contradiction in “estimat[ing] the relevant prior from the data itself” is not discussed, and the conclusion that Brad Efron uses “empirical Bayes methods in the parallel case [in the absence of prior information]”, seemingly without being cautious and “uncritically”, does not strike me as the proper last argument in the matter! Nor does it give a 21st Century vision of what nouveau Bayesianism should be, faced with the challenges of Big Data and the like…
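For readers unfamiliar with the parallel-case argument, here is a minimal normal-means sketch of my own (not Efron’s FDR gene example) of what “carrying one’s own prior” amounts to in practice: the prior variance is estimated from the ensemble of observations and plugged back in to shrink each individual estimate.

```python
import numpy as np

rng = np.random.default_rng(2)

# toy parallel situations: x_i ~ N(theta_i, 1) with theta_i ~ N(0, tau^2), tau unknown
tau_true = 2.0
theta = rng.normal(0.0, tau_true, size=500)
x = theta + rng.standard_normal(500)

# "the data carry their own prior": estimate tau^2 from the marginal variance of the x_i
tau2_hat = max(np.var(x, ddof=1) - 1.0, 0.0)

# plug the estimated prior back in: posterior means shrink every x_i towards zero
theta_hat = (tau2_hat / (1.0 + tau2_hat)) * x

print(np.mean((x - theta) ** 2), np.mean((theta_hat - theta) ** 2))  # shrinkage usually wins
```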

reading classics (#4)

Posted in Statistics, University life on November 29, 2012 by xi'an

Another read today, and not from JRSS B for once, namely Efron’s (an)other look at the Jackknife, i.e. the 1979 bootstrap classic published in the Annals of Statistics. My Master’s students in the Reading Classics Seminar course thus listened today to Marco Brandi’s presentation, whose (Beamer) slides are here:

In my opinion this was an easier paper to discuss, more because of its visible impact than because of the paper itself, where the comparison with the jackknife procedure does not sound so relevant nowadays. Again, it is mostly algorithmic and requires some background on how it impacted the field. Even though Marco also went through Don Rubin’s Bayesian bootstrap and Michael Jordan’s bag of little bootstraps, he struggled to get away from the technicalities towards the intuition and the relevance of the method. The Bayesian bootstrap extension was quite interesting in that we discussed a lot the connections with Dirichlet priors and the lack of parameters, which sounded quite antagonistic to the Bayesian principles. However, at the end of the day, I feel that this foundational paper was not explored in proportion to its depth and that it would be worth another visit.
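As a coda, a minimal sketch (on made-up exponential data) contrasting Efron’s bootstrap with Rubin’s Bayesian bootstrap for the mean: the former resamples the observations, the latter keeps them fixed and draws Dirichlet(1,…,1) weights, and the two resulting distributions come out very close.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=100)   # made-up data; functional of interest: the mean
B = 5_000

# Efron's bootstrap: resample the observations with replacement
boot = np.array([rng.choice(x, size=x.size, replace=True).mean() for _ in range(B)])

# Rubin's Bayesian bootstrap: keep the data fixed, draw Dirichlet(1, ..., 1) weights
w = rng.dirichlet(np.ones(x.size), size=B)
bayes_boot = w @ x

print(boot.std(), bayes_boot.std())        # the two spreads are typically very close
```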