Archive for empirical Bayes methods

empirically Bayesian [wISBApedia]

Posted in Statistics on August 9, 2021 by xi'an

Last week I was pointed to a puzzling paragraph in the “empirical Bayes” Wikipedia page. The introduction section indeed contains a description of an iterative simulation method that involves a hyperprior p(η), even though the empirical Bayes perspective does not involve a hyperprior.

While the entry is vague and lacks formulae

These suggest an iterative scheme, qualitatively similar in structure to a Gibbs sampler, to evolve successively improved approximations to p(θ|y) and p(η|y). First, calculate an initial approximation to p(θ|y) ignoring the η dependence completely; then calculate an approximation to p(η|y) based upon the initial approximate distribution of p(θ|y); then use this p(η|y) to update the approximation for p(θ|y); then update p(η|y); and so on.

it sounds essentially equivalent to a Gibbs sampler, possibly a multiple try Gibbs sampler (unless the author had another notion in mind, alas impossible to guess since no reference is included).
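
For what it's worth, here is a minimal Python sketch of the alternating scheme the quoted paragraph seems to describe, written as an actual two-block Gibbs sampler for a hypothetical Normal-Normal hierarchy of my own choosing (neither the model nor any of the code comes from the Wikipedia entry):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical hierarchy (my choice, not the Wikipedia entry's):
#   y_i | theta_i ~ N(theta_i, 1),  theta_i | eta ~ N(eta, tau2),  eta ~ N(0, A)
tau2, A = 1.0, 100.0
y = rng.normal(loc=2.0, size=50)   # toy data
n = y.size

theta = y.copy()                   # initial approximation "ignoring eta completely"
eta = 0.0
for _ in range(5000):
    # update theta | eta, y  (conjugate Normal full conditionals)
    post_var = 1.0 / (1.0 + 1.0 / tau2)
    theta = rng.normal((y + eta / tau2) * post_var, np.sqrt(post_var))
    # update eta | theta     (conjugate Normal full conditional)
    eta_var = 1.0 / (n / tau2 + 1.0 / A)
    eta = rng.normal(theta.sum() / tau2 * eta_var, np.sqrt(eta_var))
    # storing the successive (theta, eta) draws approximates p(theta|y) and p(eta|y)
```

Since both full conditionals are conjugate Normals here, each “update the approximation” step of the quote becomes an exact simulation step, which is why the paragraph reads like a Gibbs sampler.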

Beyond this specific case, where I think the entire paragraph should be erased from the “empirical Bayes” Wikipedia page, I discussed the general problem of some poor Bayesian entries in Wikipedia with Robin Ryder, who came up with the neat idea of running (collective) Wikipedia editing labs at ISBA conferences. If we could further give an ISBA label to these entries, as a certificate of “Bayesian orthodoxy” (!), it would be terrific!

bootstrap in Nature

Posted in Statistics on December 29, 2018 by xi'an

A news item in the latest issue of Nature I received reports on Brad Efron winning the “Nobel Prize of Statistics” this year. The bootstrap is certainly an invention worth the recognition, not to mention Efron’s contributions to empirical Bayes analysis, even though I remain overall reserved about the very notion of a Nobel prize in any field… The item comes with an appropriate XXL quote calling the bootstrap method the ‘best statistical pain reliever ever produced’!

a Bayesian interpretation of FDRs?

Posted in Statistics on April 12, 2018 by xi'an

This week, I happened to re-read John Storey’s 2003 paper “The positive false discovery rate: a Bayesian interpretation and the q-value”, because I wanted to check a connection with our testing by mixture [still in limbo] paper. I however failed to find what I was looking for, because I could not find any Bayesian flavour in the paper apart from an FDR expressed as a “posterior probability” of the null, in the sense that the setting was one of opposing two simple hypotheses. When there is an unknown parameter common to the multiple hypotheses being tested, a prior distribution on the parameter makes these multiple hypotheses connected. What makes the connection puzzling is the assumption that the observed statistics defining the significance region are independent (Theorem 1). And it seems to depend on the choice of the significance region, which should be induced by the Bayesian modelling, not the opposite. (This alternative explanation does not help either, maybe because it is about baseball… Or maybe because the sentence “If a player’s [posterior mean] is above .3, it’s more likely than not that their true average is as well” does not seem to follow naturally from a Bayesian formulation.) [Disclaimer: I am not hinting at anything wrong or objectionable in Storey’s paper, just being puzzled by the Bayesian tag!]
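
As an aside, the pFDR-as-posterior-probability reading is easy to check numerically in a two-groups model. Here is a minimal Python sketch, with an entirely made-up mixture and a pre-fixed rejection region (my own toy, not Storey’s setting), verifying that π₀P(T∈Γ|H₀)/P(T∈Γ) matches the Monte Carlo frequency of true nulls among rejections:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# toy two-groups model (my choice): H_i = 0 w.p. pi0 with T_i ~ N(0,1),
# H_i = 1 w.p. 1 - pi0 with T_i ~ N(mu1,1); rejection region Gamma = {T > c}
pi0, mu1, c = 0.8, 2.0, 1.96

# closed-form pFDR(Gamma) = pi0 * P(T > c | H0) / P(T > c)
p_reject = pi0 * norm.sf(c) + (1 - pi0) * norm.sf(c - mu1)
pfdr = pi0 * norm.sf(c) / p_reject

# Monte Carlo estimate of P(H0 | T in Gamma)
H = rng.random(10**6) < 1 - pi0          # True means the alternative holds
T = rng.normal(loc=np.where(H, mu1, 0.0))
print(pfdr, np.mean(~H[T > c]))          # both around 0.16 in this toy
```

Note that Γ = {T > c} is fixed beforehand in this computation, which echoes the above remark that the significance region enters as an input rather than being induced by the Bayesian modelling.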

double yolk priors [a reply from the authors]

Posted in Books, Statistics, University life on March 14, 2018 by xi'an

[Here is an email I received from Subhadeep Mukhopadhyay, one of the authors of the paper I discussed yesterday.]
Thanks for discussing our work. Let me clarify the technical points that you raised:
– The difference between Leg_j(u) and T_j = Leg_j(G(θ)): the first is an orthonormal polynomial basis of L²[0,1], the other one of L²(G). The second one is a polynomial in the rank-transform G(θ).
– As you correctly pointed out, there is a danger in directly approximating the ratio. We work on it after taking the quantile transform: evaluate the ratio at G⁻¹(u), which is the d(u;G,F) over the unit interval. Now, this new transformed function is a proper density.
– Thus the ratio now becomes d(G(θ)), which can be expanded (NOT in the Leg-basis) in the T_j, in eq (2.2), as it lives in the Hilbert space L²(G).
– For your last point on Step 2 of our algorithm, we can also use the simple integrate command.
– Unlike traditional prior-data conflict checks, here we attempted to answer three questions in one shot: (i) How compatible is the pre-selected g with the given data? (ii) In the event of a conflict, can we also inform the user on the nature of the misfit, i.e., the finer structure that was a priori unanticipated? (iii) Finally, we would like to provide a simple, yet formal, guideline for upgrading (repairing) the starting g.
Hopefully, this will clear the air. But thanks for reading the paper so carefully. Appreciate it.
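
[My own transcription, in LaTeX, of the construction sketched in the points above, writing π for the target prior (with cdf F) and g for the initial guess (with cdf G); the notation is borrowed from the exchange and any slip is mine, not the authors’:]

```latex
\begin{align*}
d(u;G,F) &= \frac{\pi\{G^{-1}(u)\}}{g\{G^{-1}(u)\}}, \quad u\in[0,1],
  & \pi(\theta) &= g(\theta)\, d\{G(\theta);G,F\},\\
d(u;G,F) &\approx 1+\sum_{j=1}^{m} c_j\,\mathrm{Leg}_j(u),
  & c_j &= \int \mathrm{Leg}_j\{G(\theta)\}\,\pi(\theta)\,\mathrm{d}\theta,
\end{align*}
```

the second line being the truncated expansion in the orthonormal shifted Legendre basis of L²[0,1], or equivalently in T_j(θ) = Leg_j{G(θ)} as elements of L²(G), the change of variable u = G(θ) turning the L²(G) projection of π/g into a plain expectation of Leg_j(G(θ)) under π.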

double yolk priors

Posted in Statistics on March 13, 2018 by xi'an

“To develop a “defendable and defensible” Bayesian learning model, we have to go beyond blindly ‘turning the crank’ based on a “go-as-you-like” [approximate guess] prior. A lackluster attitude towards prior modeling could lead to disastrous inference, impacting various fields from clinical drug development to presidential election forecasts. The real questions are: How can we uncover the blind spots of the conventional wisdom-based prior? How can we develop the science of prior model-building that combines both data and science [DS-prior] in a testable manner – a double-yolk Bayesian egg?”

I came across, via R-bloggers, this presentation of a paper by Subhadeep Mukhopadhyay and Douglas Fletcher, Bayesian modelling via goodness of fit, that aims at solving all existing problems with classical Bayesian solutions, apparently! (With also, apparently, no awareness of David Spiegelhalter’s take on the matter.) As illustrated by both quotes, above and below:

“The two key issues of modern Bayesian statistics are: (i) establishing principled approach for distilling statistical prior that is consistent with the given data from an initial believable scientific prior; and (ii) development of a Bayes-frequentist consolidated data analysis workflow that is more effective than either of the two separately.”

(I wonder who else in this Universe would characterise “modern Bayesian statistics” in such a non-Bayesian way! And love the notion of distillation applied to priors!) The setup is actually one of empirical Bayes inference where repeated values of the parameter θ drawn from the prior are behind independent observations. Which is not the usual framework for a statistical analysis, where a single value of the parameter is supposed to hide behind the data, but most convenient for frequency based arguments behind empirical Bayes methods (which is the case here). The paper adopts a far-from-modern discourse on the “truth” of “the” prior… (Which is always conjugate in that Universe!) Instead of recognising the relativity of a statistical analysis based on a given prior.

When I tried to read the paper any further, I hit a wall as I could not understand the principle described therein. And how it “consolidates Bayes and frequentist, parametric and nonparametric, subjective and objective, quantile and information-theoretic philosophies”. Presumably the lack of oxygen at the altitude of Chamonix… Given an “initial guess” at the prior, g, a conjugate prior (in dimension one with an invertible cdf), a family of priors is created in what first looks like a form of non-parametric exponential tilting of g. But a closer look [at (2.1)] exposes the “family” as the tautological π(θ) = g(θ) × π(θ)/g(θ). The ratio is expanded into a Legendre polynomial series, whose use in Bayesian statistics dates a wee bit further back than indicated in the paper (see, e.g., Friedman, 1985; Diaconis, 1986). With the side issue that the resulting approximation does not integrate to one. Another side issue is that the coefficients of the truncated Legendre series are approximated by simulations from the prior [Step 3 of the Type II algorithm], rarely an efficient approach to the posterior.
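
For concreteness, here is a toy Python rendering of these mechanics as I read them, with every ingredient made up for illustration: the “true” prior, the Normal guess g, the truncation level, and the use of direct draws from the target as a stand-in for whatever Step 3 of the Type II algorithm exactly simulates.

```python
import numpy as np
from scipy import stats
from scipy.special import eval_legendre

rng = np.random.default_rng(2)

g = stats.norm(0, 2)       # conjugate-style "guess" prior, with cdf G
target = stats.norm(1, 1)  # made-up "true" prior, unknown in practice

def leg(j, u):
    # orthonormal shifted Legendre polynomial on the unit interval
    return np.sqrt(2 * j + 1) * eval_legendre(j, 2 * u - 1)

# coefficients c_j = E[Leg_j(G(theta))] under the target, estimated by plain
# Monte Carlo simulation (draws from `target` standing in for Step 3)
theta = target.rvs(size=10**5, random_state=rng)
m = 4
c = [leg(j, g.cdf(theta)).mean() for j in range(1, m + 1)]

def pi_hat(t):
    # "corrected" prior: the guess g times the truncated Legendre expansion
    d_hat = 1 + sum(cj * leg(j, g.cdf(t)) for j, cj in enumerate(c, start=1))
    return g.pdf(t) * d_hat

grid = np.linspace(-8, 8, 2001)
print(c)                   # estimated coefficients of the comparison density
print(pi_hat(grid).min())  # the linear truncation need not stay nonnegative
```

Being a plain linear truncation, the corrected prior is in particular not guaranteed to remain nonnegative, a standard caveat with orthogonal-series corrections of densities.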
