**T**his week, I happened to re-read John Storey’ 2003 “The positive discovery rate: a Bayesian interpretation and the q-value”, because I wanted to check a connection with our testing by mixture [still in limbo] paper. I however failed to find what I was looking for because I could not find any Bayesian flavour in the paper apart from an FRD expressed as a “posterior probability” of the null, in the sense that the setting was one of opposing two simple hypotheses. When there is an unknown parameter common to the multiple hypotheses being tested, a prior distribution on the parameter makes these multiple hypotheses connected. What makes the connection puzzling is the assumption that the observed statistics defining the significance region are *independent* (Theorem 1). And it seems to depend on the choice of the significance region, which should be induced by the Bayesian modelling, not the opposite. (This alternative explanation does not help either, maybe because it is on baseball… Or maybe because the sentence “If a player’s [posterior mean] is above .3, it’s more likely than not that their true average is as well” does not seem to appear naturally from a Bayesian formulation.) *[Disclaimer: I am not hinting at anything wrong or objectionable in Storey’s paper, just being puzzled by the Bayesian tag!]*

## Archive for false discovery rate

## a Bayesian interpretation of FDRs?

Posted in Statistics with tags baseball data, empirical Bayes methods, false discovery rate, FDRs, ferry harbour, FNR, hypothesis testing, multiple tests, Seattle, shrinkage estimation, Washington State on April 12, 2018 by xi'an## It’s the selection’s fault not the p-values’… [seminar]

Posted in pictures, Statistics, University life with tags Benjamini, false discovery rate, Jussieu, p-values, replication crisis, seminar, Université Pierre et Marie Curie on February 5, 2016 by xi'anYoav Benjamini will give a seminar talk in Paris next Monday on the above (full title: “*The replicability crisis in science: It’s the selection’s fault not the p-values’*“). (That I will miss for being in Warwick at the time.) With a fairly terse abstract:

I shall discuss the problem of lack of replicability of results in science, and point at selective inference as a statistical root cause. I shall then present a few strategies for addressing selective inference, and their application in genomics, brain research and earlier phases of clinical trials where both primary and secondary endpoints are being used.

**Details:** February 8, 2016, 16h, Université Pierre & Marie Curie, campus Jussieu, salle 15-16-101.

## robust Bayesian FDR control with Bayes factors

Posted in Statistics, University life with tags Bayes factors, decision theory, false discovery rate, p-values, robustness on December 20, 2013 by xi'an**H**ere are a few comments on a recently arXived paper on FDRs by Xioaquan Wen (who asked for them!). Although there is less frenzy about false discovery rates in multiple testing now than in the 1990s, and I have not done anything on it since our 2004 JASA paper, this is still a topic of interest to me. Although maybe not in the formalised way the model is constructed here.

“Although the Bayesian FDR control is conceptually straightforward, its practical performance is susceptible to alternative model misspecifications. In comparison, thep-value based frequentist FDR control procedures demand only adequate behavior ofp-values under the null models and generally ensure targeted FDR control levels, regardless of the distribution ofp-values under the assumed alternatives.”

**N**ow, I find the above quote of interest as it relates to Val Johnson’s argument for his uniformly most powerful “Bayesian” tests (now sufficiently discussed on the ‘Og!). This is a rather traditional criticism of using Bayes factors that they depend on the prior modelling, to the point it made it to the introduction of my tutorial yesterday. Actually, the paper has similar arguments to Johnson’s (who is quoted in the paper for earlier works) in that the criteria for validating a point estimator of the proportion of positives is highly frequentist. And does not care much about the alternative hypothesis. Besides, the modelling used therein is puzzling in that there seems to be a single parameter in the model, namely the true proportion of positives, which sounds to me as an hyper-stylised representation of real experiments. To the point of being useless… (Even if there are extra-parameters, they differ for each observation.) In addition, the argument leading to the proposed procedure is unclear: if the Bayes factors are to be consistent under the null and the proportion of positives needs an asymptotically guaranteed upper bound, the choice of a estimate equal to 1 does the job. (This is noticed on page 9.) So the presentation seems to miss a counter-factor to avoid this trivial solution.

“On the other hand, the Bayes factors from the true alternative models with reasonable powers should be, on average, greater than 1 (i.e., favoring the alternative over the null models). Therefore, the sample mean of the observed Bayes factors carries information regarding the mixture percentage.”

**T**he estimator of this true proportion ends up being the proportion of Bayes factors less than 1, an anti-climactic proposal as it means accepting the null each time the Bayes factor is less than 1. (I did not check the proof that it overestimates the true proportion. ) Or the one of Storey (2003). However, the above quote shows it is validated only when the (true) alternative connects with the Bayes factor. So I do not see how this agrees with the robustness property of behaving well “under misspecifications of parametric alternative models”. Furthermore, in the specific framework adopted by the paper, the “misspecifications” are difficult to fathom, as they would mean that the parameter-free distributions of the observations under the alternatives are wrong and thus may render the Bayes factors to be arbitrary. Hence jeopardising the validity of the selection process. So there is something missing in the picture, I fear.

**T**hus, while the second half of the paper is dedicated to an extensive simulation study, what I found the most interesting direction in the paper is the question of the distribution of the Bayes factors (under the null or not), albeit not a Bayesian question, as it relates to the use and the calibration of ABC model choice (and the proposal by Fearnhead and Prangle of using the Bayes factor as *the* summary statistics). The fact that the (marginal) expectation of the Bayes factor under the null (marginal) is noteworthy but not as compelling as the author argues, because (a) it is only an expectation and (b) it tells nothing about the alternative. The distribution of the Bayes factor does depend upon the alternative through the Bayes factor, so mileage [of the quantile Bayes factor] may vary (as shown by the assumption “for Bayes factors with reasonable power”, p.14). Drawing Bayesian inference based on Bayes factors only is nonetheless an area worth investigating!

## Large-scale Inference

Posted in Books, R, Statistics, University life with tags Bayes factor, Bayesian inference, Bayesian model choice, Bayesian tests, bootstrap, Brad Efron, empirical Bayes methods, false discovery rate, FDR, large data problems, microarrays, missing species problem, Read paper, RSS, Significance, Stanford University on February 24, 2012 by xi'an**L***arge-scale Inference* by Brad Efron is the first IMS Monograph in this new series, coordinated by David Cox and published by Cambridge University Press. Since I read this book immediately after Cox’ and Donnelly’s *Principles of Applied Statistics*, I was thinking of drawing a parallel between the two books. However, while none of them can be classified as textbooks [even though Efron’s has exercises], they differ very much in their intended audience and their purpose. As I wrote in the review of *Principles of Applied Statistics*, the book has an encompassing scope with the goal of covering all the methodological steps required by a statistical study. In *Large-scale Inference*, Efron focus on empirical Bayes methodology for large-scale inference, by which he mostly means multiple testing (rather than, say, data mining). As a result, the book is centred on mathematical statistics and is more technical. *(Which does not mean it less of an exciting read!)* The book was recently reviewed by Jordi Prats for Significance. Akin to the previous reviewer, and unsurprisingly, I found the book nicely written, with a wealth of R (colour!) graphs (the R programs and dataset are available on Brad Efron’s home page).

“

I have perhaps abused the “mono” in monograph by featuring methods from my own work of the past decade.” (p.xi)

**S**adly, I cannot remember if I read my first Efron’s paper via his 1977 introduction to the Stein phenomenon with Carl Morris in *Pour la Science* (the French translation of Scientific American) or through his 1983 *Pour la Science* paper with Persi Diaconis on computer intensive methods. *(I would bet on the later though.)* In any case, I certainly read a lot of the Efron’s papers on the Stein phenomenon during my thesis and it was thus with great pleasure that I saw he introduced empirical Bayes notions through the Stein phenomenon (Chapter 1). It actually took me a while but I eventually (by page 90) realised that *empirical Bayes* was a proper subtitle to *Large-Scale Inference* in that the large samples were giving some weight to the validation of empirical Bayes analyses. In the sense of reducing the importance of a genuine Bayesian modelling (even though I do not see why this genuine Bayesian modelling could not be implemented in the cases covered in the book).

“

Large N isn’t infinity and empirical Bayes isn’t Bayes.” (p.90)

**T**he core of* **Large-scale Inference* is multiple testing and the empirical Bayes justification/construction of Fdr’s (false discovery rates). Efron wrote more than a dozen papers on this topic, covered in the book and building on the groundbreaking and highly cited Series B 1995 paper by Benjamini and Hochberg. (In retrospect, it should have been a Read Paper and so was made a “retrospective read paper” by the Research Section of the RSS.) Frd are essentially posterior probabilities and therefore open to empirical Bayes approximations when priors are not selected. Before reaching the concept of Fdr’s in Chapter 4, Efron goes over earlier procedures for removing multiple testing biases. As shown by a section title (“Is FDR Control “Hypothesis Testing”?”, p.58), one major point in the book is that an Fdr is more of an estimation procedure than a significance-testing object. (This is not a surprise from a Bayesian perspective since the posterior probability is an estimate as well.)

“

Scientific applications of single-test theory most often suppose, or hope forrejectionof the null hypothesis (…) Large-scale studies are usually carried out with the expectation that most of the N cases willacceptthe null hypothesis.” (p.89)

**O**n the innovations proposed by Efron and described in *Large-scale Inference*, I particularly enjoyed the notions of local Fdrs in Chapter 5 (essentially pluggin posterior probabilities that a given observation stems from the null component of the mixture) and of the (Bayesian) improvement brought by empirical null estimation in Chapter 6 (“not something one estimates in classical hypothesis testing”, p.97) and the explanation for the inaccuracy of the bootstrap (which “stems from a simpler cause”, p.139), but found less crystal-clear the empirical evaluation of the accuracy of Fdr estimates (Chapter 7, ‘independence is only a dream”, p.113), maybe in relation with my early career inability to explain Morris’s (1983) correction for empirical Bayes confidence intervals (pp. 12-13). I also discovered the notion of *enrichment* in Chapter 9, with permutation tests resembling some low-key bootstrap, and *multiclass models* in Chapter 10, which appear as if they could benefit from a hierarchical Bayes perspective. The last chapter happily concludes with one of my preferred stories, namely the missing species problem (on which I hope to work this very Spring).