Archive for false discovery rate

at last the type IX error

Posted in Statistics on May 11, 2020 by xi'an

a Bayesian interpretation of FDRs?

Posted in Statistics on April 12, 2018 by xi'an

This week, I happened to re-read John Storey’s 2003 “The positive false discovery rate: a Bayesian interpretation and the q-value”, because I wanted to check a connection with our testing by mixture [still in limbo] paper. I however failed to find what I was looking for, because I could not find any Bayesian flavour in the paper apart from an FDR expressed as a “posterior probability” of the null, in the sense that the setting was one of opposing two simple hypotheses. When there is an unknown parameter common to the multiple hypotheses being tested, a prior distribution on this parameter connects the multiple hypotheses. What makes the connection puzzling is the assumption that the observed statistics defining the significance region are independent (Theorem 1). And it seems to depend on the choice of the significance region, which should be induced by the Bayesian modelling, not the other way around. (This alternative explanation does not help either, maybe because it is about baseball… Or maybe because the sentence “If a player’s [posterior mean] is above .3, it’s more likely than not that their true average is as well” does not seem to follow naturally from a Bayesian formulation.) [Disclaimer: I am not hinting at anything wrong or objectionable in Storey’s paper, just being puzzled by the Bayesian tag!]
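In the two-groups setting the positive FDR is indeed a posterior probability of the null given rejection, which a short simulation makes concrete. A minimal sketch, where the Gaussian mixture, its weights, and the cut-off are all illustrative choices of mine, not Storey's:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two-groups mixture: each test is null with probability pi0 (z ~ N(0,1)),
# otherwise from the alternative (z ~ N(mu1,1)).  All values illustrative.
pi0, mu1, m = 0.8, 3.0, 100_000
is_null = rng.random(m) < pi0
z = np.where(is_null, rng.normal(0.0, 1.0, m), rng.normal(mu1, 1.0, m))

# Significance region Gamma = {z > c}: the pFDR is the posterior
# probability of the null given rejection, P(H = 0 | Z in Gamma).
c = 2.0
pfdr_theory = (pi0 * stats.norm.sf(c)) / (
    pi0 * stats.norm.sf(c) + (1 - pi0) * stats.norm.sf(c - mu1)
)
pfdr_mc = is_null[z > c].mean()  # Monte Carlo version of the same quantity
print(pfdr_theory, pfdr_mc)
```

Note that the posterior probability here is driven by the choice of the region Γ, which is the point of the puzzle above: the region is an input rather than an output of the Bayesian modelling.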

It’s the selection’s fault not the p-values’… [seminar]

Posted in pictures, Statistics, University life on February 5, 2016 by xi'an

Paris and la Seine, from Pont du Garigliano, Oct. 20, 2011

Yoav Benjamini will give a seminar talk in Paris next Monday on the above (full title: “The replicability crisis in science: It’s the selection’s fault not the p-values’“). (A talk I will miss, for being in Warwick at the time.) With a fairly terse abstract:

I shall discuss the problem of lack of replicability of results in science, and point at selective inference as a statistical root cause. I shall then present a few strategies for addressing selective inference, and their application in genomics, brain research and earlier phases of clinical trials where both primary and secondary endpoints are being used.

Details: February 8, 2016, 16h, Université Pierre & Marie Curie, campus Jussieu, salle 15-16-101.

O’Bayes 2015 [day #2]

Posted in pictures, Running, Statistics, Travel, University life, Wines on June 4, 2015 by xi'an

This morning was the most special time of the conference in that we celebrated Susie Bayarri‘s contributions and life together with members of her family. Jim gave a great introduction that went over Susie’s numerous papers and the impact they had in Statistics and outside Statistics. As well as her recognised (and unsurprising if you knew her) expertise in wine and food! The three talks of the morning covered some of the domains of Susie’s fundamental contributions and were delivered by former students of hers: model assessment through various types of predictive p-values by Maria Eugenia Castellanos, Bayesian model selection by Anabel Forte, and computer models by Rui Paulo, all talks that translated quite accurately the extent of Susie’s contributions… In a very nice initiative, the organisers had also set a wine tasting break (at 10am!) around two vintages that Susie had reviewed in the past years [with reviews to show up soon in the Wines section of the ‘Og!]

The talks of the afternoon session were by Jean-Bernard (JB) Salomond about a new proposal to handle embedded hypotheses in a non-parametric framework and by James Scott about false discovery rates for neuroimaging. Despite the severe theoretical framework behind the proposal, JB managed a superb presentation that mostly focussed on the intuition for using the smoothed (or approximative) version of the null hypothesis. (A flavour of ABC, somehow?!) Also kudos to JB for perpetuating my tradition of starting sections with unrelated pictures. James’ topic was more practical Bayes or pragmatic Bayes than objective Bayes in that he analysed a large fMRI experiment on spatial working memory, introducing a spatial pattern that led to a complex penalised Lasso-like optimisation. The data was actually an fMRI of the brain of Russell Poldrack, one of James’ coauthors on that paper.

The (sole) poster session was in the evening, with a diverse range of exciting topics—including three where I was a co-author, by Clara Grazian, Kaniav Kamary, and Kerrie Mengersen—but it was alas too short, or I was alas too slow, to complete the tour before it ended! In retrospect we could have broken it into two sessions, since Wednesday evening is a free evening.

robust Bayesian FDR control with Bayes factors

Posted in Statistics, University life on December 20, 2013 by xi'an

Here are a few comments on a recently arXived paper on FDRs by Xiaoquan Wen (who asked for them!). Although there is less frenzy about false discovery rates in multiple testing now than in the 1990s, and I have not done anything on the topic since our 2004 JASA paper, it remains of interest to me. Although maybe not in the formalised way the model is constructed here.

“Although the Bayesian FDR control is conceptually straightforward, its practical performance is susceptible to alternative model misspecifications. In comparison, the p-value based frequentist FDR control procedures demand only adequate behavior of p-values under the null models and generally ensure targeted FDR control levels, regardless of the distribution of p-values under the assumed alternatives.”

Now, I find the above quote of interest as it relates to Val Johnson’s argument for his uniformly most powerful “Bayesian” tests (now sufficiently discussed on the ‘Og!). This is the rather traditional criticism of Bayes factors, namely that they depend on the prior modelling, to the point that it made it into the introduction of my tutorial yesterday. Actually, the paper has arguments similar to Johnson’s (who is quoted in the paper for earlier works) in that the criterion for validating a point estimator of the proportion of positives is highly frequentist, and does not care much about the alternative hypothesis. Besides, the modelling used therein is puzzling in that there seems to be a single parameter in the model, namely the true proportion of positives, which sounds to me like a hyper-stylised representation of real experiments. To the point of being useless… (Even if there are extra parameters, they differ for each observation.) In addition, the argument leading to the proposed procedure is unclear: if the Bayes factors are to be consistent under the null and the proportion of positives needs an asymptotically guaranteed upper bound, the choice of an estimate equal to 1 does the job. (This is noticed on page 9.) So the presentation seems to miss a counter-factor to avoid this trivial solution.

“On the other hand, the Bayes factors from the true alternative models with reasonable powers should be, on average, greater than 1 (i.e., favoring the alternative over the null models). Therefore, the sample mean of the observed Bayes factors carries information regarding the mixture percentage.”

The estimator of this true proportion ends up being the proportion of Bayes factors less than 1, an anti-climactic proposal as it means accepting the null each time the Bayes factor is less than 1. (I did not check the proof that it overestimates the true proportion.) Or the estimator of Storey (2003). However, the above quote shows it is validated only when the (true) alternative connects with the Bayes factor. So I do not see how this agrees with the robustness property of behaving well “under misspecifications of parametric alternative models”. Furthermore, in the specific framework adopted by the paper, the “misspecifications” are difficult to fathom, as they would mean that the parameter-free distributions of the observations under the alternatives are wrong, which may render the Bayes factors arbitrary, hence jeopardising the validity of the selection process. So there is something missing in the picture, I fear.
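The dependence of such a proportion-of-small-Bayes-factors estimator on the assumed alternative can be seen in a toy simulation. This is a sketch of mine, not Wen's actual procedure: the simple-vs-simple Gaussian pair and all numerical values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Two-groups toy model (all values illustrative): null z ~ N(0,1) with
# probability pi0, alternative z ~ N(mu,1) otherwise.  For this
# simple-vs-simple pair the Bayes factor in favour of the alternative is
#   BF10(z) = exp(mu*z - mu**2/2),   so BF10 < 1 iff z < mu/2.
pi0, mu, m = 0.8, 3.0, 200_000
is_null = rng.random(m) < pi0
z = np.where(is_null, rng.normal(0.0, 1.0, m), rng.normal(mu, 1.0, m))
bf10 = np.exp(mu * z - mu**2 / 2)

pi0_hat = (bf10 < 1).mean()  # proportion of Bayes factors below 1

# Its estimand mixes null and alternative tail probabilities, so the
# estimator's behaviour hinges on the assumed alternative:
estimand = pi0 * norm.cdf(mu / 2) + (1 - pi0) * norm.cdf(-mu / 2)
print(pi0_hat, estimand)
```

With these particular (assumed) values the estimand does not coincide with pi0, and moving mu changes it, which is the point at issue: the estimator cannot be validated independently of the alternative model.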

Thus, while the second half of the paper is dedicated to an extensive simulation study, what I found the most interesting direction in the paper is the question of the distribution of the Bayes factors (under the null or not), albeit not a Bayesian question, as it relates to the use and calibration of ABC model choice (and to the proposal by Fearnhead and Prangle of using the Bayes factor as the summary statistic). The fact that the (marginal) expectation of the Bayes factor under the null is equal to one is noteworthy, but not as compelling as the author argues, because (a) it is only an expectation and (b) it tells nothing about the alternative. The distribution of the Bayes factor obviously depends upon the alternative, so mileage [of the quantile Bayes factor] may vary (as shown by the assumption “for Bayes factors with reasonable power”, p.14). Drawing Bayesian inference based on Bayes factors only is nonetheless an area worth investigating!
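That the expectation of the Bayes factor under the (marginal) null equals one is easy to check by simulation, since the Bayes factor is then a likelihood ratio with unit expectation under the denominator model. A minimal sketch for a simple-vs-simple Gaussian pair, an illustrative choice of mine rather than the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simple null N(0,1) vs simple alternative N(mu,1): the Bayes factor in
# favour of the alternative is BF10(x) = exp(mu*x - mu**2/2), and a
# change-of-measure argument gives E[BF10 | H0] = 1 exactly.
mu, n = 1.0, 200_000
x0 = rng.normal(0.0, 1.0, n)        # draws under the null
bf10 = np.exp(mu * x0 - mu**2 / 2)
print(bf10.mean())                   # Monte Carlo check, close to 1
```

The same simulation also shows why the expectation alone is weak evidence of calibration: the distribution of bf10 is heavily skewed, so a unit mean is compatible with very different quantiles, which is what matters for the quantile Bayes factor.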
