robust Bayesian FDR control with Bayes factors
Here are a few comments on a recently arXived paper on FDRs by Xiaoquan Wen (who asked for them!). Although there is less frenzy about false discovery rates in multiple testing now than in the 1990s, and I have not done anything on the topic since our 2004 JASA paper, it remains of interest to me, if maybe not in the formalised way the model is constructed here.
“Although the Bayesian FDR control is conceptually straightforward, its practical performance is susceptible to alternative model misspecifications. In comparison, the p-value based frequentist FDR control procedures demand only adequate behavior of p-values under the null models and generally ensure targeted FDR control levels, regardless of the distribution of p-values under the assumed alternatives.”
Now, I find the above quote of interest as it relates to Val Johnson’s argument for his uniformly most powerful “Bayesian” tests (now sufficiently discussed on the ‘Og!). It is a rather traditional criticism of Bayes factors that they depend on the prior modelling, to the point that it made it into the introduction of my tutorial yesterday. Actually, the paper has arguments similar to Johnson’s (whose earlier works are quoted in the paper) in that the criterion for validating a point estimator of the proportion of positives is highly frequentist, and does not care much about the alternative hypothesis. Besides, the modelling used therein is puzzling in that there seems to be a single parameter in the model, namely the true proportion of positives, which sounds to me like a hyper-stylised representation of real experiments, to the point of being useless… (Even if there are extra parameters, they differ for each observation.) In addition, the argument leading to the proposed procedure is unclear: if the Bayes factors are to be consistent under the null and the proportion of positives only needs an asymptotically guaranteed upper bound, the trivial estimate equal to 1 does the job, since 1 upper-bounds every proportion (as noticed on page 9). So the presentation seems to be missing a counterbalancing requirement that would rule out this trivial solution.
“On the other hand, the Bayes factors from the true alternative models with reasonable powers should be, on average, greater than 1 (i.e., favoring the alternative over the null models). Therefore, the sample mean of the observed Bayes factors carries information regarding the mixture percentage.”
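Spelling this quote out under the usual two-groups mixture, with π₀ the proportion of true nulls, B the Bayes factor in favour of the alternative, and using the fact (recalled below) that the null expectation of the Bayes factor is one,

$$\mathbb{E}[B] = \pi_0\,\mathbb{E}[B\mid H_0] + (1-\pi_0)\,\mathbb{E}[B\mid H_1] = \pi_0 + (1-\pi_0)\,\mathbb{E}[B\mid H_1],$$

so the sample mean of the Bayes factors does carry information about π₀, but only to the extent that $\mathbb{E}[B\mid H_1]$ stays away from one, which is where the “reasonable powers” assumption enters.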
The estimator of this true proportion ends up being the proportion of Bayes factors less than 1, an anti-climactic proposal as it means accepting the null each time the Bayes factor is less than 1. (I did not check the proof that it overestimates the true proportion.) Or the estimator of Storey (2003). However, the above quote shows the procedure is validated only when the (true) alternative connects with the Bayes factor, so I do not see how this agrees with the robustness property of behaving well “under misspecifications of parametric alternative models”. Furthermore, in the specific framework adopted by the paper, such “misspecifications” are difficult to fathom, as they would mean that the parameter-free distributions of the observations under the alternatives are wrong, which may render the Bayes factors arbitrary and hence jeopardise the validity of the selection process. So there is something missing in the picture, I fear.
Thus, while the second half of the paper is dedicated to an extensive simulation study, the direction I found most interesting in the paper is the question of the distribution of the Bayes factor (under the null or not), albeit not a Bayesian question, as it relates to the use and the calibration of ABC model choice (and to the proposal by Fearnhead and Prangle of using the Bayes factor as the summary statistic). The fact that the (marginal) expectation of the Bayes factor under the null is equal to one is noteworthy, but not as compelling as the author argues, because (a) it is only an expectation and (b) it tells nothing about the alternative. The distribution of the Bayes factor does depend upon the alternative, through its numerator, so mileage [of the quantile Bayes factor] may vary (as shown by the assumption “for Bayes factors with reasonable power”, p.14). Drawing Bayesian inference based on Bayes factors only is nonetheless an area worth investigating!
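To make points (a) and (b) concrete, recall the one-line identity behind this null expectation: writing the Bayes factor as the ratio of the marginal likelihoods m₁ and m₀ under the alternative and the null, respectively,

$$\mathbb{E}_{m_0}\!\left[\frac{m_1(X)}{m_0(X)}\right] = \int \frac{m_1(x)}{m_0(x)}\,m_0(x)\,\text{d}x = \int m_1(x)\,\text{d}x = 1,$$

an identity valid for any proper marginal m₁, which is precisely why the value of one carries no information whatsoever about the alternative.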
April 5, 2016 at 3:40 pm
“The estimator of this true proportion ends up being the proportion of Bayes factors less than 1”
This isn’t the estimator used. The estimate involves identifying the largest set of of Bayes factors (BFs) such that their sample mean is less than 1. This means that many factors greater than 1 may be included in this proportion.
I think the justification is as follows: imagine that you have a large collection of tests where you know the null is true, and compute the BF for each. The sample mean should be less than 1. With a small collection of null tests it is possible that the sample mean will sneak above 1, but with a large collection this should be very unlikely.
Now consider the tests where the null is not true. Hopefully, their expected BFs will be *much* greater than 1, and the observed BFs as well, so that they do not become part of the set discussed above.
Under well-specified alternatives, where the alternative is very different from the null and the BF test is very powerful, the set discussed above should give a good estimate of the proportion of null tests.
But the alternatives may be misspecified, which means their BFs (expected and observed) might be smaller than 1. (Basically, we can't really say anything about the distribution of the BFs when the null is false.) The good news is that this should, at worst, lead to an overestimate of the proportion, via increasing (not decreasing) the size of the set used in the estimator.
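For what it's worth, here is a minimal numpy sketch of the estimator as I described it above, on a toy point-null normal example of my own (the model, the function names, and all parameter values are hypothetical, not taken from the paper):

```python
import numpy as np

def estimate_null_proportion(bfs):
    """Relative size of the largest subset of Bayes factors whose
    sample mean stays below 1 -- built greedily from the smallest
    values up, which maximises the subset size."""
    bfs = np.sort(np.asarray(bfs, dtype=float))        # smallest BFs first
    running_means = np.cumsum(bfs) / np.arange(1, bfs.size + 1)
    below = np.nonzero(running_means < 1.0)[0]         # prefixes with mean < 1
    k = below[-1] + 1 if below.size else 0             # largest such prefix
    return k / bfs.size

def bf10(x, tau2=1.0):
    """Toy Bayes factor: H0: x ~ N(0,1) against the marginal
    x ~ N(0, 1+tau2) induced by a N(0, tau2) prior on the mean."""
    s2 = 1.0 + tau2
    return np.exp(-0.5 * x**2 * (1.0 / s2 - 1.0)) / np.sqrt(s2)

# hypothetical mixture: 80% true nulls, 20% alternatives
rng = np.random.default_rng(0)
m, pi0, tau2 = 20_000, 0.8, 1.0
n0 = int(m * pi0)
x = np.concatenate([rng.standard_normal(n0),                       # nulls
                    rng.normal(0.0, np.sqrt(1.0 + tau2), m - n0)]) # alternatives
print(estimate_null_proportion(bf10(x, tau2)))  # (over)estimate of pi0 = 0.8
```

Sorting the BFs and extending the prefix greedily does give the *largest* qualifying set, since adding any larger value to a subset can only increase its mean.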
December 20, 2013 at 3:19 am
Dear Xi’an,
Thanks very much for your comments, I did ask for them :-) and I sincerely appreciate your thoughts! This is mainly because I believe that what we have engaged in here should be the future of academic publishing, i.e., open access (via arXiv) and open discussion (via blogs or social media).
I also think I owe you (and your readers) a detailed response, which I will probably write in the next few days. For now, I just want to clarify that the setting I considered in the paper is the simultaneous testing of a large number of hypotheses, where applying Bayesian asymptotics becomes appropriate. This should distinguish our approach from what Johnson considered.