Today I am teaching my yearly class at Warwick, a short introduction to computational techniques for approximating Bayes factors, aimed at MASDOC and PhD students in the Statistical Frontiers seminar and gathering material from several talks of the past years. Here are my slides:
Archive for Bayes factors
(Following my earlier discussion of his paper, Xiaoquan Wen sent me this detailed reply.)
I think it is appropriate to start my response to your comments by giving a little background on my research interests and on the project itself: I consider myself an applied statistician, not a theorist, and I am interested in developing theoretically sound and computationally efficient methods to solve practical problems. The FDR project originated from a practical application in genomics involving hypothesis testing. The details of this particular application can be found in this published paper, and the simulations in the manuscript are also designed for a similar context. In this application, the null model is trivially defined; however, there exist finitely many alternative scenarios for each test. We proposed a Bayesian solution that handles this complex setting quite nicely: in brief, we chose to model each possible alternative scenario parametrically and, by taking advantage of Bayesian model averaging, the Bayes factor naturally ended up as our test statistic. We had no problem demonstrating that the resulting Bayes factor is much more powerful than the existing approaches, even accounting for prior (mis-)modeling in the Bayes factors. However, in this genomics application, there are potentially tens of thousands of tests that need to be performed simultaneously, and FDR control becomes both necessary and challenging.
In yet another permutation of the original title (!), Andrew Gelman posted the answer Val Johnson sent him after our (submitted) letter to PNAS. As Val did not send me a copy (although Andrew did!), I will not reproduce it here and rather refer interested readers to Andrew’s blog… In addition to Andrew’s (sensible) points, here are a few idle (post-X’mas and pre-skiing) reflections:
- “evidence against a false null hypothesis accrues exponentially fast” makes me wonder in which metric this exponential rate (in γ?) occurs;
- that “most decision-theoretic analyses of the optimal threshold to use for declaring a significant finding would lead to evidence thresholds that are substantially greater than 5 (and probably also greater [than] 25)” is difficult to accept as an argument since there is no trace of a decision-theoretic argument in the whole paper;
- Val rejects our minimaxity argument on the basis that “[UMPBTs] do not involve minimization of maximum loss”, but the prior that corresponds to those tests is minimising the integrated probability of not rejecting at threshold level γ, a loss function integrated against parameter and observation, a Bayes risk in other words… Point masses or spike priors are clearly characteristic of minimax priors. Furthermore, the additional argument that “in most applications, however, a unique loss function/prior distribution combination does not exist” has been used by many to refute the Bayesian perspective and makes me wonder what arguments are left for using a (pseudo-)Bayesian approach;
- the next paragraph is pure tautology: the fact that “no other test, based on either a subjectively or objectively specified alternative hypothesis, is as likely to produce a Bayes factor that exceeds the specified evidence threshold” is a paraphrase of the definition of UMPBTs, not an argument. I do not see why we should solely “worry about false negatives”, since minimising those should lead to a point mass on the null (or, more seriously, should not lead to the minimax-like selection of the prior under the alternative).
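Incidentally, on the first of these points: one reading of the “exponential accrual” claim is the standard fact that, for a point null against a point alternative, the log Bayes factor is a sum of iid log-likelihood ratios and hence grows linearly in n, at the Kullback-Leibler rate, when the null is false, making the Bayes factor itself grow exponentially in n. A toy sketch of this reading (normal models, seed, and sample size are all my own choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu1, n = 1.0, 5_000
x = rng.normal(mu1, 1.0, size=n)  # data generated under the alternative: the null is false

# point null N(0,1) versus point alternative N(mu1,1): log BF is a sum of iid terms
log_bf = np.cumsum(norm.logpdf(x, loc=mu1) - norm.logpdf(x, loc=0.0))

# the per-observation rate converges to KL(N(mu1,1) || N(0,1)) = mu1**2 / 2
rate = log_bf[-1] / n
print(rate)  # close to 0.5 here
```

Under a true null the same per-observation average converges to a negative constant instead, so the Bayes factor then decays exponentially in n; either way the exponential rate is in the sample size, which still leaves open where γ enters the picture.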
Here are a few comments on a recently arXived paper on FDRs by Xiaoquan Wen (who asked for them!). Although there is less frenzy about false discovery rates in multiple testing now than in the 1990s, and I have not done anything on the topic since our 2004 JASA paper, this is still a topic of interest to me, although maybe not in the formalised way the model is constructed here.
“Although the Bayesian FDR control is conceptually straightforward, its practical performance is susceptible to alternative model misspecifications. In comparison, the p-value based frequentist FDR control procedures demand only adequate behavior of p-values under the null models and generally ensure targeted FDR control levels, regardless of the distribution of p-values under the assumed alternatives.”
Now, I find the above quote of interest as it relates to Val Johnson’s argument for his uniformly most powerful “Bayesian” tests (now sufficiently discussed on the ‘Og!). This is the rather traditional criticism of Bayes factors, that they depend on the prior modelling, to the point that it made it into the introduction of my tutorial yesterday. Actually, the paper has arguments similar to Johnson’s (who is quoted in the paper for earlier works) in that the criterion for validating a point estimator of the proportion of positives is highly frequentist and does not care much about the alternative hypothesis. Besides, the modelling used therein is puzzling in that there seems to be a single parameter in the model, namely the true proportion of positives, which sounds to me like a hyper-stylised representation of real experiments, to the point of being useless… (Even if there are extra parameters, they differ for each observation.) In addition, the argument leading to the proposed procedure is unclear: if the Bayes factors are to be consistent under the null and the proportion of positives needs an asymptotically guaranteed upper bound, the choice of an estimate equal to 1 does the job. (This is noticed on page 9.) So the presentation seems to be missing a counterbalancing requirement that would rule out this trivial solution.
“On the other hand, the Bayes factors from the true alternative models with reasonable powers should be, on average, greater than 1 (i.e., favoring the alternative over the null models). Therefore, the sample mean of the observed Bayes factors carries information regarding the mixture percentage.”
The estimator of this true proportion ends up being the proportion of Bayes factors less than 1, an anti-climactic proposal as it means accepting the null each time the Bayes factor is less than 1. (I did not check the proof that it overestimates the true proportion.) Or the estimator of Storey (2003). However, the above quote shows it is validated only when the (true) alternative connects with the Bayes factor, so I do not see how this agrees with the robustness property of behaving well “under misspecifications of parametric alternative models”. Furthermore, in the specific framework adopted by the paper, such “misspecifications” are difficult to fathom, as they would mean that the parameter-free distributions of the observations under the alternatives are wrong, which may render the Bayes factors arbitrary, hence jeopardising the validity of the selection process. So there is something missing in the picture, I fear.
Thus, while the second half of the paper is dedicated to an extensive simulation study, the most interesting direction I found in the paper is the question of the distribution of the Bayes factors (under the null or not), albeit not a Bayesian question, as it relates to the use and the calibration of ABC model choice (and the proposal by Fearnhead and Prangle of using the Bayes factor as the summary statistic). The fact that the (marginal) expectation of the Bayes factor under the (marginal) null is equal to one is noteworthy, but not as compelling as the author argues, because (a) it is only an expectation and (b) it tells nothing about the alternative. The distribution of the Bayes factor does depend upon the alternative, so mileage [of the quantile Bayes factor] may vary (as shown by the assumption “for Bayes factors with reasonable power”, p.14). Drawing Bayesian inference based on Bayes factors only is nonetheless an area worth investigating!
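As an aside, the identity underlying that expectation is elementary: under the marginal null, E[m1(x)/m0(x)] = ∫ m1(x) dx = 1, whatever the alternative marginal m1. A minimal simulation check in a toy normal setting (all choices of model and prior are mine):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
tau = 0.5  # prior standard deviation under the alternative (an arbitrary pick)
x = rng.standard_normal(200_000)  # observations simulated under the null N(0,1)

# Bayes factor m1(x)/m0(x), where the marginal under the alternative
# (theta ~ N(0, tau^2), x | theta ~ N(theta, 1)) is m1 = N(0, 1 + tau^2)
bf = norm.pdf(x, scale=np.sqrt(1.0 + tau**2)) / norm.pdf(x)
print(bf.mean())  # close to 1, as the identity predicts
```

(The identity holds for any tau, although for tau ≥ 1 the Bayes factor has infinite variance under the null in this model and the Monte Carlo average converges poorly, which incidentally illustrates point (a): an expectation of one says little about the spread.)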
“The difficulty in constructing a Bayesian hypothesis test arises from the requirement to specify an alternative hypothesis.”
Val Johnson published (and arXived) a paper in the Annals of Statistics on uniformly most powerful Bayesian tests. This is in line with Val’s earlier writings on the topic and is good-quality mathematical statistics, but I cannot really buy the arguments contained in the paper as being compatible with (my view of) Bayesian tests. A “uniformly most powerful Bayesian test” (acronymed as UMPBT) is defined as
“UMPBTs provide a new form of default, nonsubjective Bayesian tests in which the alternative hypothesis is determined so as to maximize the probability that a Bayes factor exceeds a specified threshold”
which means selecting the prior under the alternative so that the frequentist probability of the Bayes factor exceeding the threshold is maximal for all values of the parameter. This does not sound very Bayesian to me, indeed, because it averages over all possible values of the observation x and compares probabilities for all values of the parameter θ, rather than integrating against a prior or posterior. It also selects the prior under the alternative with the sole purpose of favouring the alternative, meaning its further use once the null is rejected is not considered at all. And it caters to non-Bayesian theories, i.e., tries to sell Bayesian tools as supplementing p-values, arguing that the method is objective because the solution satisfies a frequentist coverage property. (At best, this maximisation of the rejection probability reminds me of minimaxity, except that there is no clear and generic notion of minimaxity in hypothesis testing.)
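To make the construction concrete, here is a hedged sketch in the one-sided normal case, where the UMPBT alternative is available in closed form: with x̄ ~ N(μ, σ²/n) and H0: μ = 0 against a point alternative μ1 > 0, the event BF > γ is equivalent to x̄ exceeding a threshold t(μ1), so maximising P(BF > γ) simultaneously for every μ amounts to minimising t(μ1), solved at μ1 = σ√(2 log γ/n). All numerical settings below are mine:

```python
import numpy as np

n, sigma, gamma = 10, 1.0, 10.0  # arbitrary illustrative settings

def threshold(mu1):
    # BF = exp{n*(2*mu1*xbar - mu1**2)/(2*sigma**2)} > gamma
    # is equivalent to xbar > mu1/2 + sigma**2*log(gamma)/(n*mu1)
    return mu1 / 2.0 + sigma**2 * np.log(gamma) / (n * mu1)

# closed-form minimiser of the rejection threshold
mu1_star = sigma * np.sqrt(2.0 * np.log(gamma) / n)

# numerical confirmation by grid search: the same mu1 minimises t(mu1),
# hence maximises P(BF > gamma) for every value of mu at once
grid = np.linspace(0.01, 3.0, 10_000)
best = grid[np.argmin(threshold(grid))]
print(best, mu1_star)  # both about 0.679
```

Note that the minimiser involves no prior on μ whatsoever: the spike prior at μ1 is chosen for its frequentist rejection rate, not for any inferential use after rejection, which is precisely the objection above.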