*“The main purpose of this paper is to develop a new Bayesian hypothesis testing approach for the point null hypothesis testing (…) based on the Bayesian deviance and constructed in a decision theoretical framework. It can be regarded as the Bayesian version of the likelihood ratio test.”*

**T**his paper got published in *Journal of Econometrics* two years ago but I only read it a few days ago when Kerrie Mengersen pointed it out to me. Here is an interesting criticism of Bayes factors.

*“In the meantime, unfortunately, Bayes factors also suffers from several theoretical and practical difficulties. First, when improper prior distributions are used, **Bayes factors* contains undefined constants and takes arbitrary values (…) Second, when a proper but vague prior distribution with a large spread is used to represent prior ignorance, *Bayes factors* tends to favour the null hypothesis. The problem may persist even when *the sample size is large (…) Third, the calculation of **Bayes factors* generally requires the evaluation of marginal likelihoods. In many models, the marginal likelihoods may be difficult to compute.”

I completely agree with these points, which are part of a longer list in our testing by mixture estimation paper. The authors also rightly blame the rigidity of the 0-1 loss function behind the derivation of the Bayes factor. An alternative decision-theoretic based on the Kullback-Leibler distance has been proposed by José Bernardo and Raúl Rueda, in a 2002 paper, evaluating the average divergence between the null and the full under the full, with the slight drawback that any nuisance parameter has the same prior under both hypotheses. (Which makes me think of the Savage-Dickey paradox, since everything here seems to take place under the alternative.) And the larger drawback of requiring a lower bound for rejecting the null. (Although it could be calibrated under the null prior predictive.)

This paper suggests using instead the difference of the Bayesian deviances, which is the expected log ratio integrated against the posterior. (With the possible embarrassment of the quantity having no prior expectation since the ratio depends on the data. But after all the evidence or marginal likelihood faces the same “criticism”.) So it is a sort of Bayes factor on the logarithms, with a strong similarity with Bernardo & Rueda’s solution since they are equal in expectation under the marginal. As in Dawid et al.’s recent paper, the logarithm removes the issue with the normalising constant and with the Lindley-Jeffreys paradox. The approach then needs to be calibrated in order to define a decision bound about the null. The asymptotic distribution of the criterion is χ²(p)−p, where p is the dimension of the parameter to be tested, but this sounds like falling back on frequentist tests. And the deadly .05% bounds. I would rather favour a calibration of the criterion using prior or posterior predictives under both models…