## Predictive Bayes factors?!

**W**e (as in we, the Cosmology/Statistics ANR 2005-2009 Ecosstat grant team) are currently working on a Bayesian testing paper with applications to cosmology, and my colleagues showed me a paper by Roberto Trotta that I found most intriguing in its introduction of a predictive Bayes factor. A Bayes factor, being a function of an observed or future dataset, can indeed be predicted (for the latter) in a Bayesian fashion, but I find it difficult to make sense of the corresponding distribution from an inferential perspective. Here are a few points in the paper to which I object:

- The Bayes factor associated with the future dataset should be based on the current dataset as well if it is to work as a genuine Bayes factor. Otherwise, the information contained in the current dataset is ignored;
- While a Bayes factor eliminates the influence of the prior probabilities of the null and of the alternative hypotheses, the predictive distribution of the future Bayes factor does not: it is a mixture over both hypotheses, weighted by their posterior probabilities, which themselves depend on the prior probabilities;
- The most natural use of the predictive distribution of the future Bayes factor would be to look at the mass above or below 1, thus to produce a sort of Bayesian predictive p-value, falling back into old tracks;
- If the current observation is not integrated in the future Bayes factor, it should be incorporated in the prior, the current posterior being then the future prior. In this case, the quantity of interest is not the predictive of the Bayes factor computed under the original prior but of the one computed under the current posterior.
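
The first, second and fourth points can be sketched in symbols. The notation below is assumed for illustration and need not match Trotta's: $x$ is the current dataset, $y$ the future one, $m_i$ the marginal likelihood under hypothesis $H_i$ ($i=0,1$), and $\varrho_i$ the prior probability of $H_i$.

$$
B_{01}(x,y) = \frac{m_0(x,y)}{m_1(x,y)} = B_{01}(x)\,\frac{m_0(y\mid x)}{m_1(y\mid x)},
\qquad
m_i(y\mid x) = \int f_i(y\mid\theta_i,x)\,\pi_i(\theta_i\mid x)\,\mathrm{d}\theta_i ,
$$

$$
m(y\mid x) = \sum_{i=0,1} \mathbb{P}(H_i\mid x)\,m_i(y\mid x),
\qquad
\mathbb{P}(H_i\mid x) = \frac{\varrho_i\,m_i(x)}{\varrho_0\,m_0(x)+\varrho_1\,m_1(x)} .
$$

The first line says that the Bayes factor for $y$ using the current posterior as prior, $m_0(y\mid x)/m_1(y\mid x)$, differs from the joint Bayes factor only by the factor $B_{01}(x)$, whereas $m_0(y)/m_1(y)$ ignores $x$ altogether. The second line shows where the prior probabilities $\varrho_i$ enter the predictive distribution of any function of $y$, Bayes factor included.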

It may be that the disappearance of the current data from the Bayes factor stems from a fear of “*using the data twice*”, which is a recurring argument in the criticisms of predictive Bayes inference. I have difficulties with the concept in general and, in the present case, there is no difficulty with using the current data to predict the distribution of the future Bayes factor.

**I** am also puzzled by the MCMC strategy suggested in the paper in the case of embedded hypotheses. Trotta argues in §3.1 that it is sufficient to sample from the full model and to derive the Bayes factor by the Savage-Dickey representation, but this does not really agree with the approach of Chen, Shao and Ibrahim, and I think the identity (14) is missing an extra term, which has the surprising feature of depending upon the value of the prior density at a specific value… (Details are in the reproduced pages of my notebook.) Overall, I find it most puzzling that simulating from a distribution over a set provides information about a distribution that is concentrated over a subset that has *measure zero* against the initial measure. (I am actually suspicious of the Savage-Dickey representation itself, because it also uses the values of the prior and posterior densities at a given point, even though it has a very nice Gibbs interpretation/implementation…)
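
To see what the Savage-Dickey representation computes, here is a minimal sketch on a toy conjugate model that is not taken from Trotta's paper. It assumes n observations from N(θ, 1), the null hypothesis θ = 0 embedded in an alternative with prior θ ~ N(0, τ²), and a hypothetical sample mean of 0.3. All function names are made up for the illustration. The Bayes factor is computed three ways: as a ratio of marginal densities, as the ratio of posterior to prior density at θ = 0, and from draws of the full-model posterior with a hand-rolled Gaussian kernel estimate of the posterior density at 0.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def bayes_factor_direct(xbar, n, tau2):
    """B01 as the ratio of the marginal densities of the sample mean."""
    return normal_pdf(xbar, 0.0, 1.0 / n) / normal_pdf(xbar, 0.0, tau2 + 1.0 / n)


def posterior_params(xbar, n, tau2):
    """Mean and variance of theta given the data under the full model."""
    var = 1.0 / (n + 1.0 / tau2)
    return n * xbar * var, var


def bayes_factor_savage_dickey(xbar, n, tau2):
    """B01 as posterior density over prior density, both taken at theta = 0."""
    mean, var = posterior_params(xbar, n, tau2)
    return normal_pdf(0.0, mean, var) / normal_pdf(0.0, 0.0, tau2)


def bayes_factor_from_draws(draws, tau2, bandwidth):
    """Savage-Dickey ratio with the posterior density at 0 estimated
    by a Gaussian kernel average over draws from the full model."""
    density_at_zero = sum(normal_pdf(0.0, t, bandwidth ** 2) for t in draws) / len(draws)
    return density_at_zero / normal_pdf(0.0, 0.0, tau2)


# Hypothetical settings for the illustration.
n, tau2, xbar = 20, 4.0, 0.3

b_direct = bayes_factor_direct(xbar, n, tau2)
b_sd = bayes_factor_savage_dickey(xbar, n, tau2)

# Stand-in for MCMC output: exact draws from the conjugate posterior.
rng = random.Random(0)
post_mean, post_var = posterior_params(xbar, n, tau2)
draws = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(20000)]
b_mc = bayes_factor_from_draws(draws, tau2, bandwidth=0.05)

print(f"direct: {b_direct:.4f}  Savage-Dickey: {b_sd:.4f}  from draws: {b_mc:.4f}")
```

The first two numbers agree exactly in this conjugate case, while the third carries both Monte Carlo noise and kernel smoothing bias. Note that every version of the ratio relies on density values at the single point θ = 0, which is the feature questioned above.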
