Predictive Bayes factors?!

[Images: notebook 5, pages 53 and 54]

We (as in we, the Cosmology/Statistics ANR 2005-2009 Ecosstat grant team) are currently working on a Bayesian testing paper with applications to cosmology, and my colleagues showed me a paper by Roberto Trotta that I found most intriguing in its introduction of a predictive Bayes factor. Since a Bayes factor is a function of the dataset, whether an observed x or a future x^\prime, it can indeed be predicted (for the latter) in a Bayesian fashion, but I find it difficult to make sense of the corresponding distribution from an inferential perspective. Here are a few points in the paper to which I object:

  • The Bayes factor associated with x^\prime should be based on x as well if it is to work as a genuine Bayes factor. Otherwise, the information contained in x is ignored;
  • While a Bayes factor eliminates the influence of the prior probabilities of the null and of the alternative hypotheses, the predictive distribution of x^\prime does not:

\pi(x^\prime | x) \propto p(H_0)\, m_0(x,x^\prime) + p(H_a)\, m_a(x,x^\prime)

  • The most natural use of the predictive distribution of B(x,x^\prime) would be to look at the mass above or below 1, thus to produce a sort of Bayesian predictive p-value, falling back into old tracks.
  • If the current observation x is not integrated into the future Bayes factor B(x^\prime), it should be incorporated into the prior, the current posterior then becoming the future prior. In this case, the quantity of interest is not the predictive of B(x^\prime) but that of

B(x,x^\prime) / B(x).
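Spelling out this last point, assuming (for this derivation only) that x and x^\prime are independent given the parameter \theta_i of each hypothesis H_i (i = 0, a), with sampling density f_i and prior \pi_i:

m_i(x,x^\prime) = \int f_i(x|\theta_i)\,f_i(x^\prime|\theta_i)\,\pi_i(\theta_i)\,\mathrm{d}\theta_i, \qquad m_i(x^\prime|x) = \dfrac{m_i(x,x^\prime)}{m_i(x)} = \int f_i(x^\prime|\theta_i)\,\pi_i(\theta_i|x)\,\mathrm{d}\theta_i,

\dfrac{B(x,x^\prime)}{B(x)} = \dfrac{m_0(x,x^\prime)/m_0(x)}{m_a(x,x^\prime)/m_a(x)} = \dfrac{m_0(x^\prime|x)}{m_a(x^\prime|x)},

so this ratio is the Bayes factor for x^\prime computed with the current posteriors \pi_i(\theta_i|x) as priors, while the predictive of x^\prime keeps the hypothesis weights through their posterior probabilities:

\pi(x^\prime|x) = P(H_0|x)\,m_0(x^\prime|x) + P(H_a|x)\,m_a(x^\prime|x), \qquad P(H_0|x) = \dfrac{p(H_0)\,m_0(x)}{p(H_0)\,m_0(x) + p(H_a)\,m_a(x)}.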

It may be that the disappearance of x from the Bayes factor stems from a fear of “using the data twice”, a recurring argument in criticisms of predictive Bayesian inference. I have difficulties with this concept in general and, in the present case, there is no difficulty in using \pi(x^\prime| x) to predict the distribution of B(x,x^\prime).
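To make this concrete, here is a minimal Python sketch under an assumed toy model that is not the cosmological setting of the paper: given \theta, x ~ N(\theta, s2) and x^\prime ~ N(\theta, s2p) are independent, with H_0: \theta = 0 against H_a: \theta ~ N(0, tau2). It draws x^\prime from \pi(x^\prime|x), which mixes the two hypotheses with weights P(H_0|x) and P(H_a|x), computes B(x,x^\prime) from the joint marginals m_0(x,x^\prime) and m_a(x,x^\prime), and estimates the predictive mass of B(x,x^\prime) above 1. All function names and default values (s2, s2p, tau2, p0) are hypothetical choices for illustration.

```python
import math
import random


def log_normal_pdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def log_bvn_pdf(y1, y2, v1, v2, c):
    """Log density at (y1, y2) of a centred bivariate normal with
    variances v1, v2 and covariance c."""
    det = v1 * v2 - c * c
    quad = (v2 * y1 * y1 - 2.0 * c * y1 * y2 + v1 * y2 * y2) / det
    return -0.5 * (2.0 * math.log(2.0 * math.pi) + math.log(det) + quad)


def log_bf_joint(x, xp, s2, s2p, tau2):
    """log B(x, x') = log m_0(x, x') - log m_a(x, x') in the toy model."""
    log_m0 = log_normal_pdf(x, 0.0, s2) + log_normal_pdf(xp, 0.0, s2p)
    # Under H_a, x and x' share theta, hence their covariance is tau2.
    log_ma = log_bvn_pdf(x, xp, tau2 + s2, tau2 + s2p, tau2)
    return log_m0 - log_ma


def log_bf_current(x, s2, tau2):
    """log B(x) based on the current observation alone."""
    return log_normal_pdf(x, 0.0, s2) - log_normal_pdf(x, 0.0, tau2 + s2)


def predictive_mass_above_one(x, p0, s2=1.0, s2p=1.0, tau2=4.0,
                              n_sim=20000, seed=0):
    """Monte Carlo estimate of P(B(x, x') > 1 | x), with x' drawn from the
    predictive pi(x' | x). The prior weight p0 = p(H_0) must be in (0, 1)."""
    rng = random.Random(seed)
    # Posterior probability of H_0 given x: this is where p0 enters.
    odds = p0 / (1.0 - p0) * math.exp(log_bf_current(x, s2, tau2))
    p0_post = odds / (1.0 + odds)
    # Conjugate posterior of theta under H_a.
    v_post = 1.0 / (1.0 / tau2 + 1.0 / s2)
    m_post = v_post * x / s2
    above = 0
    for _ in range(n_sim):
        if rng.random() < p0_post:
            xp = rng.gauss(0.0, math.sqrt(s2p))              # x' | x, H_0
        else:
            xp = rng.gauss(m_post, math.sqrt(v_post + s2p))  # x' | x, H_a
        if log_bf_joint(x, xp, s2, s2p, tau2) > 0.0:
            above += 1
    return above / n_sim


if __name__ == "__main__":
    for p0 in (0.1, 0.5, 0.9):
        print(p0, predictive_mass_above_one(x=1.5, p0=p0))
```

With x = 1.5 and the default settings, the estimated mass should shift markedly as p0 goes from 0.1 to 0.9, which is the dependence on the prior probabilities of the hypotheses that a Bayes factor is meant to avoid.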

I am also puzzled by the MCMC strategy suggested in the paper in the case of embedded hypotheses. Trotta argues in §3.1 that it is sufficient to sample from the full model and to derive the Bayes factor by the Savage-Dickey representation, but this does not really agree with the approach of Chen, Shao and Ibrahim, and I think identity (14) is missing an extra term, namely

\dfrac{p(d|M_0)p(\omega_\star|M_1)}{p(d|M_1)},

which has the surprising feature of depending upon the value of the prior density at a specific value \omega_\star… (Details are in the reproduced pages of my notebook above.) Overall, I find it most puzzling that simulating from a distribution over a set \Theta provides information about a distribution that is concentrated on a subset \Theta_0 of measure zero under the initial measure. (I am actually suspicious of the Savage-Dickey representation itself, because it also uses the values of the prior and posterior densities at a given value \omega_\star, even though it has a very nice Gibbs interpretation/implementation…)
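For the embedded case, here is a minimal Python sketch, assuming a conjugate toy model with no nuisance parameter (so the usual Savage-Dickey compatibility condition on the priors holds trivially): x_1,…,x_n are N(\omega, 1), M_0 sets \omega = \omega_\star = 0, and M_1 puts \omega ~ N(0, tau2). It compares the exact B_{01} = p(d|M_0)/p(d|M_1) with the Savage-Dickey ratio p(\omega_\star|d,M_1)/p(\omega_\star|M_1), first with the exact posterior density, then with a kernel density estimate built only from draws under M_1, in the spirit of sampling from the full model. The Gaussian kernel, Silverman bandwidth and function names are illustrative assumptions, not necessarily what the paper does.

```python
import math
import random
import statistics


def normal_pdf(y, mean, var):
    """Density of N(mean, var) at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_m1(xbar, n, tau2):
    """Conjugate posterior mean and variance of omega under M_1."""
    v_post = 1.0 / (1.0 / tau2 + n)
    return v_post * n * xbar, v_post


def exact_bf01(xbar, n, tau2):
    """B_01 = p(d|M_0) / p(d|M_1), using the sufficient statistic xbar."""
    return normal_pdf(xbar, 0.0, 1.0 / n) / normal_pdf(xbar, 0.0, tau2 + 1.0 / n)


def savage_dickey_bf01(xbar, n, tau2):
    """Savage-Dickey ratio p(omega*|d, M_1) / p(omega*|M_1) at omega* = 0,
    using the exact posterior density."""
    m_post, v_post = posterior_m1(xbar, n, tau2)
    return normal_pdf(0.0, m_post, v_post) / normal_pdf(0.0, 0.0, tau2)


def savage_dickey_mc(xbar, n, tau2, n_sim=20000, seed=1):
    """Same ratio, with p(omega*|d, M_1) replaced by a Gaussian kernel density
    estimate at omega* = 0 from draws under M_1 only (exact draws here,
    standing in for MCMC output)."""
    rng = random.Random(seed)
    m_post, v_post = posterior_m1(xbar, n, tau2)
    draws = [rng.gauss(m_post, math.sqrt(v_post)) for _ in range(n_sim)]
    h = 1.06 * statistics.stdev(draws) * n_sim ** (-0.2)  # Silverman's rule
    dens = sum(normal_pdf(0.0, w, h * h) for w in draws) / n_sim
    return dens / normal_pdf(0.0, 0.0, tau2)


if __name__ == "__main__":
    args = (0.3, 10, 1.0)  # xbar, n, tau2
    print(exact_bf01(*args), savage_dickey_bf01(*args), savage_dickey_mc(*args))
```

The first two values agree up to rounding, since the continuous versions of the densities are used. The third depends on the bandwidth h and on the draws that happen to fall near \omega_\star, which illustrates how the whole estimate hinges on a density value at a single point.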

One Response to “Predictive Bayes factors?!”

  1. […] The paper on evidence approximation by population Monte Carlo that I mentioned in a previous post is now arXived. (This is the second paper jointly produced by the members of the 2005-2009 ANR […]
