Computing evidence

The book Random Effects and Latent Variable Model Selection, edited by David Dunson and published in 2008 as a Springer Lecture Note, contains several chapters dealing with evidence approximation in mixed effect models. (Incidentally, I would be interested in the story behind the Lecture Note, as I found no explanation on the back cover or in the preface. Some chapters, but not all, refer to a SAMSI workshop on model uncertainty…) The final chapter, written by Joyee Ghosh and David Dunson (and similar to a corresponding paper in JCGS), contains in particular the interesting identity that the Bayes factor opposing model h-1 to model h can be unbiasedly approximated by the average of the terms

\dfrac{f(x|\theta_{i,h},\mathfrak{M}=h-1)}{f(x|\theta_{i,h},\mathfrak{M}=h)}

when

  • \mathfrak{M} is the model index,
  • the \theta_{i,h}'s are simulated from the posterior under model h,
  • the model \mathfrak{M}=h-1 only considers the h-1 first components of \theta_{i,h},
  • the prior under model h-1 is the projection of the prior under model h, which is the property making the identity work, as sketched just below. (Note that this marginalisation is not the projection used in Bayesian Core.)
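To make the unbiasedness explicit (a quick check of my own, writing \theta_h=(\theta_{h-1},\eta) for the parameter of model h and m_h(x) for the evidence under model h), the posterior expectation of the above ratio is

\int \dfrac{f(x|\theta_{h-1},\mathfrak{M}=h-1)}{f(x|\theta_{h},\mathfrak{M}=h)}\,\dfrac{f(x|\theta_{h},\mathfrak{M}=h)\,\pi(\theta_{h}|\mathfrak{M}=h)}{m_h(x)}\,\text{d}\theta_{h}=\dfrac{1}{m_h(x)}\int f(x|\theta_{h-1},\mathfrak{M}=h-1)\,\pi(\theta_{h-1}|\mathfrak{M}=h-1)\,\text{d}\theta_{h-1}=\dfrac{m_{h-1}(x)}{m_h(x)}

where the middle equality holds because integrating \eta out of the prior under model h leaves the prior under model h-1 (the projection property above). The average of the simulated ratios thus converges to the Bayes factor of model h-1 against model h.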

I was first surprised when reading about this identity and wondered whether it suffered from either the Savage-Dickey difficulty or the harmonic mean instability, but this does not seem to be the case. The ratio is correctly defined thanks to the projection property, and it has no particular reason to suffer from infinite variance.

To check the practical performance of the method in a setting other than the factor model studied in the paper, I tried it on a simple linear model with known variance

y=a+bx+\varepsilon

comparing the model including the (normal+log-trend) simulated regressor with the model without the regressor. I used a g-prior

\beta=(a,b)\sim\mathfrak{N}_2\left(0,n[\mathbf{X}^\text{T}\mathbf{X}]^{-1}\right)

on the full model and its (marginal) projection

a\sim\mathfrak{N}_1\bigg(0,\sigma^2=\sum x_i^2/\sum [x_i-\bar x]^2\bigg)

which (rather counter-intuitively) involves the regressor values, since this \sigma^2 is simply the top-left entry of the g-prior covariance n[\mathbf{X}^\text{T}\mathbf{X}]^{-1}. This modelling should give a Bayes factor equal to

\dfrac{\varphi\left(\mathbf{y}|0,\mathbf{I}_n+\sigma^2\mathbf{1}\mathbf{1}^\text{T}\right)}{\varphi\left(\mathbf{y}|0,\mathbf{I}_n+n\mathbf{X}[\mathbf{X}^\text{T}\mathbf{X}]^{-1}\mathbf{X}^\text{T}\right)}
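This expression follows from integrating the coefficients out of their respective g-priors, with the error variance known and equal to one as in the simulation below:

\mathbf{y}=\mathbf{X}\beta+\varepsilon,\quad\beta\sim\mathfrak{N}_2\left(0,n[\mathbf{X}^\text{T}\mathbf{X}]^{-1}\right),\quad\varepsilon\sim\mathfrak{N}_n\left(0,\mathbf{I}_n\right)\quad\Longrightarrow\quad\mathbf{y}\sim\mathfrak{N}_n\left(0,\mathbf{I}_n+n\mathbf{X}[\mathbf{X}^\text{T}\mathbf{X}]^{-1}\mathbf{X}^\text{T}\right)

and

\mathbf{y}=a\mathbf{1}+\varepsilon,\quad a\sim\mathfrak{N}_1\left(0,\sigma^2\right)\quad\Longrightarrow\quad\mathbf{y}\sim\mathfrak{N}_n\left(0,\mathbf{I}_n+\sigma^2\mathbf{1}\mathbf{1}^\text{T}\right),

the Bayes factor being the ratio of the two resulting normal densities evaluated at the observed \mathbf{y}.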

However, the difference between the simulated Bayes factor and the actual value is quite large in my simulations, both under the full

> DD
[1] 1.769114e-10
> B01
[1] 0.004805112

and the restricted

> DD
[1] 0.6218985
> B01
[1] 2.516249

models. Here is the R code:


n=155
x=.2*rnorm(n)+log(1:n)    #regressor: normal noise plus a log trend
y=2+.8*x+rnorm(n)         #data from the full model, known unit variance

#posterior mean and covariance of (a,b) under the g-prior with g=n
pom=(n/(n+1))*lm(y~x)$coefficients
pov=(n/(n+1))*summary(lm(y~x))$cov.unscaled

#sample from the full-model posterior
library(mvtnorm)
posim=rmvnorm(10^5,mean=pom,sigma=pov)

#Bayes factor approximation: average of restricted-to-full likelihood ratios
#over the posterior draws (log ratios accumulated, then exponentiated)
BF=dnorm(y[1],mean=posim[,1],log=TRUE)-
   dnorm(y[1]-posim[,1]-posim[,2]*x[1],log=TRUE)
for (i in 2:n){
  BF=BF+dnorm(y[i],mean=posim[,1],log=TRUE)-
        dnorm(y[i]-posim[,1]-posim[,2]*x[i],log=TRUE)
  }
DD=mean(exp(BF))

#exact Bayes factor B01 (restricted versus full)
prig=(n+1)*pov[1,1]       #prior variance of the intercept under the projected prior
B01=exp(dmvnorm(y,sigma=diag(n)+prig*matrix(1,n,n),log=TRUE)-
    dmvnorm(y,sigma=diag(n)+n*cbind(rep(1,n),x)%*%
        summary(lm(y~x))$cov.unscaled%*%rbind(rep(1,n),x),log=TRUE))
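
As an aside, the second set of figures above (the restricted case) presumably corresponds to data regenerated without the regressor; this is my reading of the experiment rather than something shown in the code, and it amounts to replacing the simulation line with

y=2+rnorm(n)   #data simulated from the restricted (intercept-only) model, assumed reading
#then recomputing pom, pov, posim, DD and B01 exactly as above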

I think one explanation for the discrepancy is that the a's derived from the full model posterior are quite different from the a's that would be generated from the restricted model posterior. The more I think about the above approximation, the less convinced I am that it can apply in full generality, because of this drawback that the projected posterior has little in common with the posterior associated with the projected prior. (Since, formally, the same unbiasedness result applies when comparing a full model with k variables to a submodel with none but the constant, it is clear that the approximation may be poor in non-orthogonal situations.)
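
As a quick numerical check of this explanation (a diagnostic of my own, continuing the R code above), one can compare the intercept draws projected from the full-model posterior with draws from the restricted-model posterior of a, obtained by a conjugate normal update of the projected prior with variance prig:

#diagnostic: posterior of a under the full model (projected) versus the restricted model
posta_full=posim[,1]                 #intercept draws projected from the full-model posterior
postvar=1/(n+1/prig)                 #posterior variance of a under the restricted model
postmean=postvar*sum(y)              #posterior mean of a under the restricted model
posta_rest=rnorm(10^5,mean=postmean,sd=sqrt(postvar))
c(mean(posta_full),mean(posta_rest)) #the restricted posterior shifts towards the overall mean of y
c(sd(posta_full),sd(posta_rest))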

