Computing evidence
The book Random effects and latent variable model selection, edited by David Dunson in 2008 as a Springer Lecture Note. contains several chapters dealing with evidence approximation in mixed effect models. (Incidentally, I would be interested in the story behind the Lecture Note as I found no explanation in the backcover or in the preface. Some chapters but not all refer to a SAMSI workshop on model uncertainty…) The final chapter written by Joyee Ghosh and David Dunson (similar to a corresponding paper in JCGS) contains in particular the interesting identity that the Bayes factor opposing model h to model h-1 can be unbiasedly approximated by (the average of the terms)
when
is the model index,
- the
‘s are simulated from the posterior under model h,
- the model
only considers the h-1 first components of
,
- the prior under model h-1 is the projection of the prior under model h. (Note that this marginalisation is not the projection used in Bayesian Core.)
I was first surprised when reading about this identity and wondered whether or not it was suffering either of the Savage-Dickey difficulty or of the harmonic mean instability, but this does not seem to be the case. The ratio is correctly defined thanks to the projection property and the ratio has no particular reason to suffer from infinite variance.
To check the practical performances of the method in another setting than the factor model studied in the paper, I tried it on a simple linear model with known variance
comparing the model including the (normal+log-trend) simulated regressor with the model without the regressor. I used a g-prior
on the full model and its (marginal) projection
which (rather counter-intuitively) involves the regressor values. This modelling should give a Bayes factor equal to
but the difference between the simulated Bayes factor and the actual value is quite large in my simulations, both under the full
> DD
[1] 1.769114e-10
> B01
[1] 0.004805112
and the restricted
> DD
[1] 0.6218985
> B01
[1] 2.516249
models. Here is the R code:
n=155 x=.2*rnorm(n)+log(1:n) y=2+.8*x+rnorm(n) #posterior parameter under G-prior with G=n pom={n/{n+1}}*lm(y~x)$coefficients pov={n/{n+1}}*summary(lm(y~x))$cov.unscaled #sample from posterior library(mvtnorm) posim=rmvnorm(10^5,mean=pom,sigma=pov) #Bayes F. approximation BF=dnorm(y[1],mean=posim[,1],log=TRUE)- dnorm(y[1]-posim[,1]-posim[,2]*x[1],log=TRUE) for (i in 2:n) BF=BF+dnorm(y[i],mean=posim[,1],log=TRUE)- dnorm(y[i]-posim[,1]-posim[,2]*x[i],log=TRUE) DD=mean(exp(BF)) #exact Bayes factor prig={n+1}*pov[1,1] B01=exp(dmvnorm(y,sigma=diag(n)+prig*matrix(1,n,n),log=TRUE)- dmvnorm(y,sigma=diag(n)+n*cbind(rep(1,n),x)%*% summary(lm(y~x))$cov.unscaled%*%rbind(rep(1,n),x),log=TRUE))
I think one explanation for the discrepancy is that the a‘s derived from the full model posterior are quite different from a‘s that would be generated from the restricted model posterior. The more I think about the above approximation, the less convinced I am that it can apply in full generality because of this drawback that the projected posterior has little in common with the posterior associated with the projected prior. (Since, formally, the same unbiasedness result applies when comparing a full model with k variables with a submodel with none but the constant, it is clear that the approximation may be poor in non-orthogonal situations.)
December 8, 2010 at 7:37 am
[…] alternative, and the computational aspects of the resulting Bayesian procedures, including evidence, the Savage-Dickey paradox, nested sampling, harmonic mean estimators, and […]
December 6, 2010 at 12:11 am
[…] the continuation of my earlier post on computing evidence, I read a very interesting paper by Merlise Clyde, Joyee Ghosh and Michael Littman, to appear in […]