## Bayesian computational tools

Posted in Statistics on April 10, 2013 by xi'an

I just arXived a survey entitled Bayesian computational tools, in connection with a chapter the editors of the Annual Review of Statistics and Its Application asked me to write. (A puzzling title, as I would have used Applications rather than Application. A puzzling journal too: endowed with a prestigious editorial board, I wonder about the long-term perspectives of the review, once “all” topics have been addressed. At least the “non-profit” aspect is respected: $100 for personal subscriptions and $250 for libraries, plus a one-year complimentary online access to volume 1.) There is nothing terribly novel in my review, which illustrates some computational tools in some Bayesian settings, and which lacks the five or six pages needed to also cover particle filters and sequential Monte Carlo. I however had fun with a double-exponential (or Laplace) example. This distribution indeed allows for a closed-form posterior distribution on the location parameter under a normal prior, which can be expressed as a mixture of truncated normal distributions, namely a mixture of (n+1) truncated normal distributions for a sample of size n. We actually noticed this fact (which may already be well-known) when looking at our leading example in the consistent ABC choice paper, but it vanished from the appendix in the later versions. As detailed in the previous post, I also fought programming issues induced by this mixture, due to round-off errors in the most extreme components, until all approaches provided similar answers.
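To make the mixture property concrete, here is a small numerical check (my own sketch, not part of the survey): between two consecutive order statistics, the Laplace log-likelihood is linear in the location μ, hence the log-posterior under a normal prior is exactly quadratic there, which is precisely the mixture-of-truncated-normals property.

```r
# Sketch (my own illustration): for a Laplace sample and a N(0,sigma2) prior,
# the log-posterior in mu is piecewise quadratic between order statistics,
# i.e. the posterior is a mixture of (n+1) truncated normal distributions
set.seed(1)
n <- 5
x <- sort(sample(c(-1, 1), n, replace = TRUE) * rexp(n, rate = sqrt(2)))
logpost <- function(mu, sigma2 = 4)
  -sqrt(2) * sum(abs(x - mu)) - mu^2 / (2 * sigma2)
# fit a quadratic inside one inter-order-statistic interval: the fit is exact
mus <- seq(x[2], x[3], length.out = 50)[-c(1, 50)]
fit <- lm(sapply(mus, logpost) ~ mus + I(mus^2))
max(abs(residuals(fit))) # essentially zero: the posterior kernel is normal there
```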

## Selecting statistics for ABC model choice [R code]

Posted in R, Statistics, University life on November 2, 2011 by xi'an

As supplementary material to the ABC paper we just arXived, here is the R code I used to produce the Bayes factor comparisons between summary statistics in the normal versus Laplace example. (Warning: running the R code takes a while!)

# ABC model comparison between Laplace and normal
nobs=10^4   #sample size
nsims=100   #number of replications
Niter=10^5  #number of ABC simulations per replication
sqrtwo=sqrt(2)

probA=probB=matrix(0,nsims,3) #ABC posterior probabilities of the normal model
dista=distb=rep(0,Niter)      #distances between observed and simulated statistics
pro=c(.001,.01,.1)            #quantiles used as ABC tolerances

#A) Simulation from the normal model
for (sims in 1:nsims){

tru=rnorm(nobs)
stat=c(mean(tru),median(tru),var(tru))
#alternative: stat=c(mean(tru^4),mean(tru^6))
mu=rnorm(Niter,sd=2)

for (t in 1:Niter){

#a) normal predictive
prop=rnorm(nobs,mean=mu[t])
pstat=c(mean(prop),median(prop),var(prop))
#alternative: pstat=c(mean(prop^4),mean(prop^6))
dista[t]=sum((pstat-stat)^2)

#b) Laplace predictive
prop=mu[t]+sample(c(-1,1),nobs,rep=TRUE)*rexp(nobs,rate=sqrtwo)
pstat=c(mean(prop),median(prop),var(prop))
#alternative: pstat=c(mean(prop^4),mean(prop^6))
distb[t]=sum((pstat-stat)^2)
}

epsi=quantile(c(dista,distb),prob=pro)
for (i in 1:3)
probA[sims,i]=sum(dista<epsi[i])/(2*Niter*pro[i])
}
#B) Simulation from the Laplace model
for (sims in 1:nsims){

tru=sample(c(-1,1),nobs,rep=TRUE)*rexp(nobs,rate=sqrtwo)
stat=c(mean(tru),median(tru),var(tru))
#alternative: stat=c(mean(tru^4),mean(tru^6))
mu=rnorm(Niter,sd=2)

for (t in 1:Niter){

#a) normal predictive
prop=rnorm(nobs,mean=mu[t])
pstat=c(mean(prop),median(prop),var(prop))
#alternative: pstat=c(mean(prop^4),mean(prop^6))
dista[t]=sum((pstat-stat)^2)

#b) Laplace predictive
prop=mu[t]+sample(c(-1,1),nobs,rep=TRUE)*rexp(nobs,rate=sqrtwo)
pstat=c(mean(prop),median(prop),var(prop))
#alternative: pstat=c(mean(prop^4),mean(prop^6))
distb[t]=sum((pstat-stat)^2)
}

epsi=quantile(c(dista,distb),prob=pro)
for (i in 1:3)
probB[sims,i]=sum(dista<epsi[i])/(2*Niter*pro[i])
}
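A possible post-processing step (purely illustrative, not part of the original script) turns the approximated posterior probabilities into Bayes factors; probA below is a toy stand-in for the matrix filled by the loops above:

```r
# purely illustrative post-processing: probA here is a toy stand-in for the
# nsims x 3 matrix produced by the simulation loops above
pro=c(.001,.01,.1)
probA=matrix(runif(300,.35,.65),100,3)
bayf=probA/(1-probA) #Bayes factor of normal vs. Laplace, under equal prior weights
boxplot(bayf,names=pro,xlab="tolerance quantile",
        ylab="ABC Bayes factor (normal vs. Laplace)")
```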


## Selecting statistics for [ABC] Bayesian model choice

Posted in Statistics, University life on October 25, 2011 by xi'an

At last, we have completed, arXived, and submitted our paper on the evaluation of summary statistics for Bayesian model choice! (I had presented preliminary versions at the recent workshops in New York and Zürich.) While broader in scope, the results obtained by Judith Rousseau, Jean-Michel Marin, Natesh Pillai, and myself bring an answer to the question raised by our PNAS paper on ABC model choice. Almost as soon as we realised the problem, that is, during MCMC’Ski in Utah, I talked with Judith about a possible classification of statistics in terms of their Bayes factor performances and we started working on that. While the idea of separating the mean behaviour of the statistics under both models came rather early, establishing a complete theoretical framework that validated this intuition took quite a while, and the assumptions changed a few times around the summer. The simulations associated with the paper were straightforward in that (a) the setup had been suggested to us by a referee of our PNAS paper, namely to compare normal and Laplace distributions with different summary statistics (including the median absolute deviation), (b) the theoretical results told us what to look for, and (c) they very clearly exhibited the consistency and inconsistency of the Bayes factor/posterior probability predicted by the theory. Both boxplots shown here exhibit this agreement: when using the (empirical) mean, median, and variance to compare the normal and Laplace models, the posterior probabilities do not select the “true” model but instead aggregate near a fixed value. When using instead the median absolute deviation as summary statistic, the posterior probabilities concentrate near one or zero depending on whether or not the normal model is the true model.

The main result states that, under some “heavy-duty” assumptions, (a) if the “true” mean of the summary statistic can be recovered for both models under comparison, then the Bayes factor has the same asymptotic behaviour as $n^{-(d_1-d_2)/2}$, irrespective of which one is the true model. (The dimensions $d_1$ and $d_2$ are the effective dimensions of the asymptotic means of the summary statistic under both models.) Therefore, the Bayes factor asymptotically always selects the model with the smallest effective dimension and cannot be consistent. (b) If, instead, the “true” mean of the summary statistic cannot be represented within the other model, then the Bayes factor is consistent. This means that, somehow, the best statistics to use in an ABC approximation to a Bayes factor are ancillary statistics with different mean values under both models. Otherwise, the summary statistic must have enough components to prevent a parameter under the “wrong” model from matching the “true” mean of the summary statistic.
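As a quick numerical sanity check (my own illustration, not taken from the paper): the N(μ,1) and Laplace(μ,1/√2) models share the same first two moments, so moment-based summaries cannot separate them, while their theoretical median absolute deviations differ, which is what makes the mad-based Bayes factor consistent.

```r
# N(0,1) and Laplace(0,1/sqrt(2)) both have mean 0 and variance 1, so
# mean/variance summaries cannot separate them; their theoretical mads differ:
# qnorm(.75) ~ 0.674 for the normal versus log(2)/sqrt(2) ~ 0.490 for the Laplace
set.seed(42)
n <- 1e6
xnorm <- rnorm(n)
xlapl <- sample(c(-1, 1), n, replace = TRUE) * rexp(n, rate = sqrt(2))
c(var(xnorm), var(xlapl))                     # both close to 1
c(mad(xnorm, constant = 1), qnorm(0.75))      # empirical vs theoretical mad
c(mad(xlapl, constant = 1), log(2) / sqrt(2)) # empirical vs theoretical mad
```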

(As a striking coincidence, Hélène Massam and Gérard Letac [re]posted today on arXiv a paper about the behaviour of the Bayes factor for contingency tables when the hyperparameter goes to zero, where they establish the consistency of the said Bayes factor under the sparser model. No Jeffreys-Lindley paradox in that case.)

## Normal tail precision

Posted in R, Statistics, University life on June 26, 2011 by xi'an

In conjunction with the normal-Laplace comparison mentioned in the most recent post about our lack of confidence in ABC model choice, we have been working on the derivation of the exact Bayes factor, and I derived an easy formula for the marginal likelihood in the Laplace case that boils down to a weighted sum of normal probabilities, namely (with hopefully obvious notations, the sample being ordered as $x_1\le\ldots\le x_n$, and with the conventions $x_0=-\infty$ and $x_{n+1}=+\infty$)

$m(x_1,\ldots,x_n)=2^{-n/2}\,\sum_{i=0}^n\,e^{\sqrt{2}\sum_{j=1}^i x_j- \sqrt{2}\sum_{j=i+1}^n x_j+(n-2i)^2\sigma^2}$

$\qquad\qquad\times \left[ \Phi(\{x_{i+1}-\sqrt{2}(n-2i)\sigma^2\}/\sigma) - \Phi(\{x_{i}-\sqrt{2}(n-2i)\sigma^2\}/\sigma) \right]$
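Each term of this sum can be reconstructed by completing the square (my reconstruction of the derivation, not taken verbatim from the post): on the interval $(x_{i},x_{i+1})$, the Laplace likelihood is proportional to $e^{\sqrt{2}(n-2i)\mu}$ up to a factor free of $\mu$, and integrating against the $N(0,\sigma^2)$ prior yields

$\int_{x_{i}}^{x_{i+1}} e^{\sqrt{2}(n-2i)\mu}\,\varphi(\mu/\sigma)\,\dfrac{\text{d}\mu}{\sigma}=e^{(n-2i)^2\sigma^2}\,\left[ \Phi(\{x_{i+1}-\sqrt{2}(n-2i)\sigma^2\}/\sigma) - \Phi(\{x_{i}-\sqrt{2}(n-2i)\sigma^2\}/\sigma) \right]$

with $\varphi$ the standard normal density; multiplying by $2^{-n/2}\,e^{\sqrt{2}\sum_{j\le i}x_j-\sqrt{2}\sum_{j>i}x_j}$ and summing over $i$ recovers the marginal above.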

I then wrote a short R function computing this marginal likelihood.

# Exact marginal likelihoods for Laplace versus normal
# L(mu,1/sqrt(2)) versus N(mu,1) with a prior N(0,2^2) on mu
nobs=21
sqrtwo=sqrt(2)

#log-marginal likelihood of the normal model
marnor=function(smpl){
-0.5*nobs*log(2*pi)-0.5*(nobs-1)*var(smpl)+0.5*log(1+1/(4*nobs))-0.5*mean(smpl)^2/(4+1/nobs)}

#log-marginal likelihood of the Laplace model (naive version, prone to
#round-off errors in the tails, as discussed below)
marlap=function(sampl){
smpl=sort(sampl)
S=sum(smpl)
S=c(S,S-2*cumsum(smpl))
phi=pnorm((smpl-sqrtwo*4*(nobs-2*(1:nobs)))/2)
phip=pnorm((smpl-sqrtwo*4*(nobs-2*(1:nobs)+2))/2)
Dphi=log(c(phip[1],phip[-1]-phi[-nobs],1-phi[nobs]))
-0.5*nobs*log(2)+log(sum(exp(-sqrtwo*S+4*(nobs-2*(0:nobs))^2+Dphi)))
}


When checking it with an alternative Monte Carlo integration, Jean-Michel Marin spotted a slight but persistent numerical difference:

> test=sample(c(-1,1),nobs,rep=TRUE)*rexp(nobs,sqrt(2))
> exp(marlap(test))
[1] 3.074013e-10
> f=function(x){exp(-sqrt(2)*sum(abs(test-x)))*2^(-nobs/2)}
> mean(apply(as.matrix(2*rnorm(10^6)),1,f))
[1] 3.126421e-11


And while I thought it could be due to simulation error, he persisted in analysing the problem until he found the reason: the difference between the normal cdfs in the above marginal was replaced by zero in the tails of the sample, while it still contributed in a significant manner, due to the huge weights multiplying those differences! He then rewrote the marlap function so that the difference was better computed in the tails, reaching a much higher level of agreement!

marlap=function(test){
  sigma2=4
  lulu=rep(0,nobs-1)
  test=sort(test)
  for (i in 1:(nobs-1)){
    cst=sqrt(2)*(nobs-2*i)*sigma2
    if (test[i]<0)
      #difference of normal cdfs computed on the log scale, from the lower tail
      lulu[i]=exp(sqrt(2)*sum(test[1:i])-sqrt(2)*sum(test[(i+1):nobs])+
        (nobs-2*i)^2*sigma2+pnorm((test[i+1]-cst)/sqrt(sigma2),log=TRUE)+
        log(1-exp(pnorm((test[i]-cst)/sqrt(sigma2),log=TRUE)-
        pnorm((test[i+1]-cst)/sqrt(sigma2),log=TRUE))))
    else
      #same difference computed from the upper tail, by symmetry of the normal cdf
      lulu[i]=exp(sqrt(2)*sum(test[1:i])-sqrt(2)*sum(test[(i+1):nobs])+
        (nobs-2*i)^2*sigma2+pnorm(-(test[i]-cst)/sqrt(sigma2),log=TRUE)+
        log(1-exp(pnorm(-(test[i+1]-cst)/sqrt(sigma2),log=TRUE)-
        pnorm(-(test[i]-cst)/sqrt(sigma2),log=TRUE))))
    if (lulu[i]==0)
      #fallback on the direct difference when the log-scale version underflows
      lulu[i]=exp(sqrt(2)*sum(test[1:i])-sqrt(2)*sum(test[(i+1):nobs])+
        (nobs-2*i)^2*sigma2+log(pnorm((test[i+1]-cst)/sqrt(sigma2))-
        pnorm((test[i]-cst)/sqrt(sigma2))))
  }
  #extreme components, corresponding to i=0 and i=n
  lulu0=exp(-sqrt(2)*sum(test[1:nobs])+nobs^2*sigma2+
    pnorm((test[1]-sqrt(2)*nobs*sigma2)/sqrt(sigma2),log=TRUE))
  lulun=exp(sqrt(2)*sum(test[1:nobs])+nobs^2*sigma2+
    pnorm(-(test[nobs]+sqrt(2)*nobs*sigma2)/sqrt(sigma2),log=TRUE))
  2^(-nobs/2)*sum(c(lulu0,lulu,lulun))
}


Here is an example of this agreement:

> marlap(test)
[1] 5.519428e-10
> mean(apply(as.matrix(2*rnorm(10^6)),1,f))
[1] 5.540964e-10
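The device at the heart of the rewritten marlap is the evaluation of a difference of normal cdfs on the log scale; a minimal self-contained illustration (with made-up tail values):

```r
# Far in the lower tail, the naive difference pnorm(b)-pnorm(a) underflows to
# zero, while log Phi(b) + log(1 - exp(log Phi(a) - log Phi(b))) stays finite
a <- -40; b <- -39.9
naive <- pnorm(b) - pnorm(a)             # 0 by double-precision underflow
lPa <- pnorm(a, log.p = TRUE)
lPb <- pnorm(b, log.p = TRUE)
logdiff <- lPb + log(1 - exp(lPa - lPb)) # log of Phi(b) - Phi(a)
c(naive, logdiff)
```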