## piling up ziggurats

Posted in Books, pictures, Statistics, Travel with tags , , , , , , on June 7, 2021 by xi'an

This semester. a group of Dauphine graduate students worked under my direction on simulation problems and resorted to using the Ziggurat method developed by George Marsaglia and Wai Wan Tsang, at about the time Devroye was completing his simulation bible. The algorithm covers the half-Normal density by 2², 2⁴, 2⁸, &tc., stripes, all but one rectangles and all with the same surface v. Generating uniformly from the tail strip means generating either uniformly from the rectangle part, x<r, or exactly from the Normal tail x>r, using a drifted exponential accept-reject. The choice between both does not require the surface of the rectangle but a single simulation y=vU/f(r). Furthermore, for the other rectangles, checking first that the first coordinate of the simulated point is less than the left boundary of the rectangle above avoids computing the density. This method is incredibly powerful, once the boundaries have been determined. With 2³² stripes, its efficiency is 99.3% acceptance rate. Compared with a fast algorithm by Ahrens & Dieter (1989), it is three times faster…

## Normal tail precision

Posted in R, Statistics, University life with tags , , , , , on June 26, 2011 by xi'an

In conjunction with the normal-Laplace comparison mentioned in the most recent post about our lack of confidence in ABC model choice, we have been working on the derivation of the exact Bayes factor and I derived an easy formula for the marginal likelihood in the Laplace case that boils down to a weighted sum of normal probabilities (with somehow obvious notations)

$m(x_1,\ldots,x_n)=2^{-n/2}\,\sum_{i=0}^n\,e^{\sqrt{2}\sum_{j=1}^i x_j- \sqrt{2}\sum_{j=i+1}^n x_j+(n-2i)^2\sigma^2}$

$\qquad\qquad\times \left[ \Phi(\{x_{i+1}-\sqrt{2}(n-2i)\sigma^2\}/\sigma) - \Phi(\{x_{i}-\sqrt{2}(n-2i)\sigma^2\}/\sigma) \right]$

I then wrote a short R code that produced this marginal likelihood.

# ABC model comparison between Laplace and normal
# L(mu,V2) versus N(mu,1) with a prior N(0,2*2)
nobs=21
nsims=500
sqrtwo=sqrt(2)

marnor=function(smpl){
-0.5*nobs*log(2*pi)-0.5*(nobs-1)*var(smpl)+0.5*log(1+1/(4*nobs))-0.5*mean(smpl)^2/(4+1/nobs)}

marlap=function(sampl){
smpl=sort(sampl)
S=sum(smpl)
S=c(S,S-2*cumsum(smpl))
phi=pnorm((smpl-sqrtwo*4*(nobs-2*(1:nobs)))/2)
phip=pnorm((smpl-sqrtwo*4*(nobs-2*(1:nobs)+2))/2)
Dphi=log(c(phip[1],phip[-1]-phi[-nobs],1-phi[nobs]))
-0.5*nobs*log(2)+log(sum(exp(-sqrtwo*S+4*(nobs-2*(0:nobs))^2+Dphi)))
}


When checking it with an alternative Monte Carlo integration, Jean-Michel Marin spotted a slight but persistent numerical difference:

> test=sample(c(-1,1),nobs,rep=TRUE)*rexp(nobs,sqrt(2))
> exp(marlap(test))
[1] 3.074013e-10
> f=function(x){exp(-sqrt(2)*sum(abs(test-x)))*2^(-nobs/2)}
> mean(apply(as.matrix(2*rnorm(10^6)),1,f))
[1] 3.126421e-11


And while I thought it could be due to the simulation error, he persisted in analysing the problem until he found the reason: the difference between the normal cdfs in the above marginal was replaced by zero in the tails of the sample, while it contributed in a significant manner, due to the huge weights in front of those differences! He then rewrote the marlap function so that the difference was better computed in the tails, with a much higher level of agreement!

marlap=function(test){
sigma2=4
lulu=rep(0,nobs-1)
test=sort(test)
for (i in 1:(nobs-1)){
cst=sqrt(2)*(nobs-2*i)*sigma2
if (test[i]<0)
lulu[i]=exp(sqrt(2)*sum(test[1:i])-sqrt(2)*sum(test[(i+1):nobs])+
(nobs-2*i)^2*sigma2+pnorm((test[i+1]-cst)/sqrt(sigma2),log=TRUE)+
log(1-exp(pnorm((test[i]-cst)/sqrt(sigma2),log=TRUE)-
pnorm((test[i+1]-cst)/sqrt(sigma2),log=TRUE))))
else
lulu[i]=exp(sqrt(2)*sum(test[1:i])-sqrt(2)*sum(test[(i+1):nobs])+
(nobs-2*i)^2*sigma2+pnorm(-(test[i]-cst)/sqrt(sigma2),log=TRUE)+
log(1-exp(pnorm(-(test[i+1]-cst)/sqrt(sigma2),log=TRUE)-
pnorm(-(test[i]-cst)/sqrt(sigma2),log=TRUE))))
if (lulu[i]==0)
lulu[i]=exp(sqrt(2)*sum(test[1:i])-sqrt(2)*sum(test[(i+1):nobs])+
(nobs-2*i)^2*sigma2+log(pnorm((test[i+1]-cst)/sqrt(sigma2))-
pnorm((test[i]-cst)/sqrt(sigma2))))
}
lulu0=exp(-sqrt(2)*sum(test[1:nobs])+nobs^2*sigma2+
pnorm((test[1]-sqrt(2)*nobs*sigma2)/sqrt(sigma2),log=TRUE))
lulun=exp(sqrt(2)*sum(test[1:nobs])+nobs^2*sigma2+
pnorm(-(test[nobs]+sqrt(2)*nobs*sigma2)/sqrt(sigma2),log=TRUE))
2^(-nobs/2)*sum(c(lulu0,lulu,lulun))
}


Here is an example of this agreement:

> marlap(test)
[1] 5.519428e-10
mean(apply(as.matrix(2*rnorm(10^6)),1,f))
[1] 5.540964e-10