## on approximations of Φ and Φ⁻¹

Posted in Books, Kids, R, Statistics with tags , , , , , , , , , on June 3, 2021 by xi'an

As I was working on a research project with graduate students, I became interested in fast and not necessarily very accurate approximations to the normal cdf Φ and its inverse. Reading through this 2010 paper of Richards et al., using for instance Polya’s

$F_0(x) =\frac{1}{2}(1+\sqrt{1-\exp(-2x^2/\pi)})$

(with another version replacing 2/π with the squared root of π/8) and

$F_2(x)=1/1+\exp(-1.5976x(1+0.04417x^2))$

not to mention a rational faction. All of which are more efficient (in R), if barely, than the resident pnorm() function.

      test replications elapsed relative user.self
3 logistic       100000   0.410    1.000     0.410
2    polya       100000   0.411    1.002     0.411
1 resident       100000   0.455    1.110     0.455


For the inverse cdf, the approximations there are involving numerical inversion except for

$F_0^{-1}(p) =(-\pi/2 \log[1-(2p-1)^2])^{\frac{1}{2}}$

which proves slightly faster than qnorm()

       test replications elapsed relative user.self
2 inv-polya       100000   0.401    1.000     0.401
1  resident       100000   0.450    1.000     0.450


## Normal tail precision

Posted in R, Statistics, University life with tags , , , , , on June 26, 2011 by xi'an

In conjunction with the normal-Laplace comparison mentioned in the most recent post about our lack of confidence in ABC model choice, we have been working on the derivation of the exact Bayes factor and I derived an easy formula for the marginal likelihood in the Laplace case that boils down to a weighted sum of normal probabilities (with somehow obvious notations)

$m(x_1,\ldots,x_n)=2^{-n/2}\,\sum_{i=0}^n\,e^{\sqrt{2}\sum_{j=1}^i x_j- \sqrt{2}\sum_{j=i+1}^n x_j+(n-2i)^2\sigma^2}$

$\qquad\qquad\times \left[ \Phi(\{x_{i+1}-\sqrt{2}(n-2i)\sigma^2\}/\sigma) - \Phi(\{x_{i}-\sqrt{2}(n-2i)\sigma^2\}/\sigma) \right]$

I then wrote a short R code that produced this marginal likelihood.

# ABC model comparison between Laplace and normal
# L(mu,V2) versus N(mu,1) with a prior N(0,2*2)
nobs=21
nsims=500
sqrtwo=sqrt(2)

marnor=function(smpl){
-0.5*nobs*log(2*pi)-0.5*(nobs-1)*var(smpl)+0.5*log(1+1/(4*nobs))-0.5*mean(smpl)^2/(4+1/nobs)}

marlap=function(sampl){
smpl=sort(sampl)
S=sum(smpl)
S=c(S,S-2*cumsum(smpl))
phi=pnorm((smpl-sqrtwo*4*(nobs-2*(1:nobs)))/2)
phip=pnorm((smpl-sqrtwo*4*(nobs-2*(1:nobs)+2))/2)
Dphi=log(c(phip[1],phip[-1]-phi[-nobs],1-phi[nobs]))
-0.5*nobs*log(2)+log(sum(exp(-sqrtwo*S+4*(nobs-2*(0:nobs))^2+Dphi)))
}


When checking it with an alternative Monte Carlo integration, Jean-Michel Marin spotted a slight but persistent numerical difference:

> test=sample(c(-1,1),nobs,rep=TRUE)*rexp(nobs,sqrt(2))
> exp(marlap(test))
[1] 3.074013e-10
> f=function(x){exp(-sqrt(2)*sum(abs(test-x)))*2^(-nobs/2)}
> mean(apply(as.matrix(2*rnorm(10^6)),1,f))
[1] 3.126421e-11


And while I thought it could be due to the simulation error, he persisted in analysing the problem until he found the reason: the difference between the normal cdfs in the above marginal was replaced by zero in the tails of the sample, while it contributed in a significant manner, due to the huge weights in front of those differences! He then rewrote the marlap function so that the difference was better computed in the tails, with a much higher level of agreement!

marlap=function(test){
sigma2=4
lulu=rep(0,nobs-1)
test=sort(test)
for (i in 1:(nobs-1)){
cst=sqrt(2)*(nobs-2*i)*sigma2
if (test[i]<0)
lulu[i]=exp(sqrt(2)*sum(test[1:i])-sqrt(2)*sum(test[(i+1):nobs])+
(nobs-2*i)^2*sigma2+pnorm((test[i+1]-cst)/sqrt(sigma2),log=TRUE)+
log(1-exp(pnorm((test[i]-cst)/sqrt(sigma2),log=TRUE)-
pnorm((test[i+1]-cst)/sqrt(sigma2),log=TRUE))))
else
lulu[i]=exp(sqrt(2)*sum(test[1:i])-sqrt(2)*sum(test[(i+1):nobs])+
(nobs-2*i)^2*sigma2+pnorm(-(test[i]-cst)/sqrt(sigma2),log=TRUE)+
log(1-exp(pnorm(-(test[i+1]-cst)/sqrt(sigma2),log=TRUE)-
pnorm(-(test[i]-cst)/sqrt(sigma2),log=TRUE))))
if (lulu[i]==0)
lulu[i]=exp(sqrt(2)*sum(test[1:i])-sqrt(2)*sum(test[(i+1):nobs])+
(nobs-2*i)^2*sigma2+log(pnorm((test[i+1]-cst)/sqrt(sigma2))-
pnorm((test[i]-cst)/sqrt(sigma2))))
}
lulu0=exp(-sqrt(2)*sum(test[1:nobs])+nobs^2*sigma2+
pnorm((test[1]-sqrt(2)*nobs*sigma2)/sqrt(sigma2),log=TRUE))
lulun=exp(sqrt(2)*sum(test[1:nobs])+nobs^2*sigma2+
pnorm(-(test[nobs]+sqrt(2)*nobs*sigma2)/sqrt(sigma2),log=TRUE))
2^(-nobs/2)*sum(c(lulu0,lulu,lulun))
}


Here is an example of this agreement:

> marlap(test)
[1] 5.519428e-10
mean(apply(as.matrix(2*rnorm(10^6)),1,f))
[1] 5.540964e-10