Archive for pnorm

on approximations of Φ and Φ⁻¹

Posted in Books, Kids, R, Statistics on June 3, 2021 by xi'an

As I was working on a research project with graduate students, I became interested in fast, if not necessarily very accurate, approximations to the normal cdf Φ and its inverse. Reading through this 2010 paper of Richards et al., I tried for instance Polya's approximation

F_0(x) =\frac{1}{2}(1+\sqrt{1-\exp(-2x^2/\pi)})

(with another version replacing 2/π with the square root of π/8) and the logistic approximation

F_2(x)=\frac{1}{1+\exp(-1.5976x(1+0.04417x^2))}

not to mention a rational fraction approximation. All of these prove more efficient in R, if barely, than the resident pnorm() function:

      test replications elapsed relative user.self 
3 logistic       100000   0.410    1.000     0.410 
2    polya       100000   0.411    1.002     0.411 
1 resident       100000   0.455    1.110     0.455 
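
As a minimal sketch of how such a comparison may be run (the timings above look like rbenchmark output, but the evaluation points are not reported in the post, so the vector x and the function definitions below are arbitrary choices):

library(rbenchmark)
# Polya's approximation, extended to x<0 by symmetry of the normal cdf
polya=function(x) .5*(1+sign(x)*sqrt(1-exp(-2*x^2/pi)))
# logistic approximation with constants 1.5976 and 0.04417
logistic=function(x) 1/(1+exp(-1.5976*x*(1+0.04417*x^2)))
x=rnorm(100) # arbitrary evaluation points
benchmark(resident=pnorm(x),polya=polya(x),logistic=logistic(x),
  replications=10^5)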

For the inverse cdf, the approximations found there involve numerical inversion, except for

F_0^{-1}(p) =(-\pi/2 \log[1-(2p-1)^2])^{\frac{1}{2}}

which proves slightly faster than qnorm()

       test replications elapsed relative user.self 
2 inv-polya       100000   0.401    1.000     0.401
1  resident       100000   0.450    1.122     0.450
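
A similar sketch for the inverse, re-using the rbenchmark setup above and adding a sign term to handle p<1/2 (the displayed formula only gives the positive branch):

# inverse of Polya's approximation
ipolya=function(p) sign(p-.5)*sqrt(-.5*pi*log(1-(2*p-1)^2))
p=runif(100) # arbitrary probabilities
benchmark(resident=qnorm(p),`inv-polya`=ipolya(p),
  replications=10^5)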

Normal tail precision

Posted in R, Statistics, University life on June 26, 2011 by xi'an

In conjunction with the normal-Laplace comparison mentioned in the most recent post about our lack of confidence in ABC model choice, we have been working on the derivation of the exact Bayes factor. I derived an easy formula for the marginal likelihood in the Laplace case, which boils down to a weighted sum of normal probabilities (with somewhat obvious notations):

m(x_1,\ldots,x_n)=2^{-n/2}\,\sum_{i=0}^n\,e^{\sqrt{2}\sum_{j=1}^i x_j- \sqrt{2}\sum_{j=i+1}^n x_j+(n-2i)^2\sigma^2}

\qquad\qquad\times \left[ \Phi(\{x_{i+1}-\sqrt{2}(n-2i)\sigma^2\}/\sigma) - \Phi(\{x_{i}-\sqrt{2}(n-2i)\sigma^2\}/\sigma) \right]

I then wrote a short piece of R code computing this marginal likelihood:

# ABC model comparison between Laplace and normal
# Laplace with rate sqrt(2) (hence variance 1) versus N(mu,1), with a prior mu~N(0,2*2)
nobs=21
nsims=500
sqrtwo=sqrt(2)

marnor=function(smpl){
  -0.5*nobs*log(2*pi)-0.5*(nobs-1)*var(smpl)+0.5*log(1+1/(4*nobs))-0.5*mean(smpl)^2/(4+1/nobs)}

# log marginal likelihood under the Laplace model, following the formula above
marlap=function(sampl){
  smpl=sort(sampl)
  S=sum(smpl)
  S=c(S,S-2*cumsum(smpl))
  phi=pnorm((smpl-sqrtwo*4*(nobs-2*(1:nobs)))/2)
  phip=pnorm((smpl-sqrtwo*4*(nobs-2*(1:nobs)+2))/2)
  Dphi=log(c(phip[1],phip[-1]-phi[-nobs],1-phi[nobs]))
   -0.5*nobs*log(2)+log(sum(exp(-sqrtwo*S+4*(nobs-2*(0:nobs))^2+Dphi)))
  }

When checking it with an alternative Monte Carlo integration, Jean-Michel Marin spotted a slight but persistent numerical difference:

> test=sample(c(-1,1),nobs,rep=TRUE)*rexp(nobs,sqrt(2))
> exp(marlap(test))
[1] 3.074013e-10
> f=function(x){exp(-sqrt(2)*sum(abs(test-x)))*2^(-nobs/2)}
> mean(apply(as.matrix(2*rnorm(10^6)),1,f))
[1] 3.126421e-11

And while I thought it could be due to simulation error, he persisted in analysing the problem until he found the reason: the difference between the normal cdfs in the above marginal was numerically rounded to zero in the tails of the sample, even though it still contributed in a significant manner, due to the huge weights multiplying those differences! He then rewrote the marlap function so that the difference is computed accurately in the tails, reaching a much higher level of agreement!
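
The trick, illustrated below on arbitrary values, is to evaluate the difference of normal cdfs through log-cdf calls in the relevant tail, since pnorm returns an accurate log-probability even when the probability itself rounds to 0 or 1:

a=10; b=10.1 # arbitrary values deep in the right tail
pnorm(b)-pnorm(a) # evaluates to exactly 0 in double precision
la=pnorm(a,lower.tail=FALSE,log.p=TRUE) # log{1-Phi(a)}
lb=pnorm(b,lower.tail=FALSE,log.p=TRUE) # log{1-Phi(b)}
exp(la+log(1-exp(lb-la))) # small but strictly positive value of Phi(b)-Phi(a)

The rewritten marlap function applies this representation on both sides of the sample: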

marlap=function(test){
  sigma2=4
  lulu=rep(0,nobs-1)
  test=sort(test)
  for (i in 1:(nobs-1)){
    cst=sqrt(2)*(nobs-2*i)*sigma2
    # difference of normal cdfs computed on the log scale, via the
    # lower tails when test[i]<0 and via the upper tails otherwise
    if (test[i]<0)
      lulu[i]=exp(sqrt(2)*sum(test[1:i])-sqrt(2)*sum(test[(i+1):nobs])+
       (nobs-2*i)^2*sigma2+pnorm((test[i+1]-cst)/sqrt(sigma2),log=TRUE)+
       log(1-exp(pnorm((test[i]-cst)/sqrt(sigma2),log=TRUE)-
       pnorm((test[i+1]-cst)/sqrt(sigma2),log=TRUE))))
    else
      lulu[i]=exp(sqrt(2)*sum(test[1:i])-sqrt(2)*sum(test[(i+1):nobs])+
       (nobs-2*i)^2*sigma2+pnorm(-(test[i]-cst)/sqrt(sigma2),log=TRUE)+
       log(1-exp(pnorm(-(test[i+1]-cst)/sqrt(sigma2),log=TRUE)-
       pnorm(-(test[i]-cst)/sqrt(sigma2),log=TRUE))))
    # fall back on the direct difference when the log-scale version vanishes
    if (lulu[i]==0)
       lulu[i]=exp(sqrt(2)*sum(test[1:i])-sqrt(2)*sum(test[(i+1):nobs])+
          (nobs-2*i)^2*sigma2+log(pnorm((test[i+1]-cst)/sqrt(sigma2))-
           pnorm((test[i]-cst)/sqrt(sigma2))))
    }
 # boundary terms i=0 and i=n, each involving a single normal cdf
 lulu0=exp(-sqrt(2)*sum(test[1:nobs])+nobs^2*sigma2+
   pnorm((test[1]-sqrt(2)*nobs*sigma2)/sqrt(sigma2),log=TRUE))
 lulun=exp(sqrt(2)*sum(test[1:nobs])+nobs^2*sigma2+
    pnorm(-(test[nobs]+sqrt(2)*nobs*sigma2)/sqrt(sigma2),log=TRUE))
 # returns the marginal itself (not its logarithm)
 2^(-nobs/2)*sum(c(lulu0,lulu,lulun))
 }

Here is an example of this agreement:

> marlap(test)
[1] 5.519428e-10
> mean(apply(as.matrix(2*rnorm(10^6)),1,f))
[1] 5.540964e-10