## F(1-F)

Posted in Books, Kids, Statistics with tags , , , on March 9, 2022 by xi'an

When answering an X validated question about the covariance between a random variable X and its cdf transform F(X), I realised that it was half the integral of the function

x → F(x)(1-F(x))

when X is centred. It is not surprising in the least to see the cdf appearing for this second order expectation, since it can similarly be used to represent first order expectations (as exploited by nested sampling). But it is easy to be confused by the fact that F(X) is usually a Uniform (0,1) variate hence distribution-free, until one sees it remains positively correlated with X, or by the apparent lack of scale or by the symmetry, until one realises this is not the case. (The associated correlation is scale-free.)

## on approximations of Φ and Φ⁻¹

Posted in Books, Kids, R, Statistics with tags , , , , , , , , , on June 3, 2021 by xi'an

As I was working on a research project with graduate students, I became interested in fast and not necessarily very accurate approximations to the normal cdf Φ and its inverse. Reading through this 2010 paper of Richards et al., using for instance Polya’s

$F_0(x) =\frac{1}{2}(1+\sqrt{1-\exp(-2x^2/\pi)})$

(with another version replacing 2/π with the squared root of π/8) and

$F_2(x)=1/1+\exp(-1.5976x(1+0.04417x^2))$

not to mention a rational faction. All of which are more efficient (in R), if barely, than the resident pnorm() function.

      test replications elapsed relative user.self
3 logistic       100000   0.410    1.000     0.410
2    polya       100000   0.411    1.002     0.411
1 resident       100000   0.455    1.110     0.455


For the inverse cdf, the approximations there are involving numerical inversion except for

$F_0^{-1}(p) =(-\pi/2 \log[1-(2p-1)^2])^{\frac{1}{2}}$

which proves slightly faster than qnorm()

       test replications elapsed relative user.self
2 inv-polya       100000   0.401    1.000     0.401
1  resident       100000   0.450    1.000     0.450


## open problem

Posted in R, Statistics with tags , , , , , on October 24, 2013 by xi'an

On the plane back from Warwick, I was reading an ABC arXived paper by Umberto Picchini and Julie Forman, “Accelerating inference for diffusions observed with measurement error and large sample sizes using Approximate Bayesian Computation: A case study” and came upon this open problem:

“A closed-form expression for generating percentiles from a fi nite-components Gaussian mixture is unavailable.” (p.5)

which means solving

$\alpha\Phi(x)+(1-\alpha)\Phi(\{x-\mu)/\sigma) = \beta$

is not possible in closed form. (Of course it could also be argued that the equation Φ(x)=β is unavailable in closed-form ie that the analytic solution x=Φ-1(β) is formal…) While I can think of several numerical approaches, a few minutes with a sheet of paper let me convinced that indeed this is not solvable (hence not an open problem, contrary to the title of the post!).

Just for R practice (and my R course students!), here is a basic R code:

mixant=function(alpha=0.5,beta=0.95,mu,sig=1,prec=1/10^4){
onmal=1-alpha
qbeta=qnorm(beta)

# initial bounds
omb=min(qbeta,mu+sig*qbeta)
omB=max(qbeta,mu+sig*qbeta)
if (beta<alpha){
omB=min(omB,qnorm(beta/alpha))
}else{
omb=max(omb,mu+sig*qnorm((beta-alpha)/onmal))}
if (beta<onmal){
omB=min(omB,mu+sig*qnorm(beta/onmal))
}else{
omb=max(omb,qnorm((beta-onmal)/alpha))}

# iterations
for (t in 1:5){
ranj=seq(omb,omB,len=17)
cfs=alpha*pnorm(ranj)+onmal*pnorm((ranj-mu)/sig)
omb=max(ranj[cfs<=beta])
omB=min(ranj[cfs>=beta])

if ((omB-omb)<prec)
break()}
return(.5*(omb+omB))}