## simulation by inverse cdf

Another Cross Validated forum question that led me to an interesting (?) reconsideration of certitudes! When simulating from a normal distribution, is the Box-Muller algorithm better or worse than using the inverse cdf transform? My first reaction was to state that Box-Muller was exact, while the inverse cdf relied on the coding of the inverse cdf, like *qnorm()* in R. Upon reflection, and after comments by other members of the forum, like William Huber, I came to moderate this perspective, since Box-Muller also relies on transcendental functions like *sin* and *log*, hence writing

$$x_1=\sqrt{-2\log u_1}\,\cos(2\pi u_2),\qquad x_2=\sqrt{-2\log u_1}\,\sin(2\pi u_2)$$

also involves approximations in the coding of those functions. While it is feasible to avoid the call to trigonometric functions (see, e.g., Algorithm A.8 in our book), the call to the logarithm seems inescapable. So it comes down to which of the two functions is better coded, both in terms of speed and precision. Surprisingly, when coding in R, the inverse cdf may be the winner: here is the comparison I ran at the time I wrote my comments:

```
> system.time(qnorm(runif(10^8)))
   user  system elapsed
 10.137   0.120  10.251
> system.time(rnorm(10^8))
   user  system elapsed
 13.417   0.060  13.472
```

However, re-running it today, I get the opposite results:

```
> system.time(qnorm(runif(10^8)))
   user  system elapsed
 10.137   0.144  10.274
> system.time(rnorm(10^8))
   user  system elapsed
  7.894   0.060   7.948
```

(There is coherence in the system column, which shows *rnorm* as about twice as fast as the call to *qnorm*.) In terms of precision, I could not spot a divergence from normality, either through a ks.test over 10⁸ simulations or in checking the tails.
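A check along these lines can be sketched as follows (an illustration only, run here with a smaller sample size than the 10⁸ of the post, and not the exact code behind the original comparison):

```r
set.seed(1)
x <- qnorm(runif(10^6))   # inverse-cdf draws; 10^6 keeps the test quick

# overall goodness of fit against the standard normal cdf
ks.test(x, "pnorm")

# tail behaviour: empirical versus theoretical exceedance probabilities
# (far tails like q = 5 would require much larger samples to show up)
for (q in c(3, 4)) {
  cat("P(X >", q, ") empirical:", mean(x > q),
      "  theoretical:", pnorm(q, lower.tail = FALSE), "\n")
}
```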

> “Only the inversion method is inadmissible because it is slower and less space efficient than all of the other methods, the table methods excepted.” (Luc Devroye, *Non-Uniform Random Variate Generation*, 1986)

**Update:** As pointed out by Radford Neal in his comment, the above comparison is meaningless because the function *rnorm()* is by default based on the inversion of *qnorm()*! As indicated by Alexander Blocker in another comment, using another generator requires a call to RNGkind, as in

```r
RNGkind(normal.kind = "Box-Muller")
```
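For reference, the transform this option selects can be written in a few lines (a plain sketch of the textbook algorithm, not R's internal implementation):

```r
# Box-Muller: two independent U(0,1) draws in, two independent N(0,1) draws out
boxmuller_pair <- function(u1 = runif(1), u2 = runif(1)) {
  r <- sqrt(-2 * log(u1))      # the unavoidable logarithm
  c(r * cos(2 * pi * u2),      # the trigonometric calls
    r * sin(2 * pi * u2))
}
```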

(And thanks to Jean-Louis Foulley for salvaging this quote from Luc Devroye, which does not appear to apply to the current coding of the Gaussian inverse cdf.)

January 14, 2015 at 4:38 pm

Your experiment is based on a false premise. If you look at help(RNG), you’ll see that by default rnorm uses inversion of the CDF to generate normal variates. So your whole comparison is just of overhead for doing the same thing with different code.
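For instance, the current settings can be queried directly (the exact output depends on the R version):

```r
RNGkind()
# e.g. "Mersenne-Twister" "Inversion"  (the second entry is normal.kind)
```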

I was surprised when I noticed this myself. Presumably techniques for computing the normal quantile function have improved over the years. It does make it easier to get good coalescence from coupling methods!

Radford

January 14, 2015 at 6:43 pm

Ah! This is an awfully good point, Radford, many thanks! I will mention this in the main post, as it simply nullifies the comparison. Alexander in his earlier comment explicitly made use of Box-Muller and of Ahrens-Dieter; in his timings Box-Muller is slower than the inversion, while Ahrens-Dieter actually comes out faster.

January 14, 2015 at 3:37 pm

Thanks Christian for reminding us of the basic simulation tools we use so often without caring about them or even knowing them. I am a little old-fashioned and use the polar method, which avoids the trigonometric functions but not the log, as you mentioned.
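That polar (Marsaglia) method can be sketched in a few lines of R (a bare-bones illustration, not an optimized implementation):

```r
# Marsaglia's polar method: rejection on the unit disk replaces sin/cos,
# but the logarithm remains
polar_normal <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    u <- runif(n, -1, 1)
    v <- runif(n, -1, 1)
    s <- u^2 + v^2
    ok <- s > 0 & s < 1                  # keep the points inside the unit disk
    f <- sqrt(-2 * log(s[ok]) / s[ok])
    out <- c(out, u[ok] * f, v[ok] * f)  # two normals per accepted point
  }
  out[1:n]
}
```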

For the normal quantile calculation, I used a simple Newton-Raphson algorithm, ΔX = ΔP/f(X), where P = F(X) is the cdf and f(X) the density, in conjunction with the approximation of the cdf given in Abramowitz and Stegun (26.2.17).
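In R terms, one such Newton-Raphson iteration reads as follows (a sketch using R's own pnorm and dnorm in place of the Abramowitz and Stegun 26.2.17 approximation of the cdf):

```r
# solve pnorm(x) = p by Newton-Raphson: x <- x + (p - pnorm(x)) / dnorm(x)
qnorm_newton <- function(p, tol = 1e-12, maxit = 50) {
  x <- 0                                 # crude start, fine for moderate p
  for (i in seq_len(maxit)) {
    delta <- (p - pnorm(x)) / dnorm(x)   # deltaX = deltaP / f(x)
    x <- x + delta
    if (abs(delta) < tol) break
  }
  x
}
qnorm_newton(0.975)   # ~1.959964, matching qnorm(0.975)
```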

Regarding the comparison between the two methods, I went back to Devroye's book (*Non-Uniform Random Variate Generation*). He says on page 380: “Only the inversion method is inadmissible because it is slower and less space efficient than all of the other methods, the table methods excepted.”

It would be useful to have further comments and advice about what to use today to generate normal variates.

Thanks again.

JLF

January 14, 2015 at 6:12 pm

Thank you for the quote from Luc, Jean-Louis!

January 14, 2015 at 4:32 am

I think you’re seeing a mix of function call overhead and algorithmic performance. See:

```
> system.time(qnorm(runif(10^8)))
   user  system elapsed
  8.246   0.579   9.735
> system.time(rnorm(10^8))
   user  system elapsed
  8.332   0.292   8.635
> RNGkind(normal.kind = "Box-Muller")
> system.time(rnorm(10^8))
   user  system elapsed
 12.364   0.312  12.689
> RNGkind(normal.kind = "Ahrens-Dieter")
> system.time(rnorm(10^8))
   user  system elapsed
  5.696   0.280   5.982
```