simulation by inverse cdf

Another Cross Validated forum question that led me to an interesting (?) reconsideration of certitudes! When simulating from a normal distribution, is Box-Muller algorithm better or worse than using the inverse cdf transform? My first reaction was to state that Box-Muller was exact while the inverse cdf relied on the coding of the inverse cdf, like qnorm() in R. Upon reflection and commenting by other members of the forum, like William Huber, I came to moderate this perspective since Box-Muller also relies on transcendental functions like sin and log, hence writing

X=\sqrt{-2\log(U_1)}\,\sin(2\pi U_2)

also involves approximating in the coding of those functions. While it is feasible to avoid the call to trigonometric functions (see, e.g., Algorithm A.8 in our book), the call to the logarithm seems inescapable. So it ends up with the issue of which of the two functions is better coded, both in terms of speed and precision. Surprisingly, when coding in R, the inverse cdf may be the winner: here is the comparison I ran at the time I wrote my comments

> system.time(qnorm(runif(10^8)))
sutilisateur     système      écoulé
 10.137           0.120      10.251
> system.time(rnorm(10^8))
utilisateur     système      écoulé
 13.417           0.060      13.472`

However re-rerunning it today, I get opposite results (pardon my French, I failed to turn the messages to English):

> system.time(qnorm(runif(10^8)))
utilisateur     système      écoulé
     10.137       0.144      10.274
> system.time(rnorm(10^8))
utilisateur     système      écoulé
      7.894       0.060       7.948

(There is coherence in the system time, which shows rnorm as twice as fast as the call to qnorm.) In terms, of precision, I could not spot a divergence from normality, either through a ks.test over 10⁸ simulations or in checking the tails:

“Only the inversion method is inadmissible because it is slower and less space efficient than all of the other methods, the table methods excepted”. Luc Devroye, Non-uniform random variate generation, 1985

Update: As pointed out by Radford Neal in his comment, the above comparison is meaningless because the function rnorm() is by default based on the inversion of qnorm()! As indicated by Alexander Blocker in another comment, to use an other generator requires calling RNG as in

RNGkind(normal.kind = “Box-Muller”)

(And thanks to Jean-Louis Foulley for salvaging this quote from Luc Devroye, which does not appear to apply to the current coding of the Gaussian inverse cdf.)

5 Responses to “simulation by inverse cdf”

  1. Radford Neal Says:

    Your experiment is based on a false premise. If you look at help(RNG), you’ll see that by default rnorm uses inversion of the CDF to generate normal variates. So your whole comparison is just of overhead for doing the same thing with different code.

    I was surprised when I noticed this myself. Presumably techniques for computing the normal quantile function have improved over the years. It does make it easier to get good coalescence from coupling methods!

    Radford

    • Ah! This is an awfully good point, Radford, many thanks! I will mention this in the main post as it simply nullify the comparison. Alexander in his earlier comment explicitly made use of Box-Muller and of Ahrens-Dieter but both methods are slower than the inversion.

  2. Jean-Louis Foulley Says:

    Thanks Christian for reminding us the basic simulation tools we use so often without caring about them or even knowing them. I am a little old fashioned and use the polar method that avoids trigonometric functions but not the log as you mentioned it.
    For quantile normal calculation, I used a simple Newton Raphson algorithm deltaX=(deltaP)/f(X) where P=CDF(X) and f(x) is the density in conjunction with an approximation of CDF given in Abramowitz and Stegun (26.2.17).
    Regarding the comparison between the two methods, I went back to Devroye’s book (Non uniform random variate generation). He says on page 380 “Only the inversion method is inadmissible because it is slower and less space efficient than all of the other methods, the table methods excepted”.
    It would be useful to have further comments and advice about what to use today to generate normal variates.
    Thanks again.
    JLF

  3. I think you’re seeing a mix of function call overhead and algorithmic performance. See:

    > system.time(qnorm(runif(10^8)))
    user system elapsed
    8.246 0.579 9.735
    > system.time(rnorm(10^8))
    user system elapsed
    8.332 0.292 8.635
    > RNGkind(normal.kind = “Box-Muller”)
    > system.time(rnorm(10^8))
    user system elapsed
    12.364 0.312 12.689
    > RNGkind(normal.kind = “Ahrens-Dieter”)
    > system.time(rnorm(10^8))
    user system elapsed
    5.696 0.280 5.982

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s