**T**his semester. a group of Dauphine graduate students worked under my direction on simulation problems and resorted to using the Ziggurat method developed by George Marsaglia and Wai Wan Tsang, at about the time Devroye was completing his simulation bible. The algorithm covers the half-Normal density by 2², 2⁴, 2⁸, &tc., stripes, all but one rectangles and all with the same surface v. Generating uniformly from the tail strip means generating either uniformly from the rectangle part, x<r, or exactly from the Normal tail x>r, using a drifted exponential accept-reject. The choice between both does not require the surface of the rectangle but a single simulation y=vU/f(r). Furthermore, for the other rectangles, checking first that the first coordinate of the simulated point is less than the left boundary of the rectangle above avoids computing the density. This method is incredibly powerful, once the boundaries have been determined. With 2³² stripes, its efficiency is 99.3% acceptance rate. Compared with a fast algorithm by Ahrens & Dieter (1989), it is three times faster…

## Archive for pseudo-random generator

## piling up ziggurats

Posted in Books, pictures, Statistics, Travel with tags accept-reject algorithm, George Marsaglia, Maya, normal tail, pseudo-random generator, random simulation, ziggurat algorithm on June 7, 2021 by xi'an## R rexp()

Posted in Books, R, Statistics with tags Ahrens & Dieter, C, exponential distribution, John von Neumann, Luc Devroye, Non-Uniform Random Variate Generation, pseudo-random generator, rexp() on May 18, 2021 by xi'an**F**ollowing a question on X validated about the reasons for coding rexp() following Ahrens & Dieter (1972) version, I re-read Luc Devroye’s explanations. Which boils down to an optimised implementation of von Neumann’s Exponential generator. The central result is that, for any μ>0, M a Geometric variate with failure probability exp(-μ) and Z a positive Poisson variate with parameter μ

is distributed as an Exp(1) random variate. Meaning that for every scale μ, the integer part and the fractional part of an Exponential variate are independent, the former a Geometric. A refinement of the above consists in choosing

exp(-μ) =½

as the generation of M then consists in counting the number of 0’s before the first 1 in the binary expansion of U∼U(0,1). Actually the loop used in Ahrens & Dieter (1972) seems to be much less efficient than counting these 0’s

> benchmark("a"={u=runif(1) while(u<.5){ u=2*u F=F+log(2)}}, "b"={v=as.integer(rev(intToBits(2^31*runif(1)))) sum(cumprod(!v))}, "c"={sum(cumprod(sample(c(0,1),32,rep=T)))}, "g"={rgeom(1,prob=.5)},replications=1e4) test elapsed relative user.self 1 a 32.92 557.966 32.885 2 b 0.123 2.085 0.122 3 c 0.113 1.915 0.106 4 g 0.059 1.000 0.058

Obviously, trying to code the change directly in R resulted in much worse performances than the resident rexp(), coded in C.

## why does rbinom(1,1) differ from sample(0:1,1) with the same seed?

Posted in Statistics with tags C, cross validated, effective sample size, inverse cdf, pseudo-random generator, R, R version 3.4.4, random seed, rbinom(), sample on February 17, 2021 by xi'an```
> set.seed(1)
> rbinom(10,1,0.5)
[1] 0 0 1 1 0 1 1 1 1 0
```

```
> set.seed(1)
> sample(c(0,1), 10, replace = TRUE)
[1] 0 1 0 1 0 0 1 0 1 1
```

**T**his rather legitimate question was posted on X validated last week, the answer being that the C codes behind both functions do not use pseudo-random generators in the same manner. For instance, `rbinom`

does get involved beyond a mean value of 30 (and otherwise resorts to the inverse cdf approach). And following worries about sample biases, `sample`

was updated in 2019 (and also seems to resort to the inverse cdf when the mean is less than 200). However, when running the above code on my machine, still using the 2018 R version 3.4.4!, I recover the same outcome:

```
> set.seed(1)
> rbinom(10,1,0.5)
[1] 0 0 1 1 0 1 1 1 1 0
> set.seed(1)
> sample(c(0,1), 10, replace = TRUE)
[1] 0 0 1 1 0 1 1 1 1 0
```

> set.seed(1)
> qbinom(runif(10),1,0.5)
[1] 0 0 1 1 0 1 1 1 1 0

> set.seed(1)
> 1*(runif(10)>.5)
[1] 0 0 1 1 0 1 1 1 1 0

## not a Bernoulli factory

Posted in Books, Kids, pictures, R with tags Bernoulli factory, floor(), pseudo-random generator, puzzle, riddle, simulated annealing, Stack Exchange, The Riddler on May 20, 2020 by xi'an**A** Riddler riddle I possibly misunderstood:

Four isolated persons are given four fair coins, which can be either flipped once or returned without being flipped. If all flipped coins come up heads, the team wins! Else, if any comes up tails, or if no flip at all is done, it looses. Each person is further given an independent U(0,1) realisation. What is the best strategy?

Since the players are separated, I would presume the same procedure is used by all. Meaning that a coin is tossed with probability p, ie if the uniform is less than p, and untouched otherwise. The probability of winning is then

4(1-p)³p½+6(1-p)³p½²+4(1-p)p³½³+p⁴½⁴

which is maximum for p=0.3420391, with a winning probability of 0.2848424.

And an extra puzzle for free:

*solve x⌊x⌊x⌊x⌋⌋⌋=2020
*

Where the integral part is the integer immediately below x. Puzzle that I first fail solving by brute force, because I did not look at negative x’s… Since the fourth root of 2020 is between 6 and 7, the solution is either x=6+ε *or* x=-7+ε, with ε in (0,1). The puzzle then becomes either

(6+ε)⌊(6+ε)⌊(6+ε)⌊6+ε⌋⌋⌋ = (6+ε)⌊(6+ε)⌊36+6ε⌋⌋ = (6+ε)⌊(6+ε)(36+⌊6ε⌋)⌋ = 2020

where there are 6 possible integer values for ⌊6ε⌋, with only ⌊6ε⌋=5 being possible, turning the equation into

(6+ε)⌊41(6+ε)⌋ = (6+ε)(246+⌊41ε⌋) = 2020

where again only ⌊42ε⌋=40 being possible, ending up with

1716+286ε = 2020

which has no solution in (0,1). In the second case

(-7+ε)⌊(-7+ε)⌊(-7+ε)⌊-7+ε⌋⌋⌋ = (-7+ε)⌊(-7+ε)(49+⌊-7ε⌋)⌋ = 2020

shows that only ⌊-7ε⌋=-3 is possible, leading to

(-7+ε)⌊46(-7+ε))⌋ = (-7+ε) (-322+⌊46ε⌋)=2020

with only ⌊46ε⌋=17 possible, hence

2135-305ε=2020

and

ε=115/305.

A brute force simulated annealing resolution returns x=-6.622706 after 10⁸ iterations. A more interesting question is to figure out the discontinuity points of the function

ℵ(x) = x⌊x⌊x⌊x⌋⌋⌋

as they seem to be numerous:

For instance, only 854 of the first 2020 integers enjoy a solution to ℵ(x)=n.