## D65536 [xkcd]

Posted in Books, Kids with tags , , , on June 16, 2022 by xi'an

## What are the chances of that?

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , , on May 13, 2022 by xi'an

What are the chances that I review a book with this title, a few months after reviewing a book called What is luck?! This one is written by Andrew Elliott, whose Is that a big number? I reviewed a wee bit earlier… And that the cover of this book involves a particularly unlucky sequence of die as in my much earlier review of Krysz Burdzy’s book? (About 10⁻⁶ less likely than the likeliest draw!)

The (relative) specificity of this book is to try to convey the notions of chance and uncertainty to the general public, more in demonstrating that our intuition is most often wrong by examples and simulations, than in delving into psychological reasons as in Barbara Blatchley’s book. The author advances five dualities that underly our (dysfunctional) relation to chance: individual vs. collective, randomness vs. meaning, foresight vs. insight, uniformity vs. variability, and disruption vs. opportunity.

“News programmes clearly understand that the testimonies of individuals draw better audiences than the summaries of statisticians.” (p. xvii)

Some of the nice features of the book  are (a) the description of a probabilistic problem at the beginning of each chapter, to be solved at the end, (b) the use of simulation experiments, represented by coloured pixels over a grey band crossing the page, including a section on pseudorandom generators [which is less confusing that the quote below may indicate!], (c) taking full advantage of the quincunx apparatus, and (d) very few apologies for getting into formulas. And even a relevant quote of Taleb’s Black Swan about the ludic fallacy. On the other hand, the author spends quite a large component of the book on chance games, exhibiting a ludic tendency! And contemplates biased coins, while he should know better! The historical sections may prove too much for both informed and uninformed readers. (However, I learned that the UK Government had used a form of lottery to pay interests on premium bonds.) And the later parts are less numerical and quantified, even though the author brings in the micromort measurement [invented by Ronald Howard and] favoured by David Spiegelhalter. Who actually appears to have inspired several other sections, like the one on coincidences (which remains quite light in its investigation!). I finished the book rather quickly by browsing though mostly anecdotes and a lesser feel of a unified discourse. I did not find the attempt to link with the COVID pandemic, which definitely resets our clocks on risk, particularly alluring…

“People go to a lot of trouble to generate truly random numbers—sequences that are impossible to predict.” (p.66)

The apparition of the Normal distribution is somewhat overdone and almost mystical, if the tone gets more reasonable by the end of the corresponding chapter.

“…combining random numbers from distributions that really have no business being added together (…) ends up with a statistic that actually fits the normal distribution quite well.” (p.83)

The part about Bayes and Bayesian reasoning does not include any inference, with a rather duh! criticism of prior modelling.

“If you are tempted to apply a group statistic derived from a broad analysis to a more narrow purpose, you run the risk of making an unfair judgement.” (p.263)

The section about Xenakis’ musical creations as a Markov process was most interesting (and novel to me). I also enjoyed the shared cultural entries, esp. literary ones. Like citing the recent Chernobyl TV drama. Or Philip K. Dick’s Do Androids Dream of Electric Sheep? Or yet Monty Python’s Life of Brian. Overall, there is enough trivia and engagement to keep reading the book till its end!

## piling up ziggurats

Posted in Books, pictures, Statistics, Travel with tags , , , , , , on June 7, 2021 by xi'an

This semester. a group of Dauphine graduate students worked under my direction on simulation problems and resorted to using the Ziggurat method developed by George Marsaglia and Wai Wan Tsang, at about the time Devroye was completing his simulation bible. The algorithm covers the half-Normal density by 2², 2⁴, 2⁸, &tc., stripes, all but one rectangles and all with the same surface v. Generating uniformly from the tail strip means generating either uniformly from the rectangle part, x<r, or exactly from the Normal tail x>r, using a drifted exponential accept-reject. The choice between both does not require the surface of the rectangle but a single simulation y=vU/f(r). Furthermore, for the other rectangles, checking first that the first coordinate of the simulated point is less than the left boundary of the rectangle above avoids computing the density. This method is incredibly powerful, once the boundaries have been determined. With 2³² stripes, its efficiency is 99.3% acceptance rate. Compared with a fast algorithm by Ahrens & Dieter (1989), it is three times faster…

## R rexp()

Posted in Books, R, Statistics with tags , , , , , , , on May 18, 2021 by xi'an

Following a question on X validated about the reasons for coding rexp() following Ahrens & Dieter (1972) version, I re-read Luc Devroye’s explanations. Which boils down to an optimised implementation of von Neumann’s Exponential generator. The central result is that, for any μ>0, M a Geometric variate with failure probability exp(-μ) and Z a positive Poisson variate with parameter μ

$\mu(M+\min(U_1,\ldots,U_Z))$

is distributed as an Exp(1) random variate. Meaning that for every scale μ, the integer part and the fractional part of an Exponential variate are independent, the former a Geometric. A refinement of the above consists in choosing

exp(-μ) =½

as the generation of M then consists in counting the number of $$0’s$$ before the first $$1$$ in the binary expansion of $$U∼U(0,1)$$. Actually the loop used in Ahrens & Dieter (1972) seems to be much less efficient than counting these 0’s

> benchmark("a"={u=runif(1)
while(u<.5){
u=2*u
F=F+log(2)}},
"b"={v=as.integer(rev(intToBits(2^31*runif(1))))
sum(cumprod(!v))},
"c"={sum(cumprod(sample(c(0,1),32,rep=T)))},
"g"={rgeom(1,prob=.5)},replications=1e4)
test elapsed relative user.self
1    a  32.92  557.966    32.885
2    b  0.123    2.085     0.122
3    c  0.113    1.915     0.106
4    g  0.059    1.000     0.058


Obviously, trying to code the change directly in R resulted in much worse performances than the resident rexp(), coded in C.

## why does rbinom(1,1) differ from sample(0:1,1) with the same seed?

Posted in Statistics with tags , , , , , , , , , on February 17, 2021 by xi'an
> set.seed(1)
> rbinom(10,1,0.5)
[1] 0 0 1 1 0 1 1 1 1 0
> set.seed(1)
> sample(c(0,1), 10, replace = TRUE)
[1] 0 1 0 1 0 0 1 0 1 1

This rather legitimate question was posted on X validated last week, the answer being that the C codes behind both functions do not use pseudo-random generators in the same manner. For instance, rbinom does get involved beyond a mean value of 30 (and otherwise resorts to the inverse cdf approach). And following worries about sample biases, sample was updated in 2019 (and also seems to resort to the inverse cdf when the mean is less than 200). However, when running the above code on my machine, still using the 2018 R version 3.4.4!, I recover the same outcome:

> set.seed(1)
> rbinom(10,1,0.5)
[1] 0 0 1 1 0 1 1 1 1 0

> set.seed(1)
> sample(c(0,1), 10, replace = TRUE)
[1] 0 0 1 1 0 1 1 1 1 0> set.seed(1)
> qbinom(runif(10),1,0.5)
[1] 0 0 1 1 0 1 1 1 1 0
> set.seed(1)
> 1*(runif(10)>.5)
[1] 0 0 1 1 0 1 1 1 1 0