## why is this algorithm simulating a Normal variate?

Posted in Books, Kids, R, Statistics on September 15, 2022 by xi'an

A backward question from X validated as to why the above is a valid Normal generator based on exponential generations. Which can be found in most textbooks (if not ours). And in The Bible, albeit as an exercise. The validation proceeds from the (standard) Exponential density dominating the (standard) Normal density and, according to Devroye, may have originated from von Neumann himself. But with a brilliant reverse-engineering resolution by W. Huber on X validated. While a neat exercise, it requires on average 2.64 Uniform generations per Normal generation, against a 1/1 ratio for the Box-Muller (1958) polar approach, or 1/0.86 for the Marsaglia-Bray (1964) composition-rejection method. The apex of the simulation jungle is however Marsaglia and Tsang’s (2000) ziggurat algorithm. At least on CPUs, since “the ziggurat algorithm gives a more efficient method for scalar processors (e.g. old CPUs), while the Box–Muller transform is superior for processors with vector units (e.g. GPUs or modern CPUs)”, according to Wikipedia.

To draw a comparison between this Normal generator (that I will consider as von Neumann’s) and the Box-Müller polar generator,

#Box-Müller
bm=function(N){
  a=sqrt(-2*log(runif(N/2)))   #polar radius, square root of a 2·Exp(1) draw
  b=2*pi*runif(N/2)            #uniform polar angle
  return(c(a*sin(b),a*cos(b))) #two independent N(0,1) batches
}

#vonNeumann
vn=function(N){
  u=-log(runif(2.64*N))            #Exp(1) proposals
  v=-2*log(runif(2.64*N))>(u-1)^2  #accept u when Exp(1)>(u-1)^2/2
  w=2*(runif(2.64*N)<.5)-1         #random sign, +1 or -1
  return((w*u)[v])                 #signed accepted values
}


here are the relative computing times

> system.time(bm(1e8))
   user  system elapsed
  7.015   0.649   7.674
> system.time(vn(1e8))
   user  system elapsed
 42.483   5.713  48.222
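
As a quick sanity check (my addition, not in the original post), both outputs can be tested against the Normal cdf:

#Kolmogorov-Smirnov check that both samples are standard Normal
set.seed(101)
ks.test(bm(1e5),"pnorm")
ks.test(vn(1e5),"pnorm")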


## stuck exchange

Posted in Books, Kids, Statistics on August 16, 2022 by xi'an

Made an attempt at explaining on X validated why simulating from the joint was equivalent to simulating from the marginal then from the conditional. Unfortunately failed as I could not fathom where the OP’s difficulty was. It seems it started with defining what drawing from a distribution meant… Then someone came by asking why I was writing the exponential in this unusual way (it was a barred E, for expectation) and whether or not the “thin hollow rectangle” (a barred I, for indicator) was standing for the identity, that is

$\mathbb E\quad\text{and}\quad \mathbb I$

Reaching a point of incomprehension from which I could not recover…
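
For the record, the equivalence rests on nothing deeper than the chain-rule factorisation

$f(x,y)=f_X(x)\,f_{Y\mid X}(y\mid x)$

meaning that returning $x\sim f_X$ and then $y\sim f_{Y\mid X}(\cdot\mid x)$ is an exact draw from the joint.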

## simulating from the joint cdf

Posted in Books, Kids, pictures, R, Statistics, University life on July 13, 2022 by xi'an

An X validated question (what else?!) brought back (to me) the question of handling a bivariate cdf for simulation purposes. In the specific case of a copula whose marginals are (well-)known… And it led me to an erroneous chain of thought, fortunately rescued by Robin Ryder! When the marginal distributions are set, the simulation setup is indeed equivalent to a joint Uniform simulation from a copula

$\mathbb P[U_1\leq u_1,U_2\leq u_2,\dots,U_d\leq u_d]=C(u_1,u_2,\dots,u_d)$

In specific cases, as for instance the obvious example of Gaussian copulas, there exist customised simulation algorithms. Looking for more generic solutions, I turned to the Bible, where Chapter XI[an] has two entire sections, XI.3.2 and XI.3.3, on the topic (even though Luc Devroye does not use the term copula there, despite copulas having been introduced in 1959 by A. Sklar, in response to a query of M. Fréchet). In addition to a study of copulas, both sections contain many specific solutions (as for instance in the [unnumbered] table on page 585), but I found no generic simulation method. My [non-selected] answer to the question was thus to propose standard solutions such as simulating one coordinate at a time from the successive conditionals, since the marginals are Uniform. Which depends on the tractability of the derivatives of C(·,·), as illustrated below.
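
As a minimal sketch of this conditional route (my illustration, not part of the answer), take a Clayton copula with parameter θ>0, for which the conditional cdf ∂C/∂u₁ inverts in closed form:

#conditional sampler specialised to a Clayton copula (theta>0)
clayton=function(N,theta=2){
  u=runif(N) #first coordinate is marginally Uniform(0,1)
  w=runif(N) #Uniform variate pushed through the inverse conditional
  #solving dC(u,v)/du=w for v
  v=(u^(-theta)*(w^(-theta/(1+theta))-1)+1)^(-1/theta)
  return(cbind(u,v))
}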

However, being dissatisfied with this bland answer, I thought further about the problem and came up with a fallacious scheme, namely to first simulate the value p of C(U,V) by drawing a Uniform, and second to simulate (U,V) conditional on C(U,V)=p. Going as far as running an R code on a simple copula, as shown above. Fallacious reasoning since (as I knew already!!!) C(U,V) is not uniformly distributed, but has instead a case-dependent distribution… As a (connected) aside, I wonder if the generator attached to Archimedean copulas has any magical feature that helps with the generation of the associated copula.
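
And a two-line check of the fallacy, in the simplest case of the independence copula C(u,v)=uv, where C(U,V)=UV has density -log(t) over (0,1) rather than a Uniform one:

#P(UV<1/2) under independence, versus the Uniform value 1/2
u=runif(1e6);v=runif(1e6)
mean(u*v<.5)  #close to 0.847
.5-.5*log(.5) #exact value t-t*log(t) at t=1/2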

## Bayes Rules! [book review]

Posted in Books, Kids, Mountains, pictures, R, Running, Statistics, University life on July 5, 2022 by xi'an

Bayes Rules! is a new introductory textbook on Applied Bayesian Model(l)ing, written by Alicia Johnson (Macalester College), Miles Ott (Johnson & Johnson), and Mine Dogucu (University of California Irvine), and sent to me by CRC Press for review. It is available (free) online as a website and has a github site, as well as a bayesrules R package. (Which reminds me that both our own book R packages, bayess and mcsm, have gone obsolete on CRAN! And that I should find time to figure out the issue for an upgrade…)

As far as I can tell [from abroad and from only teaching students with a math background], Bayes Rules! seems to be catering to early (US) undergraduate students with very little exposure to mathematical statistics or probability, as it introduces basic probability notions like pmf, joint distribution, and Bayes’ theorem (as well as Greek letters!) and shies away from integration or algebra (a covariance matrix only occurs on page 437). For instance, the Normal-Normal conjugacy derivation is considered a “mouthful” (page 113). The exposition is somewhat stretched along the 500⁺ pages as a result, imho, which is presumably a feature shared with most textbooks at this level, and, accordingly, the exercises and quizzes are more about intuition and reproducing the contents of the chapter than about technical matters. In fact, I did not spot a mention of sufficiency, consistency, posterior concentration (almost made on page 113), improper priors, ergodicity, irreducibility, &tc., while other notions are not precisely defined, like ESS, weakly informative (page 234) or vague priors (page 77), prior information—which makes the negative answer to the quiz “All priors are informative” (page 90) rather confusing—, R-hat, density plot, scaled likelihood, and more.

As an alternative to “technical derivations”, Bayes Rules! centres on intuition and simulation (yay!) via its bayesrules R package, itself relying on rstan. Learning from example (as R code is always provided), the book proceeds through conjugate priors, MCMC (Metropolis–Hastings) methods, regression models, and hierarchical regression models. Quite impressive given the limited prerequisites set by the authors. (I appreciated the representations of the prior-likelihood-posterior, especially in the sequential case.)

Regarding the “hot tip” (page 108) that the posterior mean always stands between the prior mean and the data mean, this should be made conditional on a conjugate setting and a mean parameterisation (as illustrated below). Defining MCMC as a method that produces a sequence of realisations that are not from the target makes a point, except of course that there are settings where the realisations are from the target, for instance after a renewal event. Tuning MCMC should remain a partial mystery to readers after reading Chapter 7, as the Goldilocks principle is quite vague. Similarly, the derivation of the hyperparameters in a novel setting (not covered by the book) should prove a challenge, even though the readers are encouraged to “go forth and do some Bayes things” (page 509).
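
For the record, in the conjugate Normal case behind the above hot tip, with a N(μ₀,τ²) prior on the mean μ of a N(μ,σ²) sample with known variance, the posterior mean

$\mathbb E[\mu\mid x_1,\ldots,x_n]=\frac{\sigma^2}{\sigma^2+n\tau^2}\,\mu_0+\frac{n\tau^2}{\sigma^2+n\tau^2}\,\bar x_n$

is a convex combination of the prior mean and of the sample mean, hence always falls between the two.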

While Bayes factors are supported for some hypothesis testing (with no point null), model comparison follows more exploratory methods like X validation and expected log-predictive comparison.

The examples and exercises are diverse (if mostly US centric), modern (including cultural references that completely escape me), and often reflect on the authors’ societal concerns. In particular, their concern about a fair use of the inferred models is preeminent, even though a quantitative assessment of the degree of fairness would require a much more advanced perspective than the book allows… (In that respect, Exercise 18.2 and the following ones are about book banning (in the US). Given the progressive tone of the book, and the recent ban of math textbooks in the US, I wonder if some conservative boards would consider banning it!) Concerning the Himalaya summiting running example (Chapters 18 & 19), where the probability of summiting is conditional on the age of the climber and the use of additional oxygen, I am somewhat surprised that the altitude of the targeted peak is not included as a covariate. For instance, Ama Dablam (6848 m) is compared with Annapurna I (8091 m), which has the highest fatality-to-summit ratio (38%) of all. This should matter more than age: the Aosta guide Abele Blanc climbed Annapurna without oxygen at age 57! More to the point, the (practical) detailed examples do not bring unexpected conclusions, as for instance the fact that runners [thrice alas!] tend to slow down with age.

A geographical comment: Uluru (page 267) is not a city(!) but an impressive sandstone monolith in the heart of Australia, a five-hour drive away from Alice Springs. And historical mentions: Alan Turing (page 10) and the team at Bletchley Park indeed used Bayes factors (and sequential analysis) in cracking the Enigma, but this remained classified information for quite a while. Arianna Rosenbluth (page 10, but missing on page 165) was indeed a major contributor to Metropolis et al. (1953, not cited), but would not qualify as a Bayesian statistician as the goal of their algorithm was a characterisation of the Boltzmann (or Gibbs) distribution, not statistical inference. And David Blackwell’s (page 10) Basic Statistics is possibly the earliest instance of an introductory Bayesian and decision-theory textbook, but it never mentions Bayes or Bayesianism.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

## of first importance

Posted in Books, Kids, Statistics, University life on June 14, 2022 by xi'an

My PhD student Charly Andral came to me with the question of the birthdate of importance sampling. I was under the impression that it had been created at the same time as the plain Monte Carlo method, being essentially the same thing since

$\int_{\mathfrak X} h(x)f(x)\,\text dx = \int_{\mathfrak X} h(x)\frac{f(x)}{g(x)}g(x)\,\text dx$

hence due to von Neumann or Ulam, but he could not find a reference earlier than a 1949 proceedings publication by Hermann Kahn in a seminar on scientific computation run by IBM. Despite writing a series of Monte Carlo papers in the late 1940’s and 1950’s, Kahn is not well-known in these circles (although mentioned in Fishman’s book), while being popular to some extent for his theorisation of nuclear war escalation and deterrence. (I wonder if the concept is developed in some of his earlier 1948 papers. In a 1951 paper with Goertzel, a footnote signals that the approach was called quota sampling in their earlier papers. Charly has actually traced the earliest proposal to Kahn, in a 14 June 1949 RAND preprint, beating Goertzel’s Oak Ridge National Laboratory preprint on quota sampling and importance functions by five days.)
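
As a minimal illustration of the identity above (my own toy example, unrelated to Kahn’s papers), importance sampling can estimate a Normal tail probability from a shifted Exponential proposal:

#importance sampling estimate of P(X>4) for X~N(0,1),
#based on the proposal density g(x)=exp(-(x-4)) over (4,Inf)
N=1e6
x=4+rexp(N)            #draws from g
w=dnorm(x)/exp(-(x-4)) #importance weights f(x)/g(x)
mean(w)                #compare with pnorm(-4), about 3.17e-05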

(As a further piece of marginalia, Kahn wrote with T.E. Harris an earlier preprint on Monte Carlo methods in April 1949, the same Harris as in Harris recurrence.)