## piling up ziggurats

Posted in Books, pictures, Statistics, Travel with tags , , , , , , on June 7, 2021 by xi'an This semester. a group of Dauphine graduate students worked under my direction on simulation problems and resorted to using the Ziggurat method developed by George Marsaglia and Wai Wan Tsang, at about the time Devroye was completing his simulation bible. The algorithm covers the half-Normal density by 2², 2⁴, 2⁸, &tc., stripes, all but one rectangles and all with the same surface v. Generating uniformly from the tail strip means generating either uniformly from the rectangle part, x<r, or exactly from the Normal tail x>r, using a drifted exponential accept-reject. The choice between both does not require the surface of the rectangle but a single simulation y=vU/f(r). Furthermore, for the other rectangles, checking first that the first coordinate of the simulated point is less than the left boundary of the rectangle above avoids computing the density. This method is incredibly powerful, once the boundaries have been determined. With 2³² stripes, its efficiency is 99.3% acceptance rate. Compared with a fast algorithm by Ahrens & Dieter (1989), it is three times faster…

## ratio of Gaussians

Posted in Books, Statistics, University life with tags , , , , , , , , on April 12, 2021 by xi'an

Following (as usual) an X validated question, I came across two papers of George Marsaglia on the ratio of two arbitrary (i.e. unnormalised and possibly correlated) Normal variates. One was a 1965 JASA paper, where the density of the ratio X/Y is exhibited, based on the fact that this random variable can always be represented as (a+ε)/(b+ξ) where ε,ξ are iid N(0,1) and a,b are constant. Surprisingly (?), this representation was challenged in a 1969 paper by David Hinkley (corrected in 1970). And less surprisingly the ratio distribution behaves almost like a Cauchy, since its density is meaning it is a two-component mixture of a Cauchy distribution, with weight exp(-a²/2-b²/2), and of an altogether more complex distribution ƒ². This is remarked by Marsaglia in the second 2006 paper, although the description of the second component remains vague, besides a possible bimodality. (It could have a mean, actually.) The density ƒ² however resembles (at least graphically) the generalised Normal inverse density I played with, eons ago.

## certified RNGs

Posted in Statistics with tags , , , , , , , on April 27, 2020 by xi'an A company called Gaming Laboratories International (GLI) is delivering certificates of randomness. Apparently using Marsaglia’s DieHard tests. Here are some unforgettable quotes from their webpage:

“…a Random Number Generator (RNG) is a key component that MUST be adequately and fully tested to ensure non-predictability and no biases exist towards certain game outcomes.”

“GLI has the most experienced and robust RNG testing methodologies in the world. This includes software-based (pseudo-algorithmic) RNG’s, Hardware RNG’s, and hybrid combinations of both.”

“GLI uses custom software written and validated through the collaborative effort of our in-house mathematicians and industry consultants since our inception in 1989. An RNG Test Suite is applied for randomness testing.”

“No lab in the world provides the level of iGaming RNG assurance that GLI does. Don’t take a chance with this most critical portion of your iGaming system.”

## ziggurat algorithm

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , on October 30, 2018 by xi'an

A ziggurat (Akkadian: ziqqurat, D-stem of zaqāru “to build on a raised area”) is a type of massive stone structure built in ancient Mesopotamia. It has the form of a terraced compound of successively receding stories or levels. Wikipedia In a recent arXival, Jalalvand and Charsooghi revisit the ziggurat algorithm that simulates from a univariate distribution by finding horizontal strips that pile up on top of the target as in a ziggurat or a pyramid, hence the name. Which George Marsaglia introduced in 1963. When finely tuned the method is quite efficient. Maybe because it designs an accept-reject move for each strip of the ziggurat rather than globally. For instance, versions constructed for a Normal target are more efficient [3½ times faster] than the Box-Muller algorithm. The generalisation found in the paper divides the target into strips of equal area, rather than dominating rectangular strips of equal area, which requires some work when the target density is non-standard. For targets with unbounded support or unbounded values, a function g transforming the tail into (0,1) has to be selected. A further constraint is that the inverse cdf of the transformed g(X) has to be known. And a large part of the paper examines several scenarii towards simulating from the tail region. For unbounded densities, a similarly minute analysis is undertaken, again with requests about the target like its algebraic order.

“…the result of division of a random integer by its range is a fixed-point number which unlike a floating-point number does not enjoy increased precision near 0. When such random numbers are used in the tail algorithm they cause premature termination of the tail and large gaps between produced random numbers near the termination point.”

The paper further discusses the correction of an error common to earlier ziggurat algorithms, due to the  conversion from fixed-point to floating-point numbers, as indicated in the above quote. Although this had already been addressed by George Marsaglia in the early 1990’s.

“Ziggurat algorithm has a high setup time, so it’s not suitable for applications that require variates with frequently changing shape parameters.”

When testing the algorithm against different methods (in STL and Boost), and different distributions, the gains are between two and seven times faster, except for the Exponential target where the original ziggurat algorithm performs better. Interestingly, the gains (and the computing time) increase with the degrees of freedom for the Gamma target, in relation with Devroye’s (1986) remark on the absence of uniformly bounded execution times for this distribution. Same thing for the Weibull variates, obviously. Reflecting upon the usually costly computation of cdfs and inverse cdfs on machines and software, the inverse cdf method is systematically left behind! In conclusion, a good Sunday morning read if not of direct consequences for MCMC implementation, as warned by the authors.

## RNG impact on MCMC [or lack thereof]

Posted in Books, R, Statistics, Travel, University life with tags , , , , , , , on July 13, 2017 by xi'an Following the talk at MCM 2017 about the strange impact of the random generator on the outcome of an MCMC generator, I tried in Montréal airport the following code on the banana target of Haario et al. (1999), copied from Soetaert and Laine and using the MCMC function of the FME package:

```library(FME)
Banana <- function (x1, x2) {
return(x2 - (x1^2+1)) }
pmultinorm <- function(vec, mean, Cov) {
diff <- vec - mean
ex <- -0.5*t(diff) %*% solve(Cov) %*% diff
rdet <- sqrt(det(Cov))
power <- -length(diff)*0.5
return((2.*pi)^power / rdet * exp(ex)) }
BananaSS <- function (p) {
P <- c(p, Banana(p, p))
Cov <- matrix(nr = 2, data = c(1, 0.9, 0.9, 1))
N=1e3
ejd=matrix(0,4,N)
RNGkind("Mars")
for (t in 1:N){
MCMC <- modMCMC(f = BananaSS, p = c(0, 0.7),
jump = diag(nrow = 2, x = 5), niter = 1e3)
ejd[1,t]=mean((MCMC\$pars[-1,2]-MCMC\$pars[1,2])^2)}
```

since this divergence from the initial condition seemed to reflect the experiment of the speaker at MCM 2017. Unsurprisingly, no difference came from using the different RNGs in R (which may fail to contain those incriminated by the study)…