Archive for random number generator

21w5107 [day 1]

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on November 30, 2021 by xi'an

The workshop started by the bad news of our friend Michele Guindani being hit and mugged upon arrival in Oaxaca, Saturday night. Fortunately, he was not hurt, but lost both phone and wallet, always a major bummer when abroad… Still this did not cast a lasting pall on the gathering of long-time no-see friends, whom I had indeed not seen for at least two years. Except for those who came to the CIRMirror!

A few hours later, we got woken up by fairly loud firecrackers (palomas? cohetes?) at 5am, for no reason I can fathom (the Mexican Revolution day was a week ago) although it seemed correlated with the nearby church bells going on at full blast (for Lauds? Hanukkah? Cyber Monday? Chirac’s birthdate?). The above picture was taken the Santa María del Tule town with its super-massive Montezuma cypress tree, with remaining decorations from the Día de los Muertos.

Without launching (much) the debate on whether or not Bayesian non-parametrics qualified as “objective Bayesian” methods, Igor Prünster started the day with a non-parametric presentation of dependent random probability measures. With the always fascinating notion that a random discrete non-parametric prior is inducing a distribution on the partitions (EPPF). And applicability in mixtures and their generalisations. Realising that the highly discrete nature of such measures is not such an issue for a given sample size n, since there are at most n elements in the partition. Beatrice Franzolini discussed of specific ways to create dependent distributions based on independent samples, although her practical example based on one N(-10,1) sample and another (independently) N(10,1) sample seemed to fit in several of the dependent random measures she compared. And Marta Catalano (Warwick) presented her work on partial exchangeability and optimal transportation (which I had also heard in CIRM last June and in Warwick last week). One thing I had not realised earlier was the dependence of the Wasserstein distance on the parameterisation, although it now makes perfect sense. If only for the coupling.  I had alas to miss Isadora Antoniano-Villalobos’ talk as I had to teach my undergrad class in Paris Dauphine at the same time… This non-parametric session was quite homogeneous and rich in perspectives.

In an all-MCMC afternoon, Julyan Arbel talked about reference priors for extreme value distributions, with the “shocking” case of a restriction on the support of one parameter, ξ. Which means in fact that the Jeffreys prior is then undefined. This reminded me somewhat of the work of Clara Grazian on Jeffreys priors for mixtures, where some models were not allowing for Fisher information to exist. The second part of this talk was about modified local versions of Gelman & Rubin (1992) R hats. And the recent modification proposed by Aki and co-authors. Where I thought that a simplification of the multivariate challenge of defining ranks could be alleviated by considering directly the likelihood values of the chains. And Trevor Campbell gradually built an involved parallel tempering method where the powers of a geometric mixture are optimised as spline functions of the temperature. Next, María Gil-Leyva presented her original and ordered approach to mixture estimation, which I discussed in a blog published two days ago (!). She corrected my impressions that (i) the methods were all impervious to label switching and (ii) required some conjugacy to operate. The final talk of the day was by Anirban Bhattacharya on high-D Bayesian regression and coupling techniques for checking convergence, a paper that had been on my reading list for a long while. A very elaborate construct of coupling strategies within a Gibbs sampler, with some steps relying on optimal coupling and others on the use of common random generators.

certified RNGs

Posted in Statistics with tags , , , , , , , on April 27, 2020 by xi'an

A company called Gaming Laboratories International (GLI) is delivering certificates of randomness. Apparently using Marsaglia’s DieHard tests. Here are some unforgettable quotes from their webpage:

“…a Random Number Generator (RNG) is a key component that MUST be adequately and fully tested to ensure non-predictability and no biases exist towards certain game outcomes.”

“GLI has the most experienced and robust RNG testing methodologies in the world. This includes software-based (pseudo-algorithmic) RNG’s, Hardware RNG’s, and hybrid combinations of both.”

“GLI uses custom software written and validated through the collaborative effort of our in-house mathematicians and industry consultants since our inception in 1989. An RNG Test Suite is applied for randomness testing.”

“No lab in the world provides the level of iGaming RNG assurance that GLI does. Don’t take a chance with this most critical portion of your iGaming system.”
 

random generators produce ties

Posted in Books, R, Statistics with tags , , , , , , , on April 21, 2020 by xi'an

“…an essential part of understanding how many ties these RNGs produce is to understand how many ties one expects in 32-bit integer arithmetic.”

A sort of a birthday-problem paper for random generators by Markus Hofert on arXiv as to why they produce ties. As shown for instance in the R code (inspired by the paper):

sum(duplicated(runif(1e6)))

returning values around 100, which is indeed unexpected until one thinks a wee bit about it… With no change if moving to an alternative to the Mersenne twister generator. Indeed, assuming the R random generators produce integers with 2³² values, the expected number of ties is actually 116 for 10⁶ simulations. Moving to 2⁶⁴, the probability of a tie is negligible, around 10⁻⁸. A side remark of further inerest in the paper is that, due to a different effective gap between 0 and the smallest positive normal number, of order 10⁻²⁵⁴ and between 1 and the smallest normal number greater than 1, of order 10⁻¹⁶, “the grid of representable double numbers is not equidistant”. Justifying the need for special functions such as expm1 and log1p, corresponding to more accurate derivations of exp(x)-1 and log(1+x).

a perfectly normally distributed sample

Posted in R, Statistics with tags , , , , , , , , on May 9, 2019 by xi'an

When I saw this title on R-bloggers, I was wondering how “more perfect” a Normal sample could be when compared with the outcome of rnorm(n). Hence went checking the original blog on bayestestR in search of more information. Which was stating nothing more than how to generate a sample is perfectly normal by using the rnorm_perfect function. Still unsure of the meaning, I contacted one of the contributors who replied very quickly

…that’s actually a good question. I would say an empirical sample having characteristics as close as possible to a cannonic gaussian distribution.
and again leaving me hungering for more details. I thus downloaded the package bayestestR and opened the rnorm_perfect function. Which is simply the sequence of n-quantiles
stats::qnorm(seq(1/n, 1 – 1/n, length.out = n), mean, sd)
which I would definitely not call a sample as it has nothing random. And perfect?! Not really, unless one associates randomness and imperfection.

George Forsythe’s last paper

Posted in Books, Statistics, University life with tags , , , on May 25, 2018 by xi'an

When looking for a link in a recent post, I came across Richard Brent’ arXival of historical comments on George Forsythe’s last paper (in 1972). Which is about the Forsythe-von Neumann approach to simulating exponential variates, covered in Luc Devroye’s Non-Uniform Random Variate Generation in a special section, Section 2 of Chapter 4,  is about generating a random variable from a target density proportional to g(x)exp(-F(x)), where g is a density and F is a function on (0,1). Then, after generating a realisation x⁰ from g and computing F(x⁰), generate a sequence u¹,u²,… of uniforms as long as they keep decreasing, i.e., F(x⁰) >u¹>u²>… If the maximal length k of this sequence is odd, the algorithm exists with a value x⁰ generated from  g(x)exp(-F(x)). Von Neumann (1949) treated the special case when g is constant and F(x)=x, which leads to an Exponential generator that never calls an exponential function. Which does not make the proposal a particularly efficient one as it rejects O(½) of the simulations. Refinements of the algorithm lead to using on average 1.38 uniforms per Normal generation, which does not sound much faster than a call to the Box-Muller method, despite what is written in the paper. (Brent also suggests using David Wallace’s 1999 Normal generator, which I had not encountered before. And which I am uncertain is relevant at the present time.)