Archive for random number generation

my own personal hope for the future is that we won’t have to build any more random number generators…

Posted in Books, Statistics, University life with tags , , , , , , on April 19, 2020 by xi'an

Came perchance upon this reminiscence about the generation of the 10⁶ random digits found in the book published by the RAND Corporation. It took them a month to produce half a million digits, exploiting a “random frequency pulse source gated by a constant frequency pulse” behaving like a “roulette wheel with 32 positions, making on the average 3000 revolutions on each turn”. As the outcome failed on the odd/even ratio test, the RAND engineers randomized further the outcome by adding “(mod 10) the digits in each card, digit by digits, to the corresponding digits of the previous card”. (Cards as in punched cards, the outcome being printed 50 digits at a time on I.B.M. cards.) A last piece of Monte Carlo trivia is that the electronic roulette at the basis of this random generator was devised by Hastings, Cecil not Wilfred Keith. (And RAND is an abbreviation of Research and Development, not of randomness!)

quantum simulation or supremacy?

Posted in Books, Statistics, University life with tags , , , , , on November 11, 2019 by xi'an

d41586-019-03167-2_17301142Nature this week contains a prominent paper by Arute et al. reporting an experiment on a quantum computer running a simulation on a state-space of dimension 253 (which is the number of qubits in their machine, plus one dedicated to error correction if I get it right). With a million simulation of the computer state requiring 200 seconds. Which they claim would take 10,000 years (3 10¹¹ seconds) to run on a classical super-computer. And which could be used towards producing certified random numbers, an impressive claim given the intrinsic issue of qubit errors. (This part is not developed in the paper but I wonder how a random generator could handle such errors.)

“…a “challenger” generates a random quantum circuit C (i.e., a random sequence of 1-qubit and nearest-neighbor 2-qubit gates, of depth perhaps 20, acting on a 2D grid of n = 50 to 60 qubits). The challenger then sends C to the quantum computer, and asks it apply C to the all-0 initial state, measure the result in the {0,1} basis, send back whatever n-bit string was observed, and repeat some thousands or millions of times. Finally, using its knowledge of C, the classical challenger applies a statistical test to check whether the outputs are consistent with the QC having done this.” The blog of Scott Aaronson

I have tried reading the Nature paper but had trouble grasping the formidable nature of the simulation they were discussing, as it seems to be the reproduction by a simulation of a large quantum circuit of depth 20, as helpfully explained in the above quote. And checking the (non-uniform) distribution of the random simulation is the one expected. Which is the hard part and requires a classical (super-)computer to determine the theoretical distribution. And the News & Views entry in the same issue of Nature. According to Wikipedia, “the best known algorithm for simulating an arbitrary random quantum circuit requires an amount of time that scales exponentially with the number of qubits“. However, IBM (a competitor of Google in the quantum computer race) counter-claims that the simulation of the circuit takes only 2.5 days on a classical computer with optimised coding. (And this should be old news by the time this blog post comes out, since even a US candidate for the presidency has warned about it!)

ziggurat algorithm

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , on October 30, 2018 by xi'an

A ziggurat (Akkadian: ziqqurat, D-stem of zaqāru “to build on a raised area”) is a type of massive stone structure built in ancient Mesopotamia. It has the form of a terraced compound of successively receding stories or levels. Wikipedia

In a recent arXival, Jalalvand and Charsooghi revisit the ziggurat algorithm that simulates from a univariate distribution by finding horizontal strips that pile up on top of the target as in a ziggurat or a pyramid, hence the name. Which George Marsaglia introduced in 1963. When finely tuned the method is quite efficient. Maybe because it designs an accept-reject move for each strip of the ziggurat rather than globally. For instance, versions constructed for a Normal target are more efficient [3½ times faster] than the Box-Muller algorithm. The generalisation found in the paper divides the target into strips of equal area, rather than dominating rectangular strips of equal area, which requires some work when the target density is non-standard. For targets with unbounded support or unbounded values, a function g transforming the tail into (0,1) has to be selected. A further constraint is that the inverse cdf of the transformed g(X) has to be known. And a large part of the paper examines several scenarii towards simulating from the tail region. For unbounded densities, a similarly minute analysis is undertaken, again with requests about the target like its algebraic order.

“…the result of division of a random integer by its range is a fixed-point number which unlike a floating-point number does not enjoy increased precision near 0. When such random numbers are used in the tail algorithm they cause premature termination of the tail and large gaps between produced random numbers near the termination point.”

The paper further discusses the correction of an error common to earlier ziggurat algorithms, due to the  conversion from fixed-point to floating-point numbers, as indicated in the above quote. Although this had already been addressed by George Marsaglia in the early 1990’s.

“Ziggurat algorithm has a high setup time, so it’s not suitable for applications that require variates with frequently changing shape parameters.”

When testing the algorithm against different methods (in STL and Boost), and different distributions, the gains are between two and seven times faster, except for the Exponential target where the original ziggurat algorithm performs better. Interestingly, the gains (and the computing time) increase with the degrees of freedom for the Gamma target, in relation with Devroye’s (1986) remark on the absence of uniformly bounded execution times for this distribution. Same thing for the Weibull variates, obviously. Reflecting upon the usually costly computation of cdfs and inverse cdfs on machines and software, the inverse cdf method is systematically left behind! In conclusion, a good Sunday morning read if not of direct consequences for MCMC implementation, as warned by the authors.


independent random sampling methods [book review]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , on May 16, 2018 by xi'an

Last week, I had the pleasant surprise to receive a copy of this book in the mail. Book that I was not aware had been written or published (meaning that I was not involved in its review!). The three authors, Luca Martino, David Luengo, and Joaquín Míguez, of Independent Random Sampling Methods are from Madrid universities and I have read (and posted on) several of their papers on (population) Monte Carlo simulation in the recent years. Including Luca’s survey of multiple try MCMC which was helpful in writing our WIREs own survey.

The book is a pedagogical coverage of most algorithms used to simulate independent samples from a given distribution, which of course recoups some of the techniques exposed with more details by [another] Luc, namely Luc Devroye’s Non-uniform random variate generation bible, often mentioned here (and studied in uttermost details by a dedicated reading group in Warwick). It includes a whole chapter on accept-reject methods, with in particular a section on Payne-Dagpunar’s band rejection I had not seen previously. And another entire chapter on ratio-of-uniforms techniques. On which the three authors had proposed generalisations [covered by the book], years before I attempted to go the same way, having completely forgotten reading their paper at the time… Or the much earlier 1991 paper by Jon Wakefield, Alan Gelfand and Adrian Smith!

The book also covers the “vertical density representation”, due to Troutt (1991), which consists in considering the distribution of the density p(.) of the random variable X as a random variable, p(X). I remember pondering about this alternative to the cdf transform and giving up on it as the outcome has a distribution depending on p, even when the density is monotonous. Even though I am not certain from reading the section that this is particularly appealing…

Given its title, the book contains very little about MCMC. Except for a last and final chapter that covers adaptive independent Metropolis-Hastings algorithms, in connection with some of the authors’ recent work. Like multiple try Metropolis. Relating to the (unidimensional) ARMS “ancestor” of adaptive MCMC methods. (As noted in a recent blog on Holden et al., 2009 , I have trouble understanding how recycling only rejected proposed values to build a better proposal distribution is enough to guarantee convergence of an adaptive algorithm, but the book does not delve much into this convergence.)

All in all and with the bias induced by me working in the very area, I find the book quite a nice entry on the topic, which can be used in a Monte Carlo course at both undergraduate and graduate levels if one want to avoid going into Markov chains. It is certainly less likely to scare students away than the comprehensive Non-uniform random variate generation and on the opposite may induce some of them to pursue a research career in this domain.

certified randomness, 187m away…

Posted in Statistics with tags , , , , , , , on May 3, 2018 by xi'an

As it rarely happens with Nature, I just read an article that directly relates to my research interests, about a secure physical random number generator (RNG). By Peter Bierhost and co-authors, mostly physicists apparently. Security here means that the outcome of the RNG is unpredictable. This very peculiar RNG is based on two correlated photons sent to two measuring stations, separated by at least 187m, which have to display unpredictable outcomes in order to respect the impossibility of faster-than-light communications, otherwise known as Bell inequalities. This is hardly practical though, especially when mentioning that the authors managed to produce 2¹⁰ random bits over 10 minutes, post processing “the measurement of 55 million photon pairs”. (I however fail to see why the two-arm apparatus would be needed for regular random generation as it seems relevant solely for the demonstration of randomness.) I also checked the associated supplementary material, which is mostly about proving some total variation bound, and constructing a Bell function. What is most puzzling in this paper (and the associated supplementary material) is the (apparent) lack of guarantee of uniformity of the RNG. For instance, a sentence (Supplementary Material, p.11) about  a distribution being “within TV distance of uniform” hints at the method being not provably uniform, which makes the whole exercise incomprehensible…