## inverse Gaussian trick [or treat?]

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , , , , , , , on October 29, 2020 by xi'an

When preparing my mid-term exam for my undergrad mathematical statistics course, I wanted to use the inverse Gaussian distribution IG(μ,λ) as an example of exponential family and include a random generator question. As shown above by a Fortran computer code from Michael, Schucany and Haas, a simple version can be based on simulating a χ²(1) variate and solving in x the following second degree polynomial equation

$\dfrac{\lambda(x-\mu)^2}{\mu^2 x} = v$

since the left-hand side transform is distributed as a χ²(1) random variable. The smallest root x¹, less than μ, is then chosen with probability μ/(μ+x¹) and the largest one, x²=μ²/x¹ with probability x¹/(μ+x¹). A relatively easy question then, except when one considers asking for the proof of the χ²(1) result, which proved itself to be a harder cookie than expected! The paper usually referred to for the result, Schuster (1968), is quite cryptic on the matter, essentially stating that the above can be expressed as the (bijective) transform of Y=min(X,μ²/X) and that V~χ²(1) follows immediately. I eventually worked out a proof by the “law of the unconscious statistician” [a name I do not find particularly amusing!], but did not include the question in the exam. But I found it fairly interesting that the inverse Gaussian can be generating by “inverting” the above equation, i.e. going from a (squared) Gaussian variate V to the inverse Gaussian variate X. (Even though the name stems from the two cumulant generating functions being inverses of one another.)

## random generators produce ties

Posted in Books, R, Statistics with tags , , , , , , , on April 21, 2020 by xi'an

“…an essential part of understanding how many ties these RNGs produce is to understand how many ties one expects in 32-bit integer arithmetic.”

A sort of a birthday-problem paper for random generators by Markus Hofert on arXiv as to why they produce ties. As shown for instance in the R code (inspired by the paper):

sum(duplicated(runif(1e6)))


returning values around 100, which is indeed unexpected until one thinks a wee bit about it… With no change if moving to an alternative to the Mersenne twister generator. Indeed, assuming the R random generators produce integers with 2³² values, the expected number of ties is actually 116 for 10⁶ simulations. Moving to 2⁶⁴, the probability of a tie is negligible, around 10⁻⁸. A side remark of further inerest in the paper is that, due to a different effective gap between 0 and the smallest positive normal number, of order 10⁻²⁵⁴ and between 1 and the smallest normal number greater than 1, of order 10⁻¹⁶, “the grid of representable double numbers is not equidistant”. Justifying the need for special functions such as expm1 and log1p, corresponding to more accurate derivations of exp(x)-1 and log(1+x).

## biased sample!

Posted in Statistics with tags , , , , , , , , , , , on May 21, 2019 by xi'an

A chance occurrence led me to this thread on R-devel about R sample function generating a bias by taking the integer part of the continuous uniform generator… And then to the note by Kellie Ottoboni and Philip Stark analysing the reason, namely the fact that R uniform [0,1) pseudo-random generator is not perfectly continuously uniform but discrete, by the nature of numbers on a computer. Knuth (1997) showed that in this case the range of probabilities is larger than (1,1), the largest range being (1,1.03). As noted in the note, exploiting directly the pseudo-random bits of the pseudo-random generator. Shocking, isn’t it!  A fast and bias-free alternative suggested by Lemire is available as dqsample::sample

As an update of June 2019, sample is now fixed.

## independent random sampling methods [book review]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , on May 16, 2018 by xi'an

Last week, I had the pleasant surprise to receive a copy of this book in the mail. Book that I was not aware had been written or published (meaning that I was not involved in its review!). The three authors, Luca Martino, David Luengo, and Joaquín Míguez, of Independent Random Sampling Methods are from Madrid universities and I have read (and posted on) several of their papers on (population) Monte Carlo simulation in the recent years. Including Luca’s survey of multiple try MCMC which was helpful in writing our WIREs own survey.

The book is a pedagogical coverage of most algorithms used to simulate independent samples from a given distribution, which of course recoups some of the techniques exposed with more details by [another] Luc, namely Luc Devroye’s Non-uniform random variate generation bible, often mentioned here (and studied in uttermost details by a dedicated reading group in Warwick). It includes a whole chapter on accept-reject methods, with in particular a section on Payne-Dagpunar’s band rejection I had not seen previously. And another entire chapter on ratio-of-uniforms techniques. On which the three authors had proposed generalisations [covered by the book], years before I attempted to go the same way, having completely forgotten reading their paper at the time… Or the much earlier 1991 paper by Jon Wakefield, Alan Gelfand and Adrian Smith!

The book also covers the “vertical density representation”, due to Troutt (1991), which consists in considering the distribution of the density p(.) of the random variable X as a random variable, p(X). I remember pondering about this alternative to the cdf transform and giving up on it as the outcome has a distribution depending on p, even when the density is monotonous. Even though I am not certain from reading the section that this is particularly appealing…

Given its title, the book contains very little about MCMC. Except for a last and final chapter that covers adaptive independent Metropolis-Hastings algorithms, in connection with some of the authors’ recent work. Like multiple try Metropolis. Relating to the (unidimensional) ARMS “ancestor” of adaptive MCMC methods. (As noted in a recent blog on Holden et al., 2009 , I have trouble understanding how recycling only rejected proposed values to build a better proposal distribution is enough to guarantee convergence of an adaptive algorithm, but the book does not delve much into this convergence.)

All in all and with the bias induced by me working in the very area, I find the book quite a nice entry on the topic, which can be used in a Monte Carlo course at both undergraduate and graduate levels if one want to avoid going into Markov chains. It is certainly less likely to scare students away than the comprehensive Non-uniform random variate generation and on the opposite may induce some of them to pursue a research career in this domain.

## random generators… unfit for ESP testing?!

Posted in Books, Statistics with tags , , , , , , , on September 10, 2014 by xi'an

“The term psi denotes anomalous processes of information or energy transfer that are currently unexplained in terms of known physical or biological mechanisms.”

When re-reading [in the taxi to Birmingham airport] Bem’s piece on “significant” ESP tests, I came upon the following hilarious part that I could not let pass:

“For most psychological experiments, a random number table or the random function built into most programming languages provides an adequate tool for randomly assigning participants to conditions or sequencing stimulus presentations. For both methodological and conceptual reasons, however, psi researchers have paid much closer attention to issues of randomization.

At the methodological level, the problem is that the random functions included in most computer languages are not very good in that they fail one or more of the mathematical tests used to assess the randomness of a sequence of numbers (L’Ecuyer, 2001), such as Marsaglia’s rigorous Diehard Battery of Tests of Randomness (1995). Such random functions are sometimes called pseudo random number generators (PRNGs) because they [are] not random in the sense of being indeterminate because once the initial starting number (the seed) is set, all future numbers in the sequence are fully determined.”

Well, pseudo-random generators included in all modern computer languages that I know have passed tests like diehard. It would be immensely useful to learn of counterexamples as those using the corresponding language should be warned!!!

“In contrast, a hardware-based or “true” RNG is based on a physical process, such as radioactive decay or diode noise, and the sequence of numbers is indeterminate in the quantum mechanical sense. This does not in itself guarantee that the resulting sequence of numbers can pass all the mathematical tests of randomness (…) Both Marsaglia’s own PRNG algorithm and the “true” hardware-based Araneus Alea I RNG used in our experiments pass all his diehard tests (…) At the conceptual level, the choice of a PRNG or a hardware-based RNG bears on the interpretation of positive findings. In the present context, it bears on my claim that the experiments reported in this article provide evidence for precognition or retroactive influence.”

There is no [probabilistic] validity in the claim that hardware random generators are more random than pseudo-random ones. Hardware generators may be unpredictable even by the hardware conceptor, but the only way to check they produce generations from a uniform distribution follows exactly the same pattern as for PRNG. And the lack of reproducibility of the outcome makes it impossible to check the reproducibility of the study. But here comes the best part of the story!

“If an algorithm-based PRNG is used to determine the successive left-right positions of the target pictures, then the computer already “knows” the upcoming random number before the participant makes his or her response; in fact, once the initial seed number is generated, the computer implicitly knows the entire sequence of left/right positions. As a result, this information is potentially available to the participant through real-time clairvoyance, permitting us to reject the more extraordinary claim that the direction of the causal arrow has actually been reversed.”

Extraordinary indeed… But not more extraordinary than conceiving that a [psychic] participant in the experiment may “see” the whole sequence of random numbers!

“In contrast, if a true hardware-based RNG is used to determine the left/right positions, the next number in the sequence is indeterminate until it is actually generated by the quantum physical process embedded in the RNG, thereby ruling out the clairvoyance alternative. This argues for using a true RNG to demonstrate precognition or retroactive influence. But alas, the use of a true RNG opens the door to the psychokinesis interpretation: The participant might be influencing the placement of the upcoming target rather than perceiving it, a possibility supported by a body of empirical evidence testing psychokinesis with true RNGs (Radin, 2006, pp.154–160).”

Good! I was just about to make the very same objection! If someone can predict the whole sequence of [extremely long integer] values of a PRNG, it gets hardly more irrational to imagine that he or she can mentally impact a quantum mechanics event. (And hopefully save Schröninger’s cat in the process.) Obviously, it begets the question as to how a subject could forecast a location of the picture that depends on the random generation but not forecast the result of the random generation.

“Like the clairvoyance interpretation, the psychokinesis interpretation also permits us to reject the claim that the direction of the causal arrow has been reversed. Ironically, the psychokinesis alternative can be ruled out by using a PRNG, which is immune to psychokinesis because the sequence of numbers is fully determined and can even be checked after the fact to confirm that its algorithm has not been perturbed. Over the course of our research program—and within the experiment just reported—we have obtained positive results using both PRNGs and a true RNG, arguably leaving precognition/reversed causality the only nonartifactual interpretation that can account for all the positive results.”

This is getting rather confusing. Avoid using a PRNG for fear the subject infers about the sequence and avoid using a RNG for fear of the subject tempering with the physical generator. An omniscient psychic would be able to hand both types of generators, wouldn’t he or she!?!

“This still leaves open the artifactual alternative that the output from the RNG is producing inadequately randomized sequences containing patterns that fortuitously match participants’ response biases.”

This objection shows how little confidence the author has in the randomness tests he previously mentioned: a proper random generator is not inadequately randomized. And if chance only rather than psychic powers is involved, there is no explanation for the match with the participants’ response. Unless those participants are so clever as to detect the flaws in the generator…

“In the present experiment, this possibility is ruled out by the twin findings that erotic targets were detected significantly more frequently than randomly interspersed nonerotic targets and that the nonerotic targets themselves were not detected significantly more frequently than chance. Nevertheless, for some of the other experiments reported in this article, it would be useful to have more general assurance that there are not patterns in the left/right placements of the targets that might correlate with response biases of participants. For this purpose, Lise Wallach, Professor of Psychology at Duke University, suggested that I run a virtual control experiment using random inputs in place of human participants.”

Absolutely brilliant! This test replacing the participants with random generators has shown that the subjects’ answers do not correspond to an iid sequence from a uniform distribution. It would indeed require great psychic powers to reproduce a perfectly iid U(0,1) sequence! And the participants were warned about the experiment so naturally expected to see patterns in the sequence of placements.