And another one borrowed from the highly popular post on the random variable [almost] always smaller than its mean!
Archive for pseudo-random generator
As an organiser of the ABC session (along with Paul Fearnhead), I was already aware of most results behind the talks, but nonetheless got some new perspectives from the presentations. Ewan Cameron discussed a two-stage ABC where the first step is actually an indirect inference inference, which leads to a more efficient ABC step. With applications to epidemiology. Lu presented extensions of his work with Paul Fearnhead, incorporating regression correction à la Beaumont to demonstrate consistency and using defensive sampling to control importance sampling variance. (While we are working on a similar approach, I do not want to comment on the consistency part, but I missed how defensive sampling can operate in complex ABC settings, as it requires advanced knowledge on the target to be effective.) And Ted Meeds spoke about two directions for automatising ABC (as in the ABcruise), from incorporating the pseudo-random generator into the representation of the ABC target, to calling for deep learning advances. The inclusion of random generators in the transform is great, provided they can remain black boxes as otherwise they require recoding. (This differs from quasi-Monte Carlo ABC, which aims at reducing the variability due to sheer noise.) It took me a little while, but I eventually understood why Jan Haning saw this inclusion as a return to fiducial inference!
Merlise Clyde gave a wide-ranging plenary talk on (linear) model selection that looked at a large range of priors under the hat of generalised confluent hypergeometric priors over the mixing scale in Zellner’s g-prior. Some were consistent under one or both models, maybe even for misspecified models. Some parts paralleled my own talk on the foundations of Bayesian tests, no wonder since I mostly give a review before launching into a criticism of the Bayes factor. Since I think this may be a more productive perspective than trying to over-come the shortcomings of Bayes factors in weakly informative settings. Some comments at the end of Merlise’s talk were loosely connected to this view in that they called for an unitarian perspective [rather than adapting a prior to a specific inference problem] with decision-theoretic backup. Conveniently the next session was about priors and testing, obviously connected!, with Leo Knorr-Held considering g-priors for the Cox model, Kerrie Mengersen discussing priors for over-fitted mixtures and HMMs, and Dan Simpson entertaining us with his quest of a prior for a point process, eventually reaching PC priors.
While browsing Images des Mathématiques, I came across this article [in French] that studies the impact of round-off errors on number representations in a dynamical system and checked how much this was the case for MCMC algorithms like the slice sampler (recycling some R code from Monte Carlo Statistical Methods). By simply adding a few signif(…,dig=n) in the original R code. And letting the precision n vary.
“…si on simule des trajectoires pendant des intervalles de temps très longs, trop longs par rapport à la précision numérique choisie, alors bien souvent, les résultats des simulations seront complètement différents de ce qui se passe en réalité…”
Rather unsurprisingly (!), using a small enough precision (like two digits on the first row) has a visible impact on the simulation of a truncated normal. Moving to three digits seems to be sufficient in this example… One thing this tiny experiment reminds me of is the lumpability property of Kemeny and Snell. A restriction on Markov chains for aggregated (or discretised) versions to be ergodic or even Markov. Also, in 2000, Laird Breyer, Gareth Roberts and Jeff Rosenthal wrote a Statistics and Probability Letters paper on the impact of round-off errors on geometric ergodicity. However, I presume [maybe foolishly!] that the result stated in the original paper, namely that there exists an infinite number of precision digits for which the dynamical system degenerates into a small region of the space does not hold for MCMC. Maybe foolishly so because the above statement means that running a dynamical system for “too” long given the chosen precision kills the intended stationary properties of the system. Which I interpret as getting non-ergodic behaviour when exceeding the period of the uniform generator. More or less.
Ted Meeds and Max Welling have not so recently written about an embarrassingly parallel approach to ABC that they call optimisation Monte Carlo. [Danke Ingmar for pointing out the reference to me.] They start from a rather innocuous rephrasing of the ABC posterior, writing the pseudo-observations as deterministic transforms of the parameter and of a vector of uniforms. Innocuous provided this does not involve an infinite number of uniforms, obviously. Then they suddenly switch to the perspective that, for a given uniform vector u, one should seek the parameter value θ that agrees with the observation y. A sort of Monte Carlo inverse regression: if
then invert this equation in θ. This is quite clever! Maybe closer to fiducial than true Bayesian statistics, since the prior does not occur directly [only as a weight p(θ)], but if this is manageable [and it all depends on the way f(θ,u) is constructed], this should perform better than ABC! After thinking about it a wee bit more in London, though, I realised this was close to impossible in the realistic examples I could think of. But I still like the idea and want to see if anything at all can be made of this…
“However, it is hard to detect if our optimization succeeded and we may therefore sometimes reject samples that should not have been rejected. Thus, one should be careful not to create a bias against samples u for which the optimization is difficult. This situation is similar to a sampler that will not mix to remote local optima in the posterior distribution.”
Now, the paper does not go that way but keeps the ε-ball approach as in regular ABC, to derive an approximation of the posterior density. For a while I was missing the difference between the centre of the ball and the inverse of the above equation, bottom of page 3. But then I realised the former was an approximation to the latter. When the authors discuss their approximation in terms of the error ε, I remain unconvinced by the transfer of the tolerance to the optimisation error, as those are completely different notions. This also applies to the use of a Jacobian in the weight, which seems out of place since this Jacobian appears in a term associated with (or replacing) the likelihood, f(θ,u), which is then multiplied by the prior p(θ). (Assuming a Jacobian exists, which is unclear when considering most simulation patterns use hard bounds and indicators.) When looking at the toy examples, it however makes sense to have a Jacobian since the selected θ’s are transforms of the u’s. And the p(θ)’s are simply importance weights correcting for the wrong target. Overall, the appeal of the method proposed in the paper remains unclear to me. Most likely because I did not spend enough time over it.
When experimenting with various quantiles functions in R, I was shocked [ok this is a bit excessive, let us say surprised] by how widely the execution times would vary. To the point of blaming a completely different feature of R. Borrowing from Charlie Geyer’s webpage on the topic of probability distributions in R, here is a table for some standard distributions: I ran
choosing an arbitrary parameter whenever needed.
Of course, it does not mean much in that all the slow distributions (except for Weibull) are parameterised. Nonetheless, that a chi-square inversion take 50 times longer than a uniform inversion remains puzzling as to why it is not coded more efficiently. In particular, I was wondering why the chi-square inversion was slower than the Gamma inversion. Rerunning both inversions showed that they are equivalent:
> u=runif(1e7) > system.time(x<-qgamma(u,sha=1.5)) utilisateur système écoulé 21.534 0.016 21.532 > system.time(x<-qchisq(u,df=3)) utilisateur système écoulé 21.372 0.008 21.361
Which also shows how variable system.time can be.
On Monday, Ed Meeds, Robert Leenders, and Max Welling (from Amsterdam) arXived a paper entitled Hamiltonian ABC. Before looking at the paper in any detail, I got puzzled by this association of antagonistic terms, since ABC is intended for complex and mostly intractable likelihoods, while Hamiltonian Monte Carlo requires a lot from the target, in order to compute gradients and Hessians… [Warning: some graphs on pages 13-14 may be harmful to your printer!]
Somewhat obviously (ex-post!), the paper suggests to use Hamiltonian dynamics on ABC approximations of the likelihood. They compare a Gaussian kernel version
with the synthetic Gaussian likelihood version of Wood (2010)
where both mean and variance are estimated from the simulated data. If ε is taken as an external quantity and driven to zero, the second approach is much more stable. But… ε is never driven to zero in ABC, or fixed at ε=0.37: It is instead considered as a kernel bandwidth and hence estimated from the simulated data. Hence ε is commensurable with σ(θ). And this makes me wonder at the relevance of the conclusion that synthetic is better than kernel for Hamiltonian ABC. More globally, I wonder at the relevance of better simulating from a still approximate target when the true goal is to better approximate the genuine posterior.
Some of the paper covers separate issues like handling gradient by finite differences à la Spall [if you can afford it!] and incorporating the random generator as part of the Markov chain. And using S common random numbers in computing the gradients for all values of θ. (Although I am not certain all random generators can be represented as a deterministic transform of a parameter θ and of a fixed number of random uniforms. But the authors may consider a random number of random uniforms when they represent their random generators as deterministic transform of a parameter θ and of the random seed. I am also uncertain about the distinction between common, sticky, and persistent random numbers!)
“The term psi denotes anomalous processes of information or energy transfer that are currently unexplained in terms of known physical or biological mechanisms.”
When re-reading [in the taxi to Birmingham airport] Bem’s piece on “significant” ESP tests, I came upon the following hilarious part that I could not let pass:
“For most psychological experiments, a random number table or the random function built into most programming languages provides an adequate tool for randomly assigning participants to conditions or sequencing stimulus presentations. For both methodological and conceptual reasons, however, psi researchers have paid much closer attention to issues of randomization.
At the methodological level, the problem is that the random functions included in most computer languages are not very good in that they fail one or more of the mathematical tests used to assess the randomness of a sequence of numbers (L’Ecuyer, 2001), such as Marsaglia’s rigorous Diehard Battery of Tests of Randomness (1995). Such random functions are sometimes called pseudo random number generators (PRNGs) because they [are] not random in the sense of being indeterminate because once the initial starting number (the seed) is set, all future numbers in the sequence are fully determined.”
Well, pseudo-random generators included in all modern computer languages that I know have passed tests like diehard. It would be immensely useful to learn of counterexamples as those using the corresponding language should be warned!!!
“In contrast, a hardware-based or “true” RNG is based on a physical process, such as radioactive decay or diode noise, and the sequence of numbers is indeterminate in the quantum mechanical sense. This does not in itself guarantee that the resulting sequence of numbers can pass all the mathematical tests of randomness (…) Both Marsaglia’s own PRNG algorithm and the “true” hardware-based Araneus Alea I RNG used in our experiments pass all his diehard tests (…) At the conceptual level, the choice of a PRNG or a hardware-based RNG bears on the interpretation of positive findings. In the present context, it bears on my claim that the experiments reported in this article provide evidence for precognition or retroactive influence.”
There is no [probabilistic] validity in the claim that hardware random generators are more random than pseudo-random ones. Hardware generators may be unpredictable even by the hardware conceptor, but the only way to check they produce generations from a uniform distribution follows exactly the same pattern as for PRNG. And the lack of reproducibility of the outcome makes it impossible to check the reproducibility of the study. But here comes the best part of the story!
“If an algorithm-based PRNG is used to determine the successive left-right positions of the target pictures, then the computer already “knows” the upcoming random number before the participant makes his or her response; in fact, once the initial seed number is generated, the computer implicitly knows the entire sequence of left/right positions. As a result, this information is potentially available to the participant through real-time clairvoyance, permitting us to reject the more extraordinary claim that the direction of the causal arrow has actually been reversed.”
Extraordinary indeed… But not more extraordinary than conceiving that a [psychic] participant in the experiment may “see” the whole sequence of random numbers!
“In contrast, if a true hardware-based RNG is used to determine the left/right positions, the next number in the sequence is indeterminate until it is actually generated by the quantum physical process embedded in the RNG, thereby ruling out the clairvoyance alternative. This argues for using a true RNG to demonstrate precognition or retroactive influence. But alas, the use of a true RNG opens the door to the psychokinesis interpretation: The participant might be influencing the placement of the upcoming target rather than perceiving it, a possibility supported by a body of empirical evidence testing psychokinesis with true RNGs (Radin, 2006, pp.154–160).”
Good! I was just about to make the very same objection! If someone can predict the whole sequence of [extremely long integer] values of a PRNG, it gets hardly any more irrational to imagine that he or she can mentally impact a quantum mechanics event. (And hopefully save Schröninger’s cat in the process.) Obviously, it begs the question as to how a subject could forecast a location of the picture that depends on the random generation but not forecast the result of the random generation.
“Like the clairvoyance interpretation, the psychokinesis interpretation also permits us to reject the claim that the direction of the causal arrow has been reversed. Ironically, the psychokinesis alternative can be ruled out by using a PRNG, which is immune to psychokinesis because the sequence of numbers is fully determined and can even be checked after the fact to confirm that its algorithm has not been perturbed. Over the course of our research program—and within the experiment just reported—we have obtained positive results using both PRNGs and a true RNG, arguably leaving precognition/reversed causality the only nonartifactual interpretation that can account for all the positive results.”
This is getting rather confusing. Avoid using a PRNG for fear the subject infers about the sequence and avoid using a RNG for fear of the subject tempering with the physical generator. An omniscient psychic would be able to hand both types of generators, wouldn’t he or she!?!
“This still leaves open the artifactual alternative that the output from the RNG is producing inadequately randomized sequences containing patterns that fortuitously match participants’ response biases.”
This objection shows how little confidence the author has in the randomness tests he previously mentioned: a proper random generator is not inadequately randomized. And if chance only rather than psychic powers is involved, there is no explanation for the match with the participants’ response. Unless those participants are so clever as to detect the flaws in the generator…
“In the present experiment, this possibility is ruled out by the twin findings that erotic targets were detected significantly more frequently than randomly interspersed nonerotic targets and that the nonerotic targets themselves were not detected significantly more frequently than chance. Nevertheless, for some of the other experiments reported in this article, it would be useful to have more general assurance that there are not patterns in the left/right placements of the targets that might correlate with response biases of participants. For this purpose, Lise Wallach, Professor of Psychology at Duke University, suggested that I run a virtual control experiment using random inputs in place of human participants.”
Absolutely brilliant! This test replacing the participants with random generators has shown that the subjects’ answers do not correspond to an iid sequence from a uniform distribution. It would indeed require great psychic powers to reproduce a perfectly iid U(0,1) sequence! And the participants were warned about the experiment so naturally expected to see patterns in the sequence of placements.