## amazing appendix

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , , on February 13, 2018 by xi'an

In the first appendix of the 1995 Statistical Science paper of Besag, Green, Higdon and Mengersen, on MCMC, “Bayesian Computation and Stochastic Systems”, stands a fairly neat result I was not aware of (and which Arnaud Doucet, with his unrivalled knowledge of the literature!, pointed out to me in Oxford, avoiding me the tedium to try to prove it afresco!). I remember well reading a version of the paper in Fort Collins, Colorado, in 1993 (I think!) but nothing about this result.

It goes as follows: when running a Metropolis-within-Gibbs sampler for component x¹ of a collection of variates x¹,x²,…, thus aiming at simulating from the full conditional of x¹ given x⁻¹ by making a proposal q(x|x¹,x⁻¹), it is perfectly acceptable to use a proposal that depends on a parameter α (no surprise so far!) and to generate this parameter α anew at each iteration (still unsurprising as α can be taken as an auxiliary variable) and to have the distribution of this parameter α depending on the other variates x²,…, i.e., x⁻¹. This is the surprising part, as adding α as an auxiliary variable was messing up the update of x⁻¹. But the proof as found in the 1995 paper [page 35] does not require to consider α as such as it establishes global balance directly. (Or maybe still detailed balance when writing the whole Gibbs sampler as a cycle of Metropolis steps.) Terrific! And a whiff mysterious..!

## exam question

Posted in Kids, Statistics, University life with tags , , , , , , , , , on June 24, 2016 by xi'an

A question for my third year statistics exam that I borrowed from Cross Validated: no student even attempted to solve this question…!

And another one borrowed from the highly popular post on the random variable [almost] always smaller than its mean!

## point process-based Monte Carlo

Posted in Books, Kids, Statistics, University life with tags , , , , , on December 3, 2015 by xi'an

Clément Walter from Paris just pointed me to an arXived paper he had very recently gotten accepted for publication in Statistics and Computing. (Congrats!) Because his paper relates to nested sampling. And connects it with rare event simulation via interacting particle systems. And multilevel Monte Carlo. I had missed it when it came out on arXiv last December [as the title was unrelated with nested sampling if not Monte Carlo], but the paper brings fairly interesting new results about an ideal version of nested sampling that is

1. unbiased when using an infinite number of terms;
2. always better than the standard Monte Carlo estimator, variance-wise;
3. connected with an implicit marked Poisson process; and
4. enjoying a finite variance provided the quantity of interest has an 1+ε moment.

Of course, such results only hold for an ideal version and do not address the issue of the conditional simulations required by nested sampling. (Which has an impact on the computing time as the conditional simulation becomes more and more expensive as the likelihood value increases.) The explanation therein of the approximation of tail probabilities by a Poisson estimate makes the link with deterministic nested sampling much clearer to me. Point 2 above means that the nested sampling estimate always does better than the average of the likelihood values produced by an iid or MCMC simulation from the prior distribution. The paper also borrows from the debiasing approach of Rhee and Glynn (already used by the Russian roulette) to turn truncated versions of the nested sampling estimator into an unbiased estimator, with a limited impact on the variance of the estimator. Truncation is associated with the generation of a geometric stopping time which parameter needs to be optimised. Without a more detailed reading, I am somewhat lost as to this optimisation remains feasible in complex settings… The paper contains an illustration for a Pareto distribution where optimisation and calibration can be conducted quite far. It also re-analyses the Mexican hat example of Skilling (2006), showing that our stopping rule may induce bias.

## quantile functions: mileage may vary

Posted in Books, R, Statistics with tags , , , , , , on May 12, 2015 by xi'an

When experimenting with various quantiles functions in R, I was shocked [ok this is a bit excessive, let us say surprised] by how widely the execution times would vary. To the point of blaming a completely different feature of R. Borrowing from Charlie Geyer’s webpage on the topic of probability distributions in R, here is a table for some standard distributions: I ran

u=runif(1e7)
system.time(x<-qcauchy(u))


choosing an arbitrary parameter whenever needed.

Distribution Function Time
Cauchy qcauchy 2.2
Chi-Square qchisq 43.8
Exponential qexp 0.95
F qf 34.2
Gamma qgamma 37.2
Logistic qlogis 1.7
Log Normal qlnorm 2.2
Normal qnorm 1.4
Student t qt 31.7
Uniform qunif 0.86
Weibull qweibull 2.9

Of course, it does not mean much in that all the slow distributions (except for Weibull) are parameterised. Nonetheless, that a chi-square inversion take 50 times longer than a uniform inversion remains puzzling as to why it is not coded more efficiently. In particular, I was wondering why the chi-square inversion was slower than the Gamma inversion. Rerunning both inversions showed that they are equivalent:

> u=runif(1e7)
> system.time(x<-qgamma(u,sha=1.5))
utilisateur système écoulé
21.534 0.016 21.532
> system.time(x<-qchisq(u,df=3))
utilisateur système écoulé
21.372 0.008 21.361


Which also shows how variable system.time can be.

## simulating correlated Binomials [another Bernoulli factory]

Posted in Books, Kids, pictures, R, Running, Statistics, University life with tags , , , , , , , on April 21, 2015 by xi'an

This early morning, just before going out for my daily run around The Parc, I checked X validated for new questions and came upon that one. Namely, how to simulate X a Bin(8,2/3) variate and Y a Bin(18,2/3) such that corr(X,Y)=0.5. (No reason or motivation provided for this constraint.) And I thought the following (presumably well-known) resolution, namely to break the two binomials as sums of 8 and 18 Bernoulli variates, respectively, and to use some of those Bernoulli variates as being common to both sums. For this specific set of values (8,18,0.5), since 8×18=12², the solution is 0.5×12=6 common variates. (The probability of success does not matter.) While running, I first thought this was a very artificial problem because of this occurrence of 8×18 being a perfect square, 12², and cor(X,Y)x12 an integer. A wee bit later I realised that all positive values of cor(X,Y) could be achieved by randomisation, i.e., by deciding the identity of a Bernoulli variate in X with a Bernoulli variate in Y with a certain probability ϖ. For negative correlations, one can use the (U,1-U) trick, namely to write both Bernoulli variates as

$X_1=\mathbb{I}(U\le p)\quad Y_1=\mathbb{I}(U\ge 1-p)$

in order to minimise the probability they coincide.

I also checked this result with an R simulation

> z=rbinom(10^8,6,.66)
> y=z+rbinom(10^8,12,.66)
> x=z+rbinom(10^8,2,.66)
cor(x,y)
> cor(x,y)
[1] 0.5000539


Searching on Google gave me immediately a link to Stack Overflow with an earlier solution with the same idea. And a smarter R code.

## random generators… unfit for ESP testing?!

Posted in Books, Statistics with tags , , , , , , , on September 10, 2014 by xi'an

“The term psi denotes anomalous processes of information or energy transfer that are currently unexplained in terms of known physical or biological mechanisms.”

When re-reading [in the taxi to Birmingham airport] Bem’s piece on “significant” ESP tests, I came upon the following hilarious part that I could not let pass:

“For most psychological experiments, a random number table or the random function built into most programming languages provides an adequate tool for randomly assigning participants to conditions or sequencing stimulus presentations. For both methodological and conceptual reasons, however, psi researchers have paid much closer attention to issues of randomization.

At the methodological level, the problem is that the random functions included in most computer languages are not very good in that they fail one or more of the mathematical tests used to assess the randomness of a sequence of numbers (L’Ecuyer, 2001), such as Marsaglia’s rigorous Diehard Battery of Tests of Randomness (1995). Such random functions are sometimes called pseudo random number generators (PRNGs) because they [are] not random in the sense of being indeterminate because once the initial starting number (the seed) is set, all future numbers in the sequence are fully determined.”

Well, pseudo-random generators included in all modern computer languages that I know have passed tests like diehard. It would be immensely useful to learn of counterexamples as those using the corresponding language should be warned!!!

“In contrast, a hardware-based or “true” RNG is based on a physical process, such as radioactive decay or diode noise, and the sequence of numbers is indeterminate in the quantum mechanical sense. This does not in itself guarantee that the resulting sequence of numbers can pass all the mathematical tests of randomness (…) Both Marsaglia’s own PRNG algorithm and the “true” hardware-based Araneus Alea I RNG used in our experiments pass all his diehard tests (…) At the conceptual level, the choice of a PRNG or a hardware-based RNG bears on the interpretation of positive findings. In the present context, it bears on my claim that the experiments reported in this article provide evidence for precognition or retroactive influence.”

There is no [probabilistic] validity in the claim that hardware random generators are more random than pseudo-random ones. Hardware generators may be unpredictable even by the hardware conceptor, but the only way to check they produce generations from a uniform distribution follows exactly the same pattern as for PRNG. And the lack of reproducibility of the outcome makes it impossible to check the reproducibility of the study. But here comes the best part of the story!

“If an algorithm-based PRNG is used to determine the successive left-right positions of the target pictures, then the computer already “knows” the upcoming random number before the participant makes his or her response; in fact, once the initial seed number is generated, the computer implicitly knows the entire sequence of left/right positions. As a result, this information is potentially available to the participant through real-time clairvoyance, permitting us to reject the more extraordinary claim that the direction of the causal arrow has actually been reversed.”

Extraordinary indeed… But not more extraordinary than conceiving that a [psychic] participant in the experiment may “see” the whole sequence of random numbers!

“In contrast, if a true hardware-based RNG is used to determine the left/right positions, the next number in the sequence is indeterminate until it is actually generated by the quantum physical process embedded in the RNG, thereby ruling out the clairvoyance alternative. This argues for using a true RNG to demonstrate precognition or retroactive influence. But alas, the use of a true RNG opens the door to the psychokinesis interpretation: The participant might be influencing the placement of the upcoming target rather than perceiving it, a possibility supported by a body of empirical evidence testing psychokinesis with true RNGs (Radin, 2006, pp.154–160).”

Good! I was just about to make the very same objection! If someone can predict the whole sequence of [extremely long integer] values of a PRNG, it gets hardly more irrational to imagine that he or she can mentally impact a quantum mechanics event. (And hopefully save Schröninger’s cat in the process.) Obviously, it begets the question as to how a subject could forecast a location of the picture that depends on the random generation but not forecast the result of the random generation.

“Like the clairvoyance interpretation, the psychokinesis interpretation also permits us to reject the claim that the direction of the causal arrow has been reversed. Ironically, the psychokinesis alternative can be ruled out by using a PRNG, which is immune to psychokinesis because the sequence of numbers is fully determined and can even be checked after the fact to confirm that its algorithm has not been perturbed. Over the course of our research program—and within the experiment just reported—we have obtained positive results using both PRNGs and a true RNG, arguably leaving precognition/reversed causality the only nonartifactual interpretation that can account for all the positive results.”

This is getting rather confusing. Avoid using a PRNG for fear the subject infers about the sequence and avoid using a RNG for fear of the subject tempering with the physical generator. An omniscient psychic would be able to hand both types of generators, wouldn’t he or she!?!

“This still leaves open the artifactual alternative that the output from the RNG is producing inadequately randomized sequences containing patterns that fortuitously match participants’ response biases.”

This objection shows how little confidence the author has in the randomness tests he previously mentioned: a proper random generator is not inadequately randomized. And if chance only rather than psychic powers is involved, there is no explanation for the match with the participants’ response. Unless those participants are so clever as to detect the flaws in the generator…

“In the present experiment, this possibility is ruled out by the twin findings that erotic targets were detected significantly more frequently than randomly interspersed nonerotic targets and that the nonerotic targets themselves were not detected significantly more frequently than chance. Nevertheless, for some of the other experiments reported in this article, it would be useful to have more general assurance that there are not patterns in the left/right placements of the targets that might correlate with response biases of participants. For this purpose, Lise Wallach, Professor of Psychology at Duke University, suggested that I run a virtual control experiment using random inputs in place of human participants.”

Absolutely brilliant! This test replacing the participants with random generators has shown that the subjects’ answers do not correspond to an iid sequence from a uniform distribution. It would indeed require great psychic powers to reproduce a perfectly iid U(0,1) sequence! And the participants were warned about the experiment so naturally expected to see patterns in the sequence of placements.

## Congruential generators all are RANDUs!

Posted in R, Statistics, University life with tags , , , , , on January 31, 2010 by xi'an

In case you did not read all the slides of Regis Lebrun’s talk on pseudo-random generators I posted yesterday, one result from Marsaglia’s (in a 1968 PNAS paper) exhibited my ignorance during Regis’ Big’ MC seminar on Thursday. Marsaglia indeed showed that all multiplicative congruential generators

$r_{i+1}= kr_i \text{modulo }m$

lie on a series of hyperplanes whose number gets ridiculously small as the dimension d increases! If you turn the $r_i$‘s into uniforms $u_i$ and look at the d dimensional vectors

$\pi_1=(u_1,\ldots,u_d),\,\pi_2=(u_2,\ldots,u_{n+1}),\,\ldots$

they are on a small number of hyperplanes, at most $(d!m)^{1/m}$, which gives 41 hyperplanes when $m=2^{32}$… So in this sense all generators share the same poor property as the infamous RANDU which is such that that $(u_{i},u_{i+1},u_{i+2})$ is always over one of 16 hyperplanes, an exercise we use in both Introducing Monte Carlo Methods with R and Monte Carlo Statistical Methods (but not in our general audience out solution manual). I almost objected to the general result being irrelevant as the $\pi_i$‘s share $u_j$‘s, but of course the subsequence $\pi_1,\pi_d,\pi_{2d},...$ also share enjoys this property!