## normal variates in Metropolis step

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , on November 14, 2017 by xi'an

A definitely puzzled participant on X validated, confusing the Normal variate or variable used in the random walk Metropolis-Hastings step with its Normal density… It took some cumulated efforts to point out the distinction. Especially as the originator of the question had a rather strong a priori about his or her background:

“I take issue with your assumption that advice on the Metropolis Algorithm is useless to me because of my ignorance of variates. I am currently taking an experimental course on Bayesian data inference and I’m enjoying it very much, i believe i have a relatively good understanding of the algorithm, but i was unclear about this specific.”

despite pondering the meaning of the call to rnorm(1)… I will keep this question in store to use in class when I teach Metropolis-Hastings in a couple of weeks.

## simulation by hand

Posted in Books, Kids, pictures, Statistics, Travel with tags , , , , , , , on November 28, 2016 by xi'an

A rather weird question on X validated this week was about devising a manual way to simulate (a few) normal variates. By manual I presume the author of the question means without resorting to a computer or any other business machine. Now, I do not know of any real phenomenon that is exactly and provably Normal. As analysed in a great philosophy of science paper by Aidan Lyon, the standard explanations for a real phenomenon to be Normal are almost invariably false, even those invoking the Central Limit Theorem. Hence I cannot think of a mechanical device that would directly return Normal generations from a Normal distribution with known parameters. However, since it is possible to simulate by hand Uniform U(0,1) variates [up to a given precision] using a chronometre or a wheel, calls to versions of the Box-Müller algorithm that do not rely on logarithmic or trigonometric functions are feasible, for instance by generating two Exponential variates, x and y, until 2y>(1-x)², x being the output. And generating Exponential variates is easy provided a radioactive material with known half-life is available, along with a Geiger counter. Or, if not, by calling von Neumann’s exponential generator. As detailed in Devroye’s simulation book.

After proposing this solution, I received a comment from the author of the question towards a simpler solution based, e.g., on the Central Limit Theorem. Presumably for simple iid random variables such as coin tosses or dice experiments. While I used the CLT for simulating Normal variables in my very early days [just after programming on punched cards!], I do not think this is a very good or efficient method, as the tails grow very slowly to normality. By comparison, using the same amount of coin tosses to create a sufficient number of binary digits of a Uniform variate produces a computer-precision exact Uniform variate, which can be exploited in Box-Müller-like algorithms to return exact Normal variates… Even by hand if necessary. [For some reason, this question attracted a lot of traffic and an encyclopaedic answer on X validated, despite being borderline to the point of being proposed for closure.]

## standard distributions

Posted in Books, Kids, Statistics with tags , , , on February 5, 2016 by xi'an

Joram Soch managed to get a short note arXived about the Normal cdf Φ by exhibiting an analytical version, nothing less!!! By which he means a power series representation of that cdf. This is an analytical [if known] function in the complex calculus sense but I wonder at the point of the (re)derivation. (I do realise that something’s wrong on the Internet is not breaking news!)

Somewhat tangentially, this reminds me of a paper I read recently where the Geometric Geo(p) distribution was represented as the sum of two independent variates, namely a Binomial B(p/(1+p)) variate and a Geometric 2G(p²) variate. A formula that can be iterated for arbitrarily long, meaning that a Geometric variate is an infinite sum of [powers of two] weighted Bernoulli variates. I like this representation very much (although it may well have been know for quite a while). However I fail to see how to take advantage of it for simulation purposes. Unless the number of terms in the sum can be determined first. And even then it would be less efficient than simulating a single Geometric…

## uniform correlation mixtures

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , on December 4, 2015 by xi'an

Kai Zhang and my friends from Wharton, Larry Brown, Ed George and Linda Zhao arXived last week a neat mathematical foray into the properties of a marginal bivariate Gaussian density once the correlation ρ is integrated out. While the univariate marginals remain Gaussian (unsurprising, since these marginals do not depend on ρ in the first place), the joint density has the surprising property of being

[1-Φ(max{|x|,|y|})]/2

which turns an infinitely regular density into a density that is not even differentiable everywhere. And which is constant on squares rather than circles or ellipses. This is somewhat paradoxical in that the intuition (at least my intuition!) is that integration increases regularity… I also like the characterisation of the distributions factorising through the infinite norm as scale mixtures of the infinite norm equivalent of normal distributions. The paper proposes several threads for some extensions of this most surprising result. Other come to mind:

1. What happens when the Jeffreys prior is used in place of the uniform? Or Haldane‘s prior?
2. Given the mixture representation of t distributions, is there an equivalent for t distributions?
3. Is there any connection with the equally surprising resolution of the Drton conjecture by Natesh Pillai and Xiao-Li Meng?
4. In the Khintchine representation, correlated normal variates are created by multiplying a single χ²(3) variate by a vector of uniforms on (-1,1). What are the resulting variates for other degrees of freedomk in the χ²(k) variate?
5. I also wonder at a connection between this Khintchine representation and the Box-Müller algorithm, as in this earlier X validated question that I turned into an exam problem.

## a programming bug with weird consequences

Posted in Kids, pictures, R, Statistics, University life with tags , , , , , , on November 25, 2015 by xi'an

One student of mine coded by mistake an independent Metropolis-Hastings algorithm with too small a variance in the proposal when compared with the target variance. Here is the R code of this implementation:

#target is N(0,1)
#proposal is N(0,.01)
T=1e5
prop=x=rnorm(T,sd=.01)
ratop=dnorm(prop,log=TRUE)-dnorm(prop,sd=.01,log=TRUE)
ratav=ratop[1]
logu=ratop-log(runif(T))
for (t in 2:T){
if (logu[t]>ratav){
x[t]=prop[t];ratav=ratop[t]}else{x[t]=x[t-1]}
}


It produces outputs of the following shape
which is quite amazing because of the small variance. The reason for the lengthy freezes of the chain is the occurrence with positive probability of realisations from the proposal with very small proposal density values, as they induce very small Metropolis-Hastings acceptance probabilities and are almost “impossible” to leave. This is due to the lack of control of the target, which is flat over the domain of the proposal for all practical purposes. Obviously, in such a setting, the outcome is unrelated with the N(0,1) target!

It is also unrelated with the normal proposal in that switching to a t distribution with 3 degrees of freedom produces a similar outcome:

It is only when using a Cauchy proposal that the pattern vanishes:

## simulation by inverse cdf

Posted in Books, Kids, R, Statistics, University life with tags , , , , , on January 14, 2015 by xi'an

Another Cross Validated forum question that led me to an interesting (?) reconsideration of certitudes! When simulating from a normal distribution, is Box-Muller algorithm better or worse than using the inverse cdf transform? My first reaction was to state that Box-Muller was exact while the inverse cdf relied on the coding of the inverse cdf, like qnorm() in R. Upon reflection and commenting by other members of the forum, like William Huber, I came to moderate this perspective since Box-Muller also relies on transcendental functions like sin and log, hence writing

$X=\sqrt{-2\log(U_1)}\,\sin(2\pi U_2)$

also involves approximating in the coding of those functions. While it is feasible to avoid the call to trigonometric functions (see, e.g., Algorithm A.8 in our book), the call to the logarithm seems inescapable. So it ends up with the issue of which of the two functions is better coded, both in terms of speed and precision. Surprisingly, when coding in R, the inverse cdf may be the winner: here is the comparison I ran at the time I wrote my comments

> system.time(qnorm(runif(10^8)))
sutilisateur     système      écoulé
10.137           0.120      10.251
> system.time(rnorm(10^8))
utilisateur     système      écoulé
13.417           0.060      13.472


However re-rerunning it today, I get opposite results (pardon my French, I failed to turn the messages to English):

> system.time(qnorm(runif(10^8)))
utilisateur     système      écoulé
10.137       0.144      10.274
> system.time(rnorm(10^8))
utilisateur     système      écoulé
7.894       0.060       7.948


(There is coherence in the system time, which shows rnorm as twice as fast as the call to qnorm.) In terms, of precision, I could not spot a divergence from normality, either through a ks.test over 10⁸ simulations or in checking the tails:

“Only the inversion method is inadmissible because it is slower and less space efficient than all of the other methods, the table methods excepted”. Luc Devroye, Non-uniform random variate generation, 1985

Update: As pointed out by Radford Neal in his comment, the above comparison is meaningless because the function rnorm() is by default based on the inversion of qnorm()! As indicated by Alexander Blocker in another comment, to use an other generator requires calling RNG as in

RNGkind(normal.kind = “Box-Muller”)
`

(And thanks to Jean-Louis Foulley for salvaging this quote from Luc Devroye, which does not appear to apply to the current coding of the Gaussian inverse cdf.)

## Forsythe’s algorithm

Posted in R, Statistics with tags , , , , , , on May 9, 2010 by xi'an

In connection with the Bernoulli factory post of last week, Richard Brent arXived a short historical note recalling George Forsythe’s algorithm for simulating variables with density $\exp\{-G(x)\}$ when $0\le G(x)\le 1$ (the extension to any upper bound is straightforward). The idea is to avoid computing the exponential function by simulating uniforms $u_i$ until

$G(x) \ge u_1 \ldots \ge u_{n-1} \le u_n$

since the probability of this event is

$\dfrac{G(x)^{n-1}}{(n-1)!} - \dfrac{G(x)^{n}}{n!}$

its expectation is $\exp\{G(x)\}$ and the probability that n is even is $\exp\{-G(x)\}$. This turns into a generation method if the support of G is bounded. In relation with the Bernoulli factory problem, I think this has potential applications in that, when the function G(x) is replaced with an unbiased estimator the subsequent steps remain valid. This approach would indeed involve computing one single value of G(x), but this is also the case with Latuszyński et al.’s and our solutions… So I am uncertain as to whether or not this has practical implications. (Brent mentions normal simulation but this is more history than methodology.)