## how to annoy a statistician [xkcd redux]

Posted in Books, pictures, Statistics, University life with tags , , , , on March 24, 2019 by xi'an

## normal variates in Metropolis step

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , on November 14, 2017 by xi'an A definitely puzzled participant on X validated, confusing the Normal variate or variable used in the random walk Metropolis-Hastings step with its Normal density… It took some cumulated efforts to point out the distinction. Especially as the originator of the question had a rather strong a priori about his or her background:

“I take issue with your assumption that advice on the Metropolis Algorithm is useless to me because of my ignorance of variates. I am currently taking an experimental course on Bayesian data inference and I’m enjoying it very much, i believe i have a relatively good understanding of the algorithm, but i was unclear about this specific.”

despite pondering the meaning of the call to rnorm(1)… I will keep this question in store to use in class when I teach Metropolis-Hastings in a couple of weeks.

## simulation by hand

Posted in Books, Kids, pictures, Statistics, Travel with tags , , , , , , , on November 28, 2016 by xi'an A rather weird question on X validated this week was about devising a manual way to simulate (a few) normal variates. By manual I presume the author of the question means without resorting to a computer or any other business machine. Now, I do not know of any real phenomenon that is exactly and provably Normal. As analysed in a great philosophy of science paper by Aidan Lyon, the standard explanations for a real phenomenon to be Normal are almost invariably false, even those invoking the Central Limit Theorem. Hence I cannot think of a mechanical device that would directly return Normal generations from a Normal distribution with known parameters. However, since it is possible to simulate by hand Uniform U(0,1) variates [up to a given precision] using a chronometre or a wheel, calls to versions of the Box-Müller algorithm that do not rely on logarithmic or trigonometric functions are feasible, for instance by generating two Exponential variates, x and y, until 2y>(1-x)², x being the output. And generating Exponential variates is easy provided a radioactive material with known half-life is available, along with a Geiger counter. Or, if not, by calling von Neumann’s exponential generator. As detailed in Devroye’s simulation book.

After proposing this solution, I received a comment from the author of the question towards a simpler solution based, e.g., on the Central Limit Theorem. Presumably for simple iid random variables such as coin tosses or dice experiments. While I used the CLT for simulating Normal variables in my very early days [just after programming on punched cards!], I do not think this is a very good or efficient method, as the tails grow very slowly to normality. By comparison, using the same amount of coin tosses to create a sufficient number of binary digits of a Uniform variate produces a computer-precision exact Uniform variate, which can be exploited in Box-Müller-like algorithms to return exact Normal variates… Even by hand if necessary. [For some reason, this question attracted a lot of traffic and an encyclopaedic answer on X validated, despite being borderline to the point of being proposed for closure.]

## standard distributions

Posted in Books, Kids, Statistics with tags , , , on February 5, 2016 by xi'an

Joram Soch managed to get a short note arXived about the Normal cdf Φ by exhibiting an analytical version, nothing less!!! By which he means a power series representation of that cdf. This is an analytical [if known] function in the complex calculus sense but I wonder at the point of the (re)derivation. (I do realise that something’s wrong on the Internet is not breaking news!)

Somewhat tangentially, this reminds me of a paper I read recently where the Geometric Geo(p) distribution was represented as the sum of two independent variates, namely a Binomial B(p/(1+p)) variate and a Geometric 2G(p²) variate. A formula that can be iterated for arbitrarily long, meaning that a Geometric variate is an infinite sum of [powers of two] weighted Bernoulli variates. I like this representation very much (although it may well have been know for quite a while). However I fail to see how to take advantage of it for simulation purposes. Unless the number of terms in the sum can be determined first. And even then it would be less efficient than simulating a single Geometric…

## uniform correlation mixtures

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , on December 4, 2015 by xi'an Kai Zhang and my friends from Wharton, Larry Brown, Ed George and Linda Zhao arXived last week a neat mathematical foray into the properties of a marginal bivariate Gaussian density once the correlation ρ is integrated out. While the univariate marginals remain Gaussian (unsurprising, since these marginals do not depend on ρ in the first place), the joint density has the surprising property of being

[1-Φ(max{|x|,|y|})]/2

which turns an infinitely regular density into a density that is not even differentiable everywhere. And which is constant on squares rather than circles or ellipses. This is somewhat paradoxical in that the intuition (at least my intuition!) is that integration increases regularity… I also like the characterisation of the distributions factorising through the infinite norm as scale mixtures of the infinite norm equivalent of normal distributions. The paper proposes several threads for some extensions of this most surprising result. Other come to mind:

1. What happens when the Jeffreys prior is used in place of the uniform? Or Haldane‘s prior?
2. Given the mixture representation of t distributions, is there an equivalent for t distributions?
3. Is there any connection with the equally surprising resolution of the Drton conjecture by Natesh Pillai and Xiao-Li Meng?
4. In the Khintchine representation, correlated normal variates are created by multiplying a single χ²(3) variate by a vector of uniforms on (-1,1). What are the resulting variates for other degrees of freedomk in the χ²(k) variate?
5. I also wonder at a connection between this Khintchine representation and the Box-Müller algorithm, as in this earlier X validated question that I turned into an exam problem.