## twenty-four to nil

Posted in Books, Kids, Statistics with tags , , , on September 16, 2022 by xi'an

Another puzzling question on X validated, where the expectation of a random sum of deterministic vectors is to be computed. (That is, the sum involves a random number of terms.) There is not enough detail to understand why this proves difficult, given that each deterministic vector is invoked at most once. Nonetheless, my (straightforward) answer there

$Y_1\underbrace{\mathbb P(\tau\ge 1)}_{=1}+Y_2\mathbb P(\tau\ge 2)+\cdots+Y_N\underbrace{\mathbb P(\tau\ge N)}_{=0}$

proved much more popular (in terms of votes) than many of my much more involved answers there. Possibly because both question and answer are straightforward.
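For a quick numerical check of this identity, here is a small Monte Carlo sketch (in Python rather than R, with hypothetical values for the Y_i's and for the distribution of τ):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
Y = np.arange(1.0, N + 1)           # hypothetical deterministic terms Y_1,...,Y_N
p = np.array([.1, .2, .3, .4, .0])  # hypothetical pmf P(tau = i), i = 1,...,N

# closed form: E[Y_1 + ... + Y_tau] = sum_i Y_i P(tau >= i)
tail = 1 - np.concatenate(([0.0], np.cumsum(p)[:-1]))  # P(tau >= i)
exact = np.sum(Y * tail)

# Monte Carlo version of the random sum
tau = rng.choice(np.arange(1, N + 1), size=10**6, p=p)
mc = np.cumsum(Y)[tau - 1].mean()   # Y_1 + ... + Y_tau, averaged over draws
```

With these (made-up) values `exact` is 6.5 and the Monte Carlo average agrees closely.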

## why is this algorithm simulating a Normal variate?

Posted in Books, Kids, R, Statistics with tags , , , , , , , on September 15, 2022 by xi'an

A backward question from X validated as to why the above is a valid Normal generator based on Exponential generations. Which can be found in most textbooks (if not ours). And in The Bible, albeit as an exercise. The validation proceeds from the (standard) Exponential density dominating the (standard) Normal density and, according to Devroye, may have originated from von Neumann himself. But with a brilliant reverse-engineering resolution by W. Huber on X validated. While a neat exercise, it requires on average 2.64 Uniform generations per Normal generation, against a 1/1 ratio for the Box-Muller (1958) polar approach, or 1/0.86 for the Marsaglia-Bray (1964) composition-rejection method. The apex of the simulation jungle is however the Marsaglia and Tsang (2000) ziggurat algorithm. At least on CPUs, since “The ziggurat algorithm gives a more efficient method for scalar processors (e.g. old CPUs), while the Box–Muller transform is superior for processors with vector units (e.g. GPUs or modern CPUs)”, according to Wikipedia.

To draw a comparison between this Normal generator (that I will consider as von Neumann’s) and the Box-Müller polar generator,

#Box-Müller
bm=function(N){
  a=sqrt(-2*log(runif(N/2)))
  b=2*pi*runif(N/2)
  return(c(a*sin(b),a*cos(b)))
}

#von Neumann
vn=function(N){
  u=-log(runif(2.64*N))            #Exponential proposal
  v=-2*log(runif(2.64*N))>(u-1)^2  #acceptance test
  w=2*(runif(2.64*N)<.5)-1         #random sign
  return((w*u)[v])
}


here are the relative computing times

> system.time(bm(1e8))
   user  system elapsed
  7.015   0.649   7.674
> system.time(vn(1e8))
   user  system elapsed
 42.483   5.713  48.222


## stuck exchange

Posted in Books, Kids, Statistics with tags , , , , , , , on August 16, 2022 by xi'an

Made an attempt at explaining on X validated why simulating from the joint is equivalent to simulating from the marginal and then from the conditional. Unfortunately failed, as I could not fathom where the OP’s difficulty lay. It seems it started with defining what drawing from a distribution meant… Then someone came by asking why I was writing the exponential in this unusual way (this was a barred E for expectation) and whether or not the “thin hollow rectangle” (a barred I for indicator) was standing for the identity, that is

$\mathbb E\quad\text{and}\quad \mathbb I$

Reaching a point of incomprehension from which I could not recover…

## why do we need importance sampling?

Posted in Books, Kids, Statistics with tags , , , , on August 14, 2022 by xi'an

A rather common question about importance sampling, posted on X validated: why does importance sampling help when the function inside the expectation has restricted support, i.e., is equal to zero with positive probability? This connects with a recommendation I make each time I teach importance sampling, namely that estimating zero is rarely necessary! In my Saturday Night answer, I tried to give some intuition about the gain brought by an importance function with the correct support, in the ideal case where the truncated importance function remains available along with its normalising constant. But it is unclear whether this set of explanations managed to reach the OP.
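To make the point concrete, here is a sketch (in Python, on a hypothetical tail-probability target): estimating E[𝟙{X>4}] for X ~ N(0,1) by naive Monte Carlo wastes almost every draw on estimating zero, while an importance function supported on (4,∞), here a shifted Exponential with known normalising constant, makes every draw contribute:

```python
import numpy as np

rng = np.random.default_rng(1)
n, a = 10**5, 4.0   # estimate P(X > a) = E[1{X > a}] for X ~ N(0,1)

# naive Monte Carlo: almost all draws fall below a and contribute zero
x = rng.standard_normal(n)
naive = np.mean(x > a)

# importance sampling with a proposal restricted to the support (a, inf):
# g(y) = exp(-(y - a)), a shifted Exponential(1), normalised on (a, inf)
y = a + rng.exponential(1.0, size=n)
phi = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)   # N(0,1) density at y
g = np.exp(-(y - a))                           # proposal density at y
isest = np.mean(phi / g)
```

With n = 10⁵ the naive estimate is built from a handful of non-zero terms (the true value is about 3.17·10⁻⁵), while the importance sampling estimate, whose weights are bounded on (4,∞), achieves a tiny relative error.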

## an introduction to MCMC sampling

Posted in Books, Kids, Statistics with tags , , , , , , , , , on August 9, 2022 by xi'an

Following a rather clueless question on X validated, I had a quick read of A simple introduction to Markov Chain Monte-Carlo sampling, by van Ravenzwaaij, Cassey, and Brown, published in 2018 in Psychonomic Bulletin & Review, which I had never opened to this day. The setting is very basic and the authors are at pains to make their explanations as simple as possible, but I find the effort somehow backfires under the excess of details. And under the characteristic avoidance of mathematical symbols and formulae. For instance, in the Normal mean example that is used as introductory illustration and that confused the question originator, there is no explanation for the posterior being a N(100,15) distribution, 100 being the sample average, the notation N(μ|x,σ) is used for the posterior density, and then the Metropolis comparison brings an added layer of confusion:

“Since the target distribution is normal with mean 100 (the value of the single observation) and standard deviation 15, this means comparing N(100|108, 15) against N(100|110, 15).”

as it most unfortunately exchanges the positions of μ and x (which is equal to 100). There is no fundamental error there, thanks to the symmetry of the Normal density, but this switch from posterior to likelihood certainly contributes to the confusion of the QO. Similarly for the Metropolis step description:

“If the new proposal has a lower posterior value than the most recent sample, then randomly choose to accept or reject the new proposal, with a probability equal to the height of both posterior values.”

And the shortcomings of MCMC may prove equally difficult to ingest, like:

“The method will “work” (i.e., the sampling distribution will truly be the target distribution) as long as certain conditions are met. Firstly, the likelihood values calculated (…) to accept or reject the new proposal must accurately reflect the density of the proposal in the target distribution. When MCMC is applied to Bayesian inference, this means that the values calculated must be posterior likelihoods, or at least be proportional to the posterior likelihood (i.e., the ratio of the likelihoods calculated relative to one another must be correct).”

which leaves me uncertain as to what the authors mean by the alternative situation, i.e., by the proposed value not reflecting the proposal density. Again, the reluctance to use (more) formulae hurts the intended pedagogical explanations.
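For what it is worth, the Metropolis step the authors describe fits in a few unambiguous lines. A minimal sketch (in Python), assuming the N(100,15) posterior of their example and a symmetric random-walk proposal, where the acceptance probability is the ratio of the posterior heights, not “the height of both posterior values”:

```python
import math, random

random.seed(2)

def post(mu):
    # posterior density of mu, taken as N(100, 15) as in the paper's example
    return math.exp(-0.5 * ((mu - 100) / 15) ** 2) / (15 * math.sqrt(2 * math.pi))

def metropolis(n_iter, start=110.0, scale=20.0):
    chain = [start]
    for _ in range(n_iter):
        cur = chain[-1]
        prop = random.gauss(cur, scale)   # symmetric proposal around current value
        # accept with probability min(1, post(prop) / post(cur)),
        # i.e., the RATIO of the two posterior heights
        if random.random() < post(prop) / post(cur):
            chain.append(prop)
        else:
            chain.append(cur)
    return chain

chain = metropolis(50_000)
```

E.g., a move from 108 to 110 is accepted with probability N(110|100,15)/N(108|100,15), the posterior-versus-likelihood switch in the quoted comparison being immaterial only because of the Normal symmetry.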