## an elegant result on exponential spacings

Posted in Statistics with tags , , , , , , , , , , , , , on April 19, 2017 by xi'an

A question on X validated I spotted in the train back from Lyon got me desperately seeking a reference in Devroye’s Generation Bible despite the abyssal wireless and a group of screeching urchins a few seats away from me… The question is about why

$\sum_{i=1}^{n}(Y_i - Y_{(1)}) \sim \text{Gamma}(n-1, 1)$

when the Y’s are standard exponentials. Since this reminded me immediately of exponential spacings, thanks to our Devroye fan-club reading group in Warwick,  I tried to download Devroye’s Chapter V and managed after a few aborts (and a significant increase in decibels from the family corner). The result by Sukhatme (1937) is in plain sight as Theorem 2.3 and is quite elegant as it relies on the fact that

$\sum_{i=1}^n y_i=\sum_{j=1}^n (n-j+1)(y_{(j)}-y_{(j-1)})=\sum_{j=2}^n (y_{(j)}-y_{(1)})$

hence sums up as a mere linear change of variables! (Pandurang Vasudeo Sukhatme (1911–1997) was an Indian statistician who worked on human nutrition and got the Guy Medal of the RSS in 1963.)

## Гнеде́нко and Forsythe [and e]

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , on February 16, 2016 by xi'an

In the wake of my earlier post on the Monte Carlo estimation of e and e⁻¹, after a discussion with my colleague Murray Pollock (Warwick) Gnedenko’s solution, he pointed out another (early) Monte Carlo approximation called Forsythe’s method. That is detailed quite neatly in Luc Devroye’s bible, Non-uniform random variate generation (a free bible!). The idea is to run a sequence of uniform generations until the sequence goes up, i.e., until the latest uniform is larger than the previous one. The expectation of the corresponding stopping rule, N, which is the number of generations the uniform sequence went down continuously is then e, while the probability that N is odd is e⁻¹, most unexpectedly! Forsythe’s method actually aims at a much broader goal, namely simulating from any density of the form g(x) exp{-F(x)}, F taking values in (0,1). This method generalises von Neuman’s exponential generator (see Devroye, p.126) which only requires uniform generations.

This is absolutely identical to Gnedenko’s approach in that both events have the same 1/n! probability to occur [as pointed out by Gérard Letac in a comment on the previous entry]. (I certainly cannot say whether or not one of the authors was aware of the other’s result: Forsythe generalised von Neumann‘s method around 1972, while Gnedenko published Theory of Probability at least in 1969, but this may be the date of the English translation, I have not been able to find the reference on the Russian wikipedia page…) Running a small R experiment to compare both distributions of N, the above barplot shows that they look quite similar:

n=1e6
use=runif(n)
# Gnedenko's in action:
gest=NULL
i=1
while (i<(n-100)){
sumuse=cumsum(use[i:(i+10)])
if (sumuse[11]<1])
sumuse=cumsum(use[i:(i+100)])
j=min((1:length(sumuse))[sumuse>1])
gest=c(gest,j)
i=i+j}
#Forsythe's method:
fest=NULL
i=1
while (i<(n-100)){
sumuse=c(-1,diff(use[i:(i+10)]))
if (max(sumuse)<0])
sumuse=c(-1,diff(use[i:(i+100)]))
j=min((1:length(sumuse))[sumuse>0])
fest=c(fest,j)
i=i+j}

And the execution times of both approaches [with this rudimentary R code!] are quite close.

## an a-statistical proof of a binomial identity

Posted in Books, Statistics, University life with tags , , , , on May 15, 2014 by xi'an

When waiting for Andrew this morning, I was browsing through the arXived papers of the day and came across this “Simple Statistical Proof of a Binomial Identity” by P. Vellaisamy. The said identity is that, for all s>0,

$\sum_{k=0}^n (-1)^k {n \choose k}\dfrac{s}{s+k} = \prod_{k=1}^n \dfrac{k}{k+s}$

Nothing wrong with the maths in this paper (except for a minor typo using Exp(1) instead of Exp(s), p.2).  But I am perplexed by the label “statistical” used by the author, as this proof is an entirely analytic argument, based on two different integrations of the same integral. Nothing connected with data or any statistical  technique: this is sheer combinatorics, of the kind one could find in William Feller‘s volume I.

## new typos in Monte Carlo Statistical Methods

Posted in Books, Statistics, University life with tags , , , , , , , , on December 7, 2011 by xi'an

Thanks to Jay Bartroff for pointing out those typos after teaching from Monte Carlo Statistical Methods:

• On page 52, the gamma Ga(α, β) distribution uses β as a rate parameter while in other places it is a scale parameter, see, e.g. eqn (2.2) [correct, I must say the parameterisation of the gamma distribution is a pain and, while we tried to homogenise the notation with the second parameter being the rate, there are places like this where either the rate convention (as in the exponential distribution) or the scale convention (as in the generation) is the natural one…]
• Still on page 52, in Example 2.20, truncated normals are said to be discussed after Example 1.5, but they’re not. [There is a mention made of constrained parameters right after but this is rather cryptic!]
• On page 53, the ratio f/gα following the second displayed eqn is missing some terms [or, rather, the equality sign should be a proportional sign]
• Still on page 53, in eqn (2.11), the whole expression, rather than the square root, should be divided by 2 [yes, surprising typo given that it was derived correctly in the original paper!]
• On page 92, the exact constraint is that supp(g) actually needs only contain the intersection of supp(f) and supp(h), such as when approximating tail probabilities [correct if the importance sampling method is only used for a single function h, else the condition stands as is]
• On page 94, fY does not need that integral in the denominator [correct, we already corrected for the truncation by subtracting 4.5 in the exponential]
• On page 114, Problem 3.22, ωi is missing a factor of 1/n [correct]
• On page 218, in Example 6.24, P00=0 [indeed, our remark that Pxx>0 should start with x=1. Note that this does not change the aperiodicity, though]
• On page 282, the log α after the 2nd displayed equation should be eα [correct, this was pointed out in a previous list of typos, but clearly not corrected in the latest printing!]
• On page 282, in the 5th displayed equation there are missing factors π(α’|b)/π(α0|b) in rejection probability [actually, no, because, those terms being both proposals and priors, they cancel in the ratio. We could add a sentence to this effect to explain why, though.]
• On page 634, the reference page for exponential distribution is mistakenly given as 99 [wow, very thorough reading! There is an exponential distribution involved on page 99 but I agree this is not the relevant page…]