order, order!

Posted in Books, pictures, Statistics, University life on June 9, 2020 by xi'an

A very standard (one-line) question on X validated, namely whether min(X,Y) could enjoy a finite mean when both X and Y had infinite means [the answer is yes, possibly!], brought a lot of traffic, including an incorrect answer, and became one of the “Hot Network Questions”, for no clear reason. Besides my half-Cauchy example, some answers pointed out the connection between mean and cdf, as the integrated complement cdf on the positive half-line minus the integrated cdf on the negative half-line, and between mean and quantile function, as

$\mathbb E[T(X)]=\int_0^1 T(Q_X(u))\text{d}u$

since it nicely expands to

$\mathbb E[T(X_{(k)})]=\int_0^1 \frac{u^{k-1}(1-u)^{n-k}}{B(k,n-k+1)}T(Q_X(u))\text{d}u$

but I remain bemused by the excitement..! (Including the many answers and the lack of involvement of the OP.)
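As a quick numerical check of the order-statistic identity, here is a minimal sketch in R. The helper names `order_stat_mean` and `q_half_cauchy` are mine, not taken from the answers, and the weight on u is assumed to be the Beta(k, n-k+1) density of the k-th uniform order statistic. For the minimum of two half-Cauchy variates, with quantile function tan(πu/2), the integral is finite and can be compared with the closed form 4 log 2/π:

```r
# E[Tfun(X_(k))] for an iid sample of size n, written as an integral over (0,1)
# against the Beta(k, n-k+1) density of the k-th uniform order statistic
order_stat_mean <- function(k, n, Q, Tfun = identity) {
  integrate(function(u) dbeta(u, k, n - k + 1) * Tfun(Q(u)), 0, 1)$value
}

# quantile function of the half-Cauchy (absolute value of a standard Cauchy)
q_half_cauchy <- function(u) tan(pi * u / 2)

# minimum of two half-Cauchy variates: k = 1, n = 2
e_min <- order_stat_mean(1, 2, q_half_cauchy)
c(numerical = e_min, closed_form = 4 * log(2) / pi)

# sanity check on a case with a known answer: second smallest of five Exp(1)
e_exp <- order_stat_mean(2, 5, qexp)
c(numerical = e_exp, exact = 1 / 5 + 1 / 4)
```

Both components of the half-Cauchy sample have an infinite mean, while the Beta(1,2) weight 2(1-u) tames the explosion of the quantile function at u=1.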

sampling the mean

Posted in Kids, R, Statistics on December 12, 2019 by xi'an

A challenge found on the board of the coffee room at CEREMADE, Université Paris Dauphine:

When sampling with replacement three numbers in {0,1,…,N}, what is the probability that their average is (at least) one of the three?

With a (code-golfed!) brute force solution of

mean(!apply((a<-matrix(sample(0:n,3e6,rep=T),3)),2,mean)-apply(a,2,median))


producing a graph pretty close to 3N/[2(N+1)²] (which coincides with a back-of-the-envelope computation).
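For readers who prefer an un-golfed version, here is a hedged sketch with hypothetical helper names (`avg_is_one`, `mc_prob`, `exact_prob`). It relies on the fact that the average can only coincide with one of the three values when it coincides with the median, which is what the one-liner tests. The exact enumeration is my addition and is only practical for moderate N:

```r
# TRUE when the average of (x, y, z) equals one of the three values;
# comparing 3 * value with the sum avoids any division
avg_is_one <- function(x, y, z) {
  s <- x + y + z
  3 * x == s | 3 * y == s | 3 * z == s
}

# Monte Carlo version of the brute-force one-liner
mc_prob <- function(n, reps = 1e6) {
  a <- matrix(sample(0:n, 3 * reps, replace = TRUE), nrow = 3)
  mean(avg_is_one(a[1, ], a[2, ], a[3, ]))
}

# exact probability by enumerating all (n+1)^3 ordered triples
exact_prob <- function(n) {
  g <- expand.grid(x = 0:n, y = 0:n, z = 0:n)
  mean(avg_is_one(g$x, g$y, g$z))
}

n <- 10
c(monte_carlo = mc_prob(n),
  exact = exact_prob(n),
  approx = 3 * n / (2 * (n + 1)^2))
```

The three values should agree to about two decimal places for this N, the approximation improving as N grows.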

from tramway to Panzer (or back!)…

Posted in Books, pictures, Statistics on June 14, 2019 by xi'an

Although it is usually presented as the tramway problem, namely estimating the number of tram or bus lines in a city after observing a single line number, including in The Bayesian Choice by yours truly, the original version of the problem is about German tanks, Panzer V tanks to be precise, whose total number M was to be estimated by the Allies from their observation of the serial numbers of a number k of tanks. The Riddler is restating the problem when the only available information is made of the smallest, 22, and largest, 144, numbers, with no information about the number k itself. I am unsure what the Riddler means by “best” estimate, but a posterior distribution on M (and k) can certainly be constructed for a prior like 1/k × 1/M² on (k,M). (Using M² to make sure the posterior mean does exist.) The joint distribution of the order statistics is

$\frac{k!}{(k-2)!} M^{-k} (144-22)^{k-2}\, \Bbb I_{2\le k\le M\ge 144}$

which makes the computation of the posterior distribution rather straightforward. Here is the posterior surface (with an unfortunate rendering of an artefactual horizontal line at 237!), showing a concentration near the lower bound M=144. The posterior mode is actually achieved for M=144 and k=7, while the posterior means are (rounded as) M=169 and k=9.
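A minimal sketch of this computation on a grid, multiplying the 1/k × 1/M² prior by the joint distribution above. The truncation bounds on k and M are my own assumptions (the posterior mass beyond them is negligible), and all object names are hypothetical:

```r
lo <- 22; hi <- 144            # observed smallest and largest serial numbers
k_vals <- 2:400                # truncation of k (assumption)
M_vals <- hi:5000              # truncation of M (assumption)

# log of the prior 1/(k M^2) times the likelihood k(k-1) M^-k (hi-lo)^(k-2)
log_post <- function(k, M) {
  lp <- -log(k) - 2 * log(M) +
    log(k) + log(k - 1) - k * log(M) + (k - 2) * log(hi - lo)
  ifelse(k <= M, lp, -Inf)     # enforce the constraint 2 <= k <= M
}

lp <- outer(k_vals, M_vals, log_post)
post <- exp(lp - max(lp))
post <- post / sum(post)       # normalised posterior on the (k, M) grid

mode_idx <- which(post == max(post), arr.ind = TRUE)
post_mode <- c(k = k_vals[mode_idx[1, 1]], M = M_vals[mode_idx[1, 2]])
post_mean <- c(k = sum(rowSums(post) * k_vals),
               M = sum(colSums(post) * M_vals))
post_mode
post_mean
```

Working on the log scale avoids underflow in the M^-k term for large k.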

easy Riddler

Posted in Kids, R on May 10, 2019 by xi'an

The riddle of the week is rather standard probability calculus:

If N points are generated at random places on the perimeter of a circle, what is the probability that you can pick a diameter such that all of those points are on only one side of the newly halved circle?

It is standard since it is equivalent to checking that the range of N Uniform variates, measured around the circle, is less than ½, and since the range of N Uniform variates is distributed as a Be(N-1,2) random variate. The resulting probability, which happens to be exactly $N/2^{N-1}$, is decreasing exponentially, as shown below…
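A hedged Monte Carlo check of the $N/2^{N-1}$ value, with the perimeter rescaled to length one. The function name `semicircle_prob` is hypothetical, and the criterion used is that all points sit on one side of some diameter exactly when the largest gap between neighbouring points, wrap-around gap included, is at least ½:

```r
# proportion of simulations where N uniform points on a circle of perimeter 1
# all fall within a single half-circle
semicircle_prob <- function(N, reps = 1e5) {
  mean(replicate(reps, {
    u <- sort(runif(N))
    gaps <- c(diff(u), 1 - u[N] + u[1])  # spacings, including the wrap-around
    max(gaps) >= 0.5
  }))
}

N_vals <- 2:8
cbind(N = N_vals,
      monte_carlo = sapply(N_vals, semicircle_prob),
      formula = N_vals / 2^(N_vals - 1))
```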

survivalists [a Riddler’s riddle]

Posted in Books, Kids, R, Statistics on April 22, 2019 by xi'an

A neat question from The Riddler on a multi-probability survival rate:

Nine processes are running in a loop with fixed survival rates .99,…,.91. What is the probability that the first process is the last one to die? Same question with probabilities .91,…,.99 and the probability that the last process is the last one to die.

The first question means that the realisation of a Geometric G(.99) has to be strictly larger than the largest of eight Geometric G(.98),…,G(.91). Given that the cdf of a Geometric G(a) is [when counting the number of attempts till failure, included, i.e. the Geometric with support the positive integers]

$F(x)=\Bbb P(X\le x)=1-a^{x}$

the probability that this happens has the nice (?!) representation

$\sum_{x=2}^\infty a_1^{x-1}(1-a_1)\prod_{j\ge 2}(1-a_j^{x-1})=(1-a_1)G(a_1,\ldots,a_9)$

which leads to an easy resolution by recursion since

$G(a_1,\ldots,a_9)=G(a_1,\ldots,a_8)-G(a_1a_9,\ldots,a_8)$

and $G(a)=a/(1-a)$

and a value of 0.5207 returned by R (Monte Carlo evaluation of 0.5207 based on 10⁷ replications). The second question is quite similar, with solution

$\sum_{x=1}^\infty a_1^{x-1}(1-a_1)\prod_{j\ge 2}(1-a_j^{x})=a_1^{-1}(1-a_1)G(a_1,\ldots,a_9)$

and value 0.52596 (Monte Carlo evaluation of 0.52581 based on 10⁷ replications).
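The recursion can be sketched in a few lines of R. This is my own reconstruction rather than the code behind the values above: `G` follows the two identities, with a₁ standing for the .99 process in both questions, and the Monte Carlo part uses `rgeom`, which counts survivals before the first death, hence the +1:

```r
# G(a_1,...,a_m) = sum over y >= 1 of a_1^y * prod_{j >= 2} (1 - a_j^y),
# computed by peeling off the last rate at each step
G <- function(a) {
  m <- length(a)
  if (m == 1) return(a / (1 - a))
  G(a[-m]) - G(c(a[1] * a[m], a[-c(1, m)]))
}

a <- seq(.99, .91, by = -.01)   # a[1] is the .99 process
p_first <- (1 - a[1]) * G(a)    # first question: strict inequality
p_last  <- p_first / a[1]       # second question: ties go to the last process

# Monte Carlo cross-check
set.seed(42)
reps <- 1e5
X <- sapply(a, function(p) rgeom(reps, 1 - p) + 1)   # reps x 9 death times
others <- apply(X[, -1], 1, max)
rbind(recursion   = c(first = p_first, last = p_last),
      monte_carlo = c(mean(X[, 1] > others), mean(X[, 1] >= others)))
```

The recursion makes 2⁸ terminal calls to the one-argument case, which is instantaneous here.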