## more breaking sticks

Posted in Statistics on July 19, 2021 by xi'an

A quick riddle from The Riddler that, when thinned to the actual maths problem, ends up asking for

$\mathbb P(\max(U_1,U_2,U_3)=U_3)+\mathbb P(\min(U_1,U_2,U_3)=U_3)$

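which is equal to 2/3, since by exchangeability each of the three Uniforms is the largest (or the smallest) with probability 1/3… Anticlimactic.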
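For the record, a minimal Monte Carlo sketch of this sum of probabilities; the seed and the number of simulations are arbitrary choices made for the illustration.

```r
# Monte Carlo check: U3 is either the largest or the smallest of three iid Uniforms
set.seed(1)                            # arbitrary seed, for reproducibility only
u <- matrix(runif(3e5), ncol = 3)      # 1e5 triples (U1, U2, U3), one per row
est <- mean(u[, 3] == pmax(u[, 1], u[, 2], u[, 3]) |
            u[, 3] == pmin(u[, 1], u[, 2], u[, 3]))
est                                    # should be close to 2/3
```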

## new order

Posted in Books, Kids, R, Statistics on February 5, 2021 by xi'an

The latest riddle from The Riddler started with a straightforward question: given four iid Normal variates, X¹,X²,X³,X⁴, what is the probability that X¹+X²<X³+X⁴ given that X¹<X³? The answer is ¾ and it actually does not depend on the distribution of the variates. The bonus question is however much harder: what is this probability when there are 2N iid Normal variates?
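A quick simulation sketch of the ¾ answer and of its lack of dependence on the distribution. The helper `cond_prob` and the choice of the Exponential and Uniform alternatives are assumptions made for the illustration only.

```r
# Estimate P(X1 + X2 < X3 + X4 | X1 < X3) for iid draws from a given generator
# (cond_prob is a hypothetical helper, not part of the original post)
cond_prob <- function(rgen, n = 1e5) {
  x <- matrix(rgen(4 * n), ncol = 4)   # n quadruples, one per row
  keep <- x[, 1] < x[, 3]              # condition on X1 < X3
  mean(x[keep, 1] + x[keep, 2] < x[keep, 3] + x[keep, 4])
}

set.seed(42)                           # arbitrary seed
probs <- c(normal      = cond_prob(rnorm),
           exponential = cond_prob(rexp),
           uniform     = cond_prob(runif))
probs                                  # all three should be close to 3/4
```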

I posted the question on math.stackexchange, then on X validated, but received no hint at a possible simplification of the probability, and then erased the questions. Given the shape of the domain where the bivariate Normal density is integrated, it seems most likely there is no closed-form expression. (None was proposed by the Riddler.) When computing this probability by simulation and fitting a log-log regression, it decreases roughly as $N^{-1/3}$, the fitted slope being about -0.31:

    > summary(lm(log(p)~log(r)))

    Residuals:
          Min        1Q    Median        3Q       Max
    -0.013283 -0.010362 -0.000606  0.007835  0.039915

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) -0.111235   0.008577  -12.97 4.11e-13 ***
    log(r)      -0.311361   0.003212  -96.94  < 2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.01226 on 27 degrees of freedom
    Multiple R-squared:  0.9971,    Adjusted R-squared:  0.997
    F-statistic:  9397 on 1 and 27 DF,  p-value: < 2.2e-16


## where is .5?

Posted in Statistics on September 10, 2020 by xi'an

A Riddler’s riddle on breaking the unit interval into 4 random bits (by which I understand picking 3 Uniform realisations and ordering them) and finding the expected length of the bit containing ½ (sparing you the chore of converting inches and feet into decimals). The result can be found by direct integration since the ordered Uniform variates are Beta’s, and so are their consecutive differences, leading to an average length of 15/32. Or by raw R simulation:

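    # 1e5 draws of three ordered Uniform cuts, one triple per row
    simz=t(apply(matrix(runif(3*1e5),ncol=3),1,sort))
    # average length of the bit that contains .5
    mean((simz[,1]>.5)*simz[,1]+
      (simz[,1]<.5)*(simz[,2]>.5)*(simz[,2]-simz[,1])+
      (simz[,2]<.5)*(simz[,3]>.5)*(simz[,3]-simz[,2])+
      (simz[,3]<.5)*(1-simz[,3]))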


This can be reproduced for values other than ½, showing that ½ is the value leading to the largest expected length. I wonder if there is a faster way to reach this nice 15/32.
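One possible shortcut, offered as a sketch rather than as the intended solution: with n iid Uniform cuts, the distance from x to the nearest cut on its right (or to 1 if there is none) exceeds t with probability $(1-t)^n$ for $t<1-x$, so its expectation is $(1-x^{n+1})/(n+1)$, and symmetrically $(1-(1-x)^{n+1})/(n+1)$ on the left. Summing both gives $(2-x^{n+1}-(1-x)^{n+1})/(n+1)$, which is 15/32 for n=3 and x=½. The function names `exp_len` and `sim_len` below are made up for the illustration.

```r
# Closed-form expected length of the bit containing x, for n iid Uniform cuts
exp_len <- function(x, n = 3) (2 - x^(n + 1) - (1 - x)^(n + 1)) / (n + 1)

# Simulation counterpart: nearest cut above x (or 1) minus nearest cut below x (or 0)
sim_len <- function(x, n = 3, nsim = 1e5) {
  u <- matrix(runif(n * nsim), ncol = n)
  right <- apply(ifelse(u > x, u, 1), 1, min)
  left  <- apply(ifelse(u < x, u, 0), 1, max)
  mean(right - left)
}

set.seed(3)                                  # arbitrary seed
res <- c(exact = exp_len(0.5), simulated = sim_len(0.5))
res                                          # both close to 15/32 = 0.46875

# location of the largest expected length over a grid of x values
xs <- seq(0.05, 0.95, by = 0.05)
best <- xs[which.max(exp_len(xs))]
best
```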

## order, order!

Posted in Books, pictures, Statistics, University life on June 9, 2020 by xi'an

A very standard (one-line) question on X validated, namely whether min(X,Y) could enjoy a finite mean when both X and Y had infinite means [the answer is yes, possibly!], brought a lot of traffic, including an incorrect answer, and ended up as one of the “Hot Network Questions”, for no clear reason. Besides my half-Cauchy example, some answers pointed out the connection between mean and cdf, with the mean written as the integrated complement cdf on the positive half-line minus the integrated cdf on the negative half-line, and between mean and quantile function, as

$\mathbb E[T(X)]=\int_0^1 T(Q_X(u))\text{d}u$

since it nicely expands to

$\mathbb E[T(X_{(k)})]=\int_0^1 \frac{u^{k-1}(1-u)^{n-k}}{B(k,n-k+1)}T(Q_X(u))\text{d}u$

but I remain bemused by the excitement..! (Including the many answers and the lack of involvement of the OP.)
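As a numerical sketch of the half-Cauchy case: the mean of min(X,Y) for two iid half-Cauchy variates can be computed either by integrating the squared survival function or through the quantile representation, using the Beta(k, n-k+1) density of the k-th Uniform order statistic with k=1 and n=2. The reference value 4 log(2)/π is a hand computation added as a check, not something taken from the original thread.

```r
# Survival function of a half-Cauchy variate (absolute value of a standard Cauchy),
# written as (2/pi) atan(1/x) to avoid cancellation for large x
surv <- function(x) (2 / pi) * atan(1 / x)

# E[min(X, Y)] is the integral of surv(x)^2, finite even though
# the integral of surv(x) itself (the mean of X) diverges
m_surv <- integrate(function(x) surv(x)^2, 0, Inf)$value

# Same mean via the quantile function Q(u) = tan(pi u / 2),
# weighted by the Beta(1, 2) density of the minimum of two Uniforms
Q <- function(u) tan(pi * u / 2)
m_quant <- integrate(function(u) dbeta(u, 1, 2) * Q(u), 0, 1)$value

c(survival = m_surv, quantile = m_quant, reference = 4 * log(2) / pi)
```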

## sampling the mean

Posted in Kids, R, Statistics on December 12, 2019 by xi'an

A challenge found on the board of the coffee room at CEREMADE, Université Paris Dauphine:

When sampling with replacement three numbers in {0,1,…,N}, what is the probability that their average is (at least) one of the three?

With a (code-golfed!) brute force solution of

    n=10 # value of N, left undefined in the golfed line
    mean(!apply((a<-matrix(sample(0:n,3e6,rep=T),3)),2,mean)-apply(a,2,median))


producing a graph pretty close to $3N/[2(N+1)^2]$ (which coincides with a back-of-the-envelope computation).
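For small N, the probability can also be obtained exactly by enumerating all $(N+1)^3$ ordered triples, which gives a simulation-free comparison with the approximation. This is only a sketch, with `exact_prob` and `approx_prob` as made-up names and the values of N picked arbitrarily.

```r
# Exact probability that the average of three draws from {0,...,N} is one of them,
# by enumeration of all ordered triples (integer arithmetic, no rounding issue)
exact_prob <- function(N) {
  g <- expand.grid(a = 0:N, b = 0:N, c = 0:N)
  s <- g$a + g$b + g$c
  mean(3 * g$a == s | 3 * g$b == s | 3 * g$c == s)
}

# Approximation quoted in the post
approx_prob <- function(N) 3 * N / (2 * (N + 1)^2)

Ns <- c(5, 10, 20, 50)                 # arbitrary values of N
tab <- cbind(N = Ns, exact = sapply(Ns, exact_prob), approx = approx_prob(Ns))
tab
```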