## a simpler (?) birthday problem

Posted in Books, Kids, Statistics with tags , , , , , , , on April 9, 2022 by xi'an

A monthly birthday problem from the Riddler:

What was the probability that none of the 40 people had birthdays this month? What is the probability that there is at least one month in the year during which none of the 40 people had birthdays (not necessarily this month)?

Assuming the same number of days in all months, the probability that one individual is not born in March is 1/12 and hence the probability that none of 40 (independent!) persons are not born in March is (11/12)⁴⁰, about 3%. The second question can be solved by reading Feller’s chapter on the combination of events (1970, Chapter IV, p.102). The probability that all months are seeing at least one birthday is

$\sum_{i=0}^{12} (-1)^i {12\choose i}(1-i/12)^{40}=0.6732162$

which can be checked by a quick R simulation. The complement 0.326 is thus close to 11 x 0.03!

## triple ruin

Posted in Books, Kids, pictures, R, Statistics, Wines with tags , , , , , , , , , , on December 28, 2021 by xi'an

An almost straightforward riddle from The Riddler involving a triple gambler’s ruin: Dawn competes against three players Alessandra, Berenike, and Chinue, with probabilities of winning one round ¾, ½, and ¼, respectively, until the cumulated score reaches ±15, ±30, and ±45, for the first, second, and third games. What is Dawn’s optimal sequence of adversaries?

First, a brute force R simulation shows that the optimal ordering is to play the three adversaries first weakest, third strongest and middle fair:

ord=function(p){
z=2*(runif(1)<p[1])-1
while(abs(z)<15)z=z+2*(runif(1)<p[1])-1
y=2*(runif(1)<p[2])-1
while(abs(z+y)<30)y=y+2*(runif(1)<p[2])-1
x=2*(runif(1)<p[3])-1
while(abs(z+y+x)<45)x=x+2*(runif(1)<p[3])-1
return(x+y+z>0)}
mcord=function(p,T=1e2){
for(t in 1:T)F=F+ord(p)
return(F/T)}
comp=function(T=1e2){
return(c(mcord(c(.5,.55,.45),t),
#mcord(c(.5,.45,.55),t),#1-above
mcord(c(.55,.5,.45),t),
#mcord(c(.45,.5,.55),t),#1-above
mcord(c(.55,.45,.5),t)
#mcord(c(.45,.55,.5),t)))#1-above
))}


where I used probabilities closer to ½ to avoid estimated probabilities equal to one.

> comp(1e3)
[1] 0.051 0.038 0.183


(and I eliminated the three other probabilities by sheer symmetry). Second, checking in Feller’s bible (Vol. 1, XIV.3) for the gambler’s ruin probability, a simple comparison of the six orderings confirms this simulation.

## hard birthday problem

Posted in Books, Kids, R, Statistics with tags , , , , , , , , , on February 4, 2021 by xi'an

Click to access birthday.pdf

From an X validated question, found that WordPress now allows for direct link to pdf documents, like the above paper by my old friend Anirban Das Gupta! The question is about estimating a number M of individuals with N distinct birth dates over a year of T days. After looking around I could not find a simpler representation of the probability for N=r other than (1) in my answer,

$\frac{T!}{(\bar N-r)!}\frac{m!}{T^m} \sum_{(r_1,\ldots,r_m);\\\sum_1^m r_i=r\ \&\\\sum_1^m ir_i=m}1\Big/\prod_{j=1}^m r_j! (j!)^{r_j}$

borrowed from a paper by Fisher et al. (Another Fisher!) Checking Feller leads to the probability (p.102)

${T \choose r}\sum_{\nu=0}^r (-1)^{\nu}{r\choose\nu}\left(1-\frac{T-r+\nu}T \right)^m$

which fits rather nicely simulation frequencies, as shown using

apply(!apply(matrix(sample(1:Nb,T*M,rep=TRUE),T,M),1,duplicated),2,sum)


Further, Feller (1970, pp.103-104) justifies an asymptotic Poisson approximation with parameter$$\lambda(M)=\bar{N}\exp\{-M/\bar N\}$ from which an estimate of$M\$ can be derived. With the birthday problem as illustration (pp.105-106)!

It may be that a completion from N to (R¹,R²,…) where the components are the number of days with one birthdate, two birthdates, &tc. could help design an EM algorithm that would remove the summation in (1) but I did not spend more time on the problem (than finding a SAS approximation to the probability!).

## the riddle(r) of the certain winner losing in the end

Posted in Books, Kids, R, Statistics with tags , , , , , on November 25, 2020 by xi'an

Considering a binary random walk, starting at zero, what is the probability of being almost sure of winning at some point only to lose at the end? This is the question set by the post-election Riddler, with almost sure meaning above 99% and the time horizon set to n=101 steps (it could have been 50 or 538!). As I could not see a simple way to compute the collection of states with a probability of being positive at the end of at least 0.99, even after checking William Feller’s Random Walks fabulous chapter, I wrote an R code to find them, and then ran a Monte Carlo evaluation of the probability to reach this collection and still end up with a negative value. Which came as 0.00212 over repeated simulations. Obviously smaller than 0.01, but no considerably so. As seen on the above picture, the set to be visited is actually not inconsiderable. The bounding curves are the diagonal and the 2.33 √(n-t) bound derived from the limiting Brownian approximation to the random walk, which fits rather well. (I wonder if there is a closed form expression for the probability of the Brownian hitting the boundary 2.33 √(n-t). Simulations with 1001 steps give an estimated probability of 0.505, leading to a final probability of 0.00505 of getting over the boundary and loosing in the end, close to the 1/198 produced by The Riddler.)

## Dorfman’s group testing

Posted in Books, Kids, Statistics, Travel with tags , , , , , , , , on April 9, 2020 by xi'an

A recent note by CREST researchers insists on using group testing to compensate for the shortage of test packages and testing personal, as done in several countries, towards deconfining individuals who are not infected. Or who are exhibiting the right antibodies.  Reminding me of my first entry to the notion, in Feller’s book, of the method implemented by Robert Dorfman to test for syphilis prior to enlisting potential WW II soldiers. I would deem the idea useful for surveys, in identifying the proportion of infected or immunised persons, maybe less for giving the green light to leave one’s house as the logistics of merging the tests while keeping track of every individual could prove impossible.