a simpler (?) birthday problem

Posted in Books, Kids, Statistics with tags , , , , , , , on April 9, 2022 by xi'an

A monthly birthday problem from the Riddler:

What was the probability that none of the 40 people had birthdays this month? What is the probability that there is at least one month in the year during which none of the 40 people had birthdays (not necessarily this month)?

Assuming the same number of days in all months, the probability that one individual is not born in March is 1/12 and hence the probability that none of 40 (independent!) persons are not born in March is (11/12)⁴⁰, about 3%. The second question can be solved by reading Feller’s chapter on the combination of events (1970, Chapter IV, p.102). The probability that all months are seeing at least one birthday is

$\sum_{i=0}^{12} (-1)^i {12\choose i}(1-i/12)^{40}=0.6732162$

which can be checked by a quick R simulation. The complement 0.326 is thus close to 11 x 0.03!

[un]solved riddles

Posted in Books, Kids, R with tags , , on July 4, 2017 by xi'an

On the Riddler of last week, first a birthday puzzle:

Given a group of 23 persons, what is the probability of observing three pairs of identical birthdays?

which can be found by a quick simulation as

ave=0
for (t in 1:1e6){
dupz=dates[duplicated(sample(1:365,23,rep=TRUE))]
ave=ave+as.integer((length(dupz)==3)&
(length(unique(dupz))==3))}}
ave/M


returning a value of 0.0183, but which combinatoric resolution I could not fully fathom without a little help from a friend (-ly blog). I had the multinomial coefficient

${23\choose 2\ 2\ 2}$

for the allocation of the 23 persons to one of the three pairs or none, as well as the probability

$\dfrac{365\times\cdots\times(365-(23-4)+1)}{365^{23}}$

but I had forgotten the 3! in the denominator for the permutations of the three pairs, which leads again to

${23 \choose 2\ 2\ 2}\dfrac{365\times\cdots\times(365-(23-4)+1)}{3!\times365^{23}} = 0.0183$

A question that also led to an unsolved question: in the even this probability was much smaller, what is an easy way into constructing a more efficient importance sampler?

The second riddle was just as easy to code in R:

A game of tag goes by the following rules: (i) anyone untagged can tag anyone untagged; (ii) anyone tagged by a player tagged gets untagged; (iii) the winner is the last untagged player. What is the expected number of runs for N players?

The outcome of

game=function(N=12){
a=rep(0,N);T=0
while (sum(a==0)>1){
ij=sample((1:N)[a==0],2)
a[ij[2]]=ij[1];a[a==ij[2]]=0
T=T+1}
return(T)}


leads to an average value of

$2^{N-1}-1$

but I had no clear quick explanation for the doubling phenomenon. Until I picked a pen and a sheet of paper and drew the last steps of the game: to decrease down to 1, the size of the untagged survivors has to get through …,3,2 and each time the eliminated player needs to have tagged no other player since otherwise the population grows again. This has to apply all the way to the second round, where N-1 players remain and the one tagged needs to be anyone but the one who tagged the first one. And so on…