## birthday paradox, one by one

Posted in Books, Kids, pictures, Statistics with tags , , , , , on November 1, 2022 by xi'an

An easy riddle, riding the birthday paradox. Namely how many people on average have to sequentially enter a room for a double birthday to occur?

$\sum_{n=2}^{366}n\prod_{m=1}^{n-2}\left(1-\frac{m}{365}\right)\frac{n-1}{365}$

as only the last person in can share a birthday with those already in. This sum equals 24.62. Since the “magic number” is 23, this may sound wrong but the median attached to the above distribution is truly 23. My R code is

q=rep(1,a<-366)/365
for(n in 3:a)q[n]=q[n-1]*(1+1/(n-2)-(n-1)/365)
sum(q[-1]*(2:a))


## hard birthday problem

Posted in Books, Kids, R, Statistics with tags , , , , , , , , , on February 4, 2021 by xi'an

Click to access birthday.pdf

From an X validated question, found that WordPress now allows for direct link to pdf documents, like the above paper by my old friend Anirban Das Gupta! The question is about estimating a number M of individuals with N distinct birth dates over a year of T days. After looking around I could not find a simpler representation of the probability for N=r other than (1) in my answer,

$\frac{T!}{(\bar N-r)!}\frac{m!}{T^m} \sum_{(r_1,\ldots,r_m);\\\sum_1^m r_i=r\ \&\\\sum_1^m ir_i=m}1\Big/\prod_{j=1}^m r_j! (j!)^{r_j}$

borrowed from a paper by Fisher et al. (Another Fisher!) Checking Feller leads to the probability (p.102)

${T \choose r}\sum_{\nu=0}^r (-1)^{\nu}{r\choose\nu}\left(1-\frac{T-r+\nu}T \right)^m$

which fits rather nicely simulation frequencies, as shown using

apply(!apply(matrix(sample(1:Nb,T*M,rep=TRUE),T,M),1,duplicated),2,sum)


Further, Feller (1970, pp.103-104) justifies an asymptotic Poisson approximation with parameter$$\lambda(M)=\bar{N}\exp\{-M/\bar N\}$ from which an estimate of$M\$ can be derived. With the birthday problem as illustration (pp.105-106)!

It may be that a completion from N to (R¹,R²,…) where the components are the number of days with one birthdate, two birthdates, &tc. could help design an EM algorithm that would remove the summation in (1) but I did not spend more time on the problem (than finding a SAS approximation to the probability!).

## random generators produce ties

Posted in Books, R, Statistics with tags , , , , , , , on April 21, 2020 by xi'an

“…an essential part of understanding how many ties these RNGs produce is to understand how many ties one expects in 32-bit integer arithmetic.”

A sort of a birthday-problem paper for random generators by Markus Hofert on arXiv as to why they produce ties. As shown for instance in the R code (inspired by the paper):

sum(duplicated(runif(1e6)))


returning values around 100, which is indeed unexpected until one thinks a wee bit about it… With no change if moving to an alternative to the Mersenne twister generator. Indeed, assuming the R random generators produce integers with 2³² values, the expected number of ties is actually 116 for 10⁶ simulations. Moving to 2⁶⁴, the probability of a tie is negligible, around 10⁻⁸. A side remark of further inerest in the paper is that, due to a different effective gap between 0 and the smallest positive normal number, of order 10⁻²⁵⁴ and between 1 and the smallest normal number greater than 1, of order 10⁻¹⁶, “the grid of representable double numbers is not equidistant”. Justifying the need for special functions such as expm1 and log1p, corresponding to more accurate derivations of exp(x)-1 and log(1+x).

## Le Monde puzzle [#1107]

Posted in Kids with tags , , , on July 8, 2019 by xi'an

A light birthday problem as Le Monde mathematical puzzle:

Each member of a group of 35 persons writes down the number of those who share the same birth-month and the number of those who share the same birth-date [with them]. It happens that these 70 numbers include all integers from 0 to 10. Show that at least two people share a birth-day. What is the maximal number of people for this property to hold?

Which needs no R code since the result follows from the remark that the number of individuals sharing a birth-month with just one other, n¹, is a multiple of 2, the number of individuals sharing a birth-month with just two others, n², a multiple of 3, and so on. Hence, if no people share a birth-day, n¹,n²,…,n¹⁰>0 and

n¹+n²+…+n¹⁰ ≥ 2+3+…+11 = 6·11-1=65

which means that it is impossible that the 10 digits n¹,…,n¹⁰ are all positive. All the way up to 65 people. As an aside, no correction of the wrong solution to puzzle #1105 was published in the subsequent editions.

## Le Monde puzzle [#1081]

Posted in Books, Kids, R, Travel with tags , , , , on January 24, 2019 by xi'an

A “he said-she said” Le Monde mathematical puzzle (again in the spirit of the famous Singapore high-school birthdate problem):

Abigail and Corentin are both given a positive integer, a and b, such that a+b is either 19 or 20. They are asked one after the other and repeatedly if they are sure of the other’s number. What is the maximum number of times they are questioned?

If Abigail is given a 19, b=1 necessarily. Hence if Abigail does not reply, a<19. This implies that, if Corentin is given b=1 or b=19, he can reply a+b=19 or a+b=20, necessarily. Else, 1<b<19 implies that, if a=1 or a=18, b=18 or b=2. And so on…which leads to a maximum of 20 questions, 10 for Abigail and 10 for Corentin. Here is my R implementation

az=bz=cbind(20-(1:19),19-(1:19))
qwz=0;at=TRUE;bt=FALSE
while ((max(az)>0)&(max(bz)>0)){
if (at){
for (i in 1:19){
if (sum(az[i,]>0)==2){
for (j in az[i,az[i,]>0]){
if (sum(bz[j,]==0)==2) az[i,]=rep(0,2)}}
if (sum(az[i,]>0)<2){
az[i,]=rep(0,2)}}}
if (bt){
for (i in 1:19){
if (sum(bz[i,bz[i,]>0]>0)==2){
for (j in bz[i,bz[i,]>0]){
if (sum(az[j,]==0)==2) bz[i,]=rep(0,2)}}
if (sum(bz[i,]>0)<2){ bz[i,]=rep(0,2)}}}
bt=!bt;at=!at;qwz=qwz+1}