## a closed-form but intractable birthday

Posted in Books, Kids, pictures, Statistics with tags , , , , , on November 7, 2016 by xi'an

An interesting [at least for me!] question found on X Validated yesterday, namely how to simulate efficiently from the generalised birthday problem (or paradox) distribution, which provides the probability of finding exactly k different birthday dates as

$\mathbb{P}(V = k) = \binom{n}{k}\displaystyle\sum_{i=0}^k (-1)^i \binom{k}{i} \left(\frac{k-i}{n}\right)^m$

where m is the number of individuals with a random birthday and n the number of days (e.g., n=365). The paradox with this closed-form formula (found by the inclusion-exclusion rule) is that it is too unstable to use per se. While it is always possible to run m draws from a uniform over {1,…,n} and count the number of different values, e.g.,

x=length(unique(sample(1:n,m,rep=TRUE)))


this takes much more time than using the exact distribution, if available:

sample(1:m,1e6,rep=TRUE,prob=eff[-1])


I played a little bit with the notion of using an unbiased estimator of the said probability, but the alternating series means that the unbiased estimator may end up being negative, which is an issue met in recent related papers like the famous Russian Roulette.