a closed-form but intractable birthday

An interesting [at least for me!] question found on X Validated yesterday, namely how to simulate efficiently from the generalised birthday problem (or paradox) distribution, which provides the probability of finding exactly k different birthday dates as

\mathbb{P}(V = k) = \binom{n}{k}\displaystyle\sum_{i=0}^k (-1)^i \binom{k}{i}  \left(\frac{k-i}{n}\right)^m

where m is the number of individuals with a random birthday and n the number of days (e.g., n=365). The paradox with this closed-form formula (found by the inclusion-exclusion rule) is that it is too unstable to use per se. While it is always possible to run m draws from a uniform over {1,…,n} and count the number of different values, e.g.,

x=length(unique(sample(1:n,m,rep=TRUE)))

this takes much more time than using the exact distribution, if available:

sample(1:m,1e6,rep=TRUE,prob=eff[-1])

I played a little bit with the notion of using an unbiased estimator of the said probability, but the alternating series means that the unbiased estimator may end up being negative, which is an issue met in recent related papers like the famous Russian Roulette.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s