OK, thanks! You are right: if the most common birthdate is the last one in the sorted series, diff will not pick it up. This creates a small downward bias, mostly negligible. I think the fix for the bug is to append 361 to the list of indices:

max(diff(c((1:360)[!duplicated(sort(sample(1:365,360,rep=TRUE)))],361)))

Let’s say that:

> x<-sort(sample(1:365,360,rep=TRUE))

gives us something like

(1, 2, 3, ...,363,363,...,363)

where the last 10 values are 363.

That means that the !duplicated(x) in the last 10 positions will be

T, F, F, …,F (one T and nine F's) where T is True and F is False.

In that case the last value of diff((1:360)[!duplicated(x)]) will not be 10 (as it should be): the last difference it computes is between the T of the people born on the 363rd day of the year and the previous T, so the run of ten 363s at the end of the series is never measured.
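The effect can be checked on a small deterministic vector. This is only a sketch: the vector x below is a made-up sorted sample (350 distinct birthdays followed by ten people born on day 363), and the variable names are chosen for illustration.

```r
# Hypothetical sorted sample of 360 birthdays: days 1..350 once each,
# then ten people born on day 363 (an assumption made for this example).
x <- c(1:350, rep(363, 10))

# Positions at which a new birthdate starts in the sorted series.
starts <- (1:360)[!duplicated(x)]

# Without a closing index the final run of ten is never measured.
max_without_361 <- max(diff(starts))

# Appending 361 closes the final run, so its length is included.
max_with_361 <- max(diff(c(starts, 361)))

# Independent check: count people per day directly.
max_by_tabulate <- max(tabulate(x, nbins = 365))

print(c(max_without_361, max_with_361, max_by_tabulate))  # values: 1, 10, 10
```

Counting per day with tabulate gives the same answer as the corrected diff, which is a convenient cross-check.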

The number of your friends having their birthday on one particular day of the year follows approximately a Poisson(lambda) distribution, where lambda = 360/365. The probability that this number is 9 is approximately p9 = exp(-lambda) lambda^9/9!, so the probability that none of the 365 days is the birthday of 9 friends is approximately (1-p9)^365 ≈ 1 - 365 p9 (and hardly different for 9 or more).

Hence, the probability you are looking for is approximately 365 p9 = 0.00033, quite close to your Monte Carlo estimate.
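The arithmetic can be reproduced in R with the built-in Poisson functions. This is a sketch of the approximation described above, not an exact calculation; the variable names are illustrative.

```r
# Poisson rate: 360 friends spread over 365 days.
lambda <- 360 / 365

# P(exactly 9 friends share one given day) under the Poisson approximation.
p9 <- dpois(9, lambda)

# Linearised approximation, 365 * p9, and the product form it comes from.
approx_linear  <- 365 * p9
approx_product <- 1 - (1 - p9)^365

# Same calculation for "9 or more" on a given day: P(X >= 9) = P(X > 8).
p9_or_more  <- ppois(8, lambda, lower.tail = FALSE)
approx_tail <- 365 * p9_or_more

print(c(approx_linear, approx_product, approx_tail))
```

The first two values agree to several significant digits (about 0.00033); the "9 or more" version is slightly larger, since it adds the small probabilities of 10, 11, ... friends on the same day.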

Note that all these approximations can be justified (and quantified) using Chen-Stein’s bound for Poisson approximation (which does not require independence).

Played around a bit in R and figured it out earlier, but thanks a lot for explaining it.

Well said, Bill! And total respect for your awesome contribution to Cross Validated!

What is not uniformly distributed is the number of subjects in a group of n persons sharing the same month and day of birth.

The same happens with a series of coin tosses actually:

table(replicate(10^5, max(tabulate(sample(1:2,100,rep=TRUE)))))
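The same tabulation can be applied to birthdays instead of coin tosses. The sketch below assumes 360 people and 365 equally likely days, as in the discussion above; the seed and the number of replications are arbitrary choices made for reproducibility and speed.

```r
set.seed(42)   # arbitrary seed, only so the run is reproducible
n_sim <- 10^4  # number of simulated groups (an assumption, kept small for speed)

# For each simulated group, the largest number of people sharing one birthday.
max_shared <- replicate(
  n_sim,
  max(tabulate(sample(1:365, 360, replace = TRUE), nbins = 365))
)

# Frequency table of that maximum across simulations.
counts <- table(max_shared)
print(counts)
```

The table concentrates on a few small values rather than spreading evenly, which is the non-uniformity referred to above.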