the birthday problem [X'idated]

Posted in R, Statistics, University life on February 1, 2012 by xi'an

The birthday problem (i.e. looking at the distribution of the birthdates in a group of n persons, assuming [wrongly] a uniform distribution of the calendar dates of those birthdates) is always a source of puzzlement [for me]! For instance, here is a recent post on Cross Validated:

I have 360 friends on facebook, and, as expected, the distribution of their birthdays is not uniform at all. I have one day with that has 9 friends with the same birthday. So, given that some days are more likely for a birthday, I’m assuming the number of 23 is an upperbound.

The figure 9 sounded unlikely, so I ran the following computation:

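# estimate the distribution of the largest number of shared birthdays
extreme=rep(0,360)
for (t in 1:10^5){
# in the sorted sample, positions of first occurrences delimit groups of equal dates
first=(1:360)[!duplicated(sort(sample(1:365,360,replace=TRUE)))]
# group sizes are the gaps between those positions (361 closes the last group)
i=max(diff(c(first,361)))
extreme[i]=extreme[i]+1
}
extreme=extreme/10^5
barplot(extreme,xlim=c(0,30),names=1:360)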


whose output is shown in the above graph. (Actually, I must confess I first forgot the sort in the code, which led me to believe that 9 was one of the most likely values and to post it on Cross Validated! The error was eventually picked up by one administrator. I should know better than to trust my own R code!) According to this simulation, observing 9 or more people having the same birthdate has an approximate probability of 0.00032… Indeed, fairly unlikely!

Incidentally, this question led me to uncover how to print the above on this webpage. And to learn from the X’idated moderator whuber the use of tabulate. Which avoids the above loop:

> system.time(test(10^5)) #my code above
user  system elapsed
26.230   0.028  26.411
> system.time(table(replicate(10^5, max(tabulate(sample(1:365,360,rep=TRUE))))))
user  system elapsed
5.708   0.044   5.762
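For readers who want to reproduce the tail probability with the tabulate approach, here is a minimal sketch. The seed, the reduced number of replications, and the variable names are my own choices for illustration, not part of the original computation, and the estimate is noisy for such a rare event.

```r
# Simulate the largest number of shared birthdays among 360 people,
# assuming (wrongly, as noted above) uniform birthdates over 365 days.
set.seed(2012)        # arbitrary seed, for reproducibility only
n_friends <- 360
n_sim <- 10^4         # fewer replications than the 10^5 used above, for speed

# tabulate() counts how many sampled people fall on each of the 365 days
max_shared <- replicate(
  n_sim,
  max(tabulate(sample(1:365, n_friends, replace = TRUE), nbins = 365))
)

# Empirical distribution of the maximum and the tail probability of interest
print(table(max_shared) / n_sim)
p_nine_or_more <- mean(max_shared >= 9)
print(p_nine_or_more)
```

Raising n_sim to 10^5 or more should bring the estimate closer to the 0.00032 figure quoted above, at the cost of a longer run.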


Mr Meyrowitz’s glasses

Posted in Statistics, University life on October 23, 2011 by xi'an

Today, I found a site entitled Mr Meyrowitz’s Class that links to my first post on coincidences in lotteries as an example of “fatal error”. This seems to be part of a student’s assignment, apparently for the CollegeBoard programme, with 10 minutes allocated to students to find my “fatal error with decimals and probabilities”… As there is no hint, I wonder where my fatal error lies: I could not find it after those 10 minutes of intense searching and recomputing. Maybe Mr Meyrowitz actually needs new glasses to spot the difference between a 1‰ chance and a 1% chance… (Which actually misled a few other readers of the post.)

Question 6) in this assignment also sounds very much inspired by another of my posts on coincidences in lotteries [although not acknowledged in the assignment] since the question refers to the same original France Soir article in French. The question is however rather vague: “do you suspect him of cheating?” and it shows a lack of knowledge about French loto where cheating is [close to] impossible. It is certainly not recommended as an exercise for beginning students in probability or statistics. [Actually, in my opinion, the whole assignment is poor, being either imprecise, e.g. question 7), useless, as for question 4) "Pick one topic that you understand very well and one that you do not understand well" (!), or plain wrong, as for question 2)...]

Another coincidence…

Posted in Mountains, pictures, Travel on September 9, 2011 by xi'an

After the coincidence of bumping into Marc Suchard in an Edinburgian Indian restaurant on Tuesday night, I faced another if much less pleasant coincidental event: for the third time in a row, my bag went missing on an Air France flight to Scotland… This happened first for the mixture meeting in 2009, costing me an attempt at Tower Ridge on Ben Nevis, then again when I took part in the colloquium celebrating Mike Titterington last May. Since having three independently lost pieces of luggage on three (or six) trips is very unlikely, there must be a reason for this pattern! Besides a conspiracy theory about the airline pushing me towards other companies because of my grumpy patronage or my unsuccessful requests for Irn Bru, possible reasons are late check-ins (even though this does not apply in the last two cases) and use of a [small] backpack that is always turned into an “oversized” piece of luggage by the airline ground personnel (does not apply to the first occurrence), but I do not carry particularly suspicious items, not even haggis on the way back… There must be a better reason than that!

another lottery coincidence

Posted in R, Statistics on August 30, 2011 by xi'an

Once again, meaningless figures are published about a man who won the French lottery (Le Loto) for the second time. The reported probability of the event is indeed one chance out of 363 (US) trillion (i.e., billions in the metric system, or $10^{12}$)… This number is simply the square of

${49 \choose 5}\times{10 \choose 1} = 19,068,840$

which is the number of possible loto grids. Thus, the probability applies to the event “Mr so-&-so plays a winning grid of Le Loto on May 6, 1995 and a winning grid of Le Loto on July 27, 2011”. But this is not the event that occurred: one of the bi-weekly winners of Le Loto won a second time and this was spotted by Le Loto spokespersons. If we take the specific winner for today’s draw, Mrs such-&-such, who played bi-weekly one single grid since the creation of Le Loto in 1976, i.e. about 3640 times, the probability that she won earlier is of the order of

$1-\left(1-\frac{1}{{49\choose 5}\times{10\choose 1}}\right)^{3640}=2\cdot 10^{-4}$.

There are thus two chances in ten thousand of winning again for a given (unigrid) winner, not much indeed, but no billion involved either. Now, this is also the probability that, for a given draw (like today’s draw), one of the 3640 previous winners wins again (assuming they all play only one grid, play independently from each other, etc.). Over a given year, i.e. over 104 draws, the probability that there is no second-time winner is thus approximately

$\left(1-\frac{2}{10^4}\right)^{104} = 0.98,$

showing that within a year there is a 2% chance to find an earlier winner. Not so extreme, is it?! Therefore, less bound to make the headlines…
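The orders of magnitude above are easy to check in R. This is only a sketch with variable names of my own choosing, using the same simplifying assumptions as the text (one grid per draw, 3640 draws since 1976, 104 draws per year).

```r
# Number of possible grids: 5 numbers out of 49, times 1 number out of 10
n_grids <- choose(49, 5) * choose(10, 1)
print(n_grids)                 # 19068840

# The "363 trillion" figure is simply the square of the number of grids
print(n_grids^2)

# Probability that a single-grid player won at least once in 3640 earlier draws
n_draws <- 3640
p_again <- 1 - (1 - 1 / n_grids)^n_draws
print(p_again)                 # about 2e-4

# Probability of seeing no second-time winner over the 104 draws of a year
p_none_year <- (1 - p_again)^104
print(p_none_year)             # about 0.98
```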

Now, the above are rough and conservative calculations. The newspaper articles about the double winner report that the man is staking about 1000 euros a month (this is roughly the minimum wage!), representing the equivalent of 62 grids per draw (again I am simplifying to get the correct order of magnitude). If we repeat the above computations, assuming this man has played 62 grids per draw from the beginning of the game in 1976 till now, the probability that he wins again conditional on the fact that he won once is

$1-\left(1-\frac{62}{{49 \choose 5}\times{10 \choose 1}}\right)^{3640} = 0.012$,

a small but not impossible event. (And again, we consider the probability only for Mr so-&-so, while the event of interest does not.) (I wrote this post before Alex pointed out the four-time lottery winner in Texas, whose “luck” seems more related to the imperfections of the lottery process…)
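To see how the answer moves with the number of grids played, the computation can be wrapped in a small helper. The function p_win_again and its arguments are hypothetical names introduced for this sketch, and it assumes the same number of distinct grids at every draw.

```r
# Probability of at least one win over n_draws draws when playing
# grids_per_draw distinct grids each time (hypothetical helper)
p_win_again <- function(grids_per_draw, n_draws,
                        n_grids = choose(49, 5) * choose(10, 1)) {
  1 - (1 - grids_per_draw / n_grids)^n_draws
}

# Single-grid player versus the 62-grid player discussed above
p_single <- p_win_again(1, 3640)
p_heavy <- p_win_again(62, 3640)
print(p_single)   # about 2e-4
print(p_heavy)    # about 0.012
```

Since both probabilities are small, the second is roughly 62 times the first, which is all the "conditioning" on a heavy player amounts to here.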

I also stumbled on this bogus site providing the “probabilities” (based on the binomial distribution, nothing less!) for each digit in Le Loto, no need for further comments. (Even the society that runs Le Loto hints at such practices, by providing the number of consecutive draws a given number has not appeared, with the sole warning “N’oubliez jamais que le hasard ne se contrôle pas”, i.e. “Always keep in mind that chance cannot be controlled”…!)

Coincidence in lotteries

Posted in R, Statistics, University life on October 20, 2010 by xi'an

Last weekend, my friend and coauthor Jean-Michel Marin was interviewed (as Jean-Claude Marin, sic!) by a national radio about the probability of the replication of a draw on the Israeli Lottery. Twice the same series of numbers appeared within a month. This lottery operates on a principle of 6/37 + 1/8: 6 numbers are drawn out of a pool of numbers from 1 to 37 and then a 7th number is drawn between 1 and 8. The number of possibilities is therefore

${37\choose 6}\times 8=18,598,272$

and the probability of replicating, on a given day, the draws from another given day is 1/18,598,272. Now, the event picked up by the radio does not have this probability, because the news selected this occurrence out of all the lottery draws across all countries, etc. If we only consider the Israeli Lottery, there are two draws per week, meaning that over a year the probability of no coincidence is

$\dfrac{18,598,272\times 18,598,271\times\cdots\times 18,598,169}{18,598,272^{104}}=0.9997$

namely that a coincidence occurs within one year for this particular lottery with probability 3/10,000. If we start from early 2009, when this formula of the lottery was started, there are about 188 draws and the probability of no coincidence goes down to

$\dfrac{18,598,272\times 18,598,271\times\cdots\times 18,598,085}{18,598,272^{188}}=0.999$

which means there is close to a 1‰ chance of seeing twice the same outcome. Not that unlikely despite some contradictory computations! It further appears that only the six digits were duplicated, which reduces the number of possibilities to
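The two products above are birthday-problem computations and can be checked directly. In this sketch, p_no_repeat is a hypothetical helper name, and draws are assumed independent and uniform over all outcomes.

```r
# Probability that n_draws independent uniform draws among n_outcomes
# possibilities are all distinct: prod_{k=0}^{n_draws-1} (1 - k / n_outcomes)
p_no_repeat <- function(n_draws, n_outcomes) {
  prod(1 - (0:(n_draws - 1)) / n_outcomes)
}

# Full 6/37 + 1/8 outcome space
n_full <- choose(37, 6) * 8
print(n_full)                      # 18598272

p_year <- p_no_repeat(104, n_full)
p_since_start <- p_no_repeat(188, n_full)
print(p_year)                      # about 0.9997
print(p_since_start)               # about 0.999
```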

${37\choose 6}=2,324,784$

Over a month and eight draws, the probability of no coincidence is

$\dfrac{2,324,784\times 2,324,783\times\cdots\times 2,324,777}{2,324,784^{8}}=0.99999,$

so the probability of a coincidence within that month, about one in 100,000, is indeed very small. However, if we start from early 2009, the probability of no coincidence goes down to 0.992, which means there is close to an 8‰ chance of seeing twice the same outcome since the creation of this lottery… If we further consider that there are hundreds of similar lotteries across the world, the probability that this coincidence [of two identical draws over 188 draws] occurred in at least one out of 100 lotteries is 53%!
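The same check for the six-number version, including the 100-lottery figure, goes as follows. This is a sketch only: variable names are mine, and it takes the 100 lotteries to be independent and to share the same 6/37 format and 188 draws, as the text implicitly does.

```r
# Outcome space when only the six main numbers must match
n_six <- choose(37, 6)
print(n_six)                                  # 2324784

# No repeated draw over one month (8 draws)
p_month <- prod(1 - (0:7) / n_six)
print(p_month)                                # about 0.99999

# No repeated draw over the roughly 188 draws since early 2009
p_since_2009 <- prod(1 - (0:187) / n_six)
print(p_since_2009)                           # about 0.992

# At least one such coincidence among 100 independent, similar lotteries
p_any_of_100 <- 1 - p_since_2009^100
print(p_any_of_100)                           # about 0.53
```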
