## Unusual timing shows how random mass murder can be (or even less)

Posted in Books, R, Statistics, Travel on November 29, 2013 by xi'an

This post follows the original one on the USA Today headline I read during my flight to Toronto last month. Recall that the unusual pattern was about observing four U.S. mass murders happening within four days, “for the first time in at least seven years”. Which means that the largest difference between the four dates is at most 3 days, not 4!

I asked my friend Anirban Das Gupta from Purdue University about the exact value of this probability, and the first thing he pointed out was that I had used a different meaning of “within 4”. He then went into an elaborate calculation to find an upper bound on this probability, an upper bound that was way above both my Monte Carlo approximation and my rough calculation of the last post. I rechecked my R code and found it was not approximating the right event, namely that one date is within 3 days of at least three other dates… I thus rewrote the following R code

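```r
T=10^6
four=rep(0,T)
for (t in 1:T){
  day=sort(sample(1:365,30,replace=TRUE)) #30 random days
  day=c(day,day[day>362]-365) #account for toric difference: days 363-365 also appear as -2,-1,0
  tem=outer(day,day,"-") #tem[i,j]=day[i]-day[j]
  four[t]=any(apply((tem>-1)&(tem<4),1,sum)>3) #at least 4 dates (itself included) in [day-3,day]
}
mean(four)
```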


[I checked it was ok for two dates within 1 day, which recovers the birthday problem probability] and found 0.070214, which is much larger than the earlier value and shows it takes an average of 14 years for the “unlikely” event to happen! And the chance that it happens within seven years is 40%.
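The event can also be phrased through windows: four dates span at most 3 days exactly when some circular window of 4 consecutive days contains at least four of them. A minimal sketch of that alternative check, assuming (as in the simulation above) 30 dates drawn uniformly on a 365-day circular year; `has_cluster` and `cluster_prob` are hypothetical names, and the last lines only redo the arithmetic behind the 14 years and the 40% from the reported value 0.070214.

```r
# TRUE if some circular window of `width` consecutive days (out of m)
# contains at least k of the dates in `day` (hypothetical helper)
has_cluster <- function(day, k = 4, width = 4, m = 365) {
  counts <- tabulate(day, nbins = m)              # dates falling on each day
  ext <- c(counts, counts[seq_len(width - 1)])    # wrap around the year end
  # w[i] = number of dates on days i, i+1, ..., i+width-1 (modulo m)
  w <- rowSums(sapply(0:(width - 1), function(s) ext[(1:m) + s]))
  any(w >= k)
}

# Monte Carlo estimate of the yearly probability, with n dates per year
cluster_prob <- function(T = 10^4, n = 30, k = 4, width = 4, m = 365) {
  mean(replicate(T, has_cluster(sample(1:m, n, replace = TRUE), k, width, m)))
}

set.seed(1)
p_sim <- cluster_prob()
p_sim              # near 0.07, up to a Monte Carlo standard error of about 0.0026

# From the reported yearly probability
p <- 0.070214
1 / p              # mean waiting time in years, about 14.2
1 - (1 - p)^7      # chance of at least one occurrence in 7 years, about 0.40
```

Working with day counts rather than the 30 x 30 matrix of differences keeps each replication linear in the number of days, and the wrap-around is handled by appending the first days of the year at its end.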

Another coincidence relates to this evaluation, namely the fact that two elderly couples in France committed couple suicide within three days of each other last week. However, I could not find figures for the number of couple suicides per year. Maybe because it is extremely rare. Or undetected…

## double-yolkers

Posted in Kids, Statistics on November 14, 2013 by xi'an

Last night I was cooking buckwheat pancakes (galettes de sarrasin) from Brittany with an egg-and-ham filling. The first egg I used contained a double yolk, a fairly rare occurrence, at least in my kitchen! Then came the second pancake and, unbelievably!, a second egg with a double yolk! This sounded too unbelievable to be…unbelievable! The experiment stopped there as no one else wanted another galette, but tonight, when making chocolate mousse, I checked whether or not the four remaining eggs also were double-yolkers…and indeed they were. Which does not help when separating yolks from whites, by the way. Esp. with IX fingers. At some stage during the day, I remembered a talk by Prof of Risk David Spiegelhalter mentioning the issue, even including a picture of an egg-box with the double-yolker guarantee, as in the attached picture. But all I could find at first was this explanation on BBC News. Which made sense for my eggs, as those were from a large calibre egg-box (which I usually do not buy)… (And then I typed David Spiegelhalter plus “double-yolker” on Google and all those references came out!)

## Unusual timing shows how random mass murder can be (or not)

Posted in Books, R, Statistics, Travel on November 4, 2013 by xi'an

This was one headline in the copy of USA Today I picked up in the hotel lobby on my way to Pittsburgh airport, and then Toronto, this morning. The unusual pattern was about observing four U.S. mass murders happening within four days, “for the first time in at least seven years”. The article did not explain why this was unusual. And it reported one mass murder expert’s opinion instead of a statistician’s…

Now, there are about 30 mass murders in the U.S. each year (!), so the probability of finding at least four of those 30 events within 4 days of one another should be related to von Mises’ birthday problem. For instance, Abramson and Moser derived in 1970 that the probability that at least two people (among n) have birthdays within k days of one another (for an m-day year) is

$p(n,k,m) = 1 - \dfrac{(m-nk-1)!}{m^{n-1}(m-nk-n)!}$

but I did not find an extension to the case of the four (to borrow from Conan Doyle!)… A quick approximation would be to turn the problem into a birthday problem with 364/4=91 “days” of four days each, and to compute the probability that exactly four of the 30 dates share the same “birthday”

${30 \choose 4} \frac{90^{26}}{91^{29}}=0.0273$

which is surprisingly large. So I checked with some R code on the plane:

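```r
T=10^5
four=rep(0,T)
for (t in 1:T){
  day=sample(1:365,30,rep=TRUE) #30 random days
  #TRUE when some date has at least four other dates within 3 days of it (either side)
  four[t]=(max(apply((abs(outer(day,day,"-"))<4),1,sum))>4)}
mean(four)
```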


and found 0.0278, which means the above approximation is far from terrible! I think it may actually be “exact” in the sense that the probability of observing exactly four murders within four days of one another is given by this formula. The cases of five, six, &tc. murders are omitted but they are also negligible. And from this number, we can see that there is an 18% probability that the case of the four occurs within seven years. Not so unlikely, then.
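The closed-form quantities in this post are quick to evaluate in R. A minimal sketch: `p_within` (a hypothetical name) codes the Abramson and Moser formula, assuming, as that formula does, a circular year of m days, and it reduces to the classical birthday problem for k = 0. The remaining lines redo the 91-block approximation, compare it with the chance of five or more dates in one block under the same crude binning, and turn the simulated 0.0278 into the seven-year probability.

```r
# Abramson and Moser (1970): probability that at least two of n uniform
# birthdays on a circular m-day year fall within k days of one another.
# Log-factorials avoid overflow.
p_within <- function(n, k, m = 365) {
  if (m - n * (k + 1) < 0) return(1)   # no room to keep every pair apart
  1 - exp(lfactorial(m - n * k - 1) - (n - 1) * log(m) -
            lfactorial(m - n * k - n))
}
p_within(23, 0)    # classical birthday problem, about 0.507

# Binning approximation: 91 blocks of 4 days, the number of the 30 dates
# falling in a given block being Binomial(30, 1/91)
b <- 91; n <- 30
p4 <- b * dbinom(4, n, 1 / b)                          # exactly four in a block
p5plus <- b * pbinom(4, n, 1 / b, lower.tail = FALSE)  # five or more in a block
all.equal(p4, choose(30, 4) * 90^26 / 91^29)           # TRUE: same formula as above
c(p4 = p4, p5plus = p5plus)                            # about 0.0273 and 0.0017

# Seven independent years with the simulated yearly value 0.0278
1 - (1 - 0.0278)^7                                     # about 0.18
```

Within this binning, five or more dates in a block is about sixteen times less likely than exactly four, which is the sense in which those cases are negligible.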

## the birthday problem [X’idated]

Posted in R, Statistics, University life on February 1, 2012 by xi'an

The birthday problem (i.e. looking at the distribution of the birthdates in a group of n persons, assuming [wrongly] a uniform distribution of the calendar dates of those birthdates) is always a source of puzzlement [for me]! For instance, here is a recent post on Cross Validated:

> I have 360 friends on facebook, and, as expected, the distribution of their birthdays is not uniform at all. I have one day with that has 9 friends with the same birthday. So, given that some days are more likely for a birthday, I’m assuming the number of 23 is an upperbound.

The figure 9 sounded unlikely, so I ran the following computation:

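```r
extreme=rep(0,360)
for (t in 1:10^5){
  s=sort(sample(1:365,360,replace=TRUE)) #360 sorted random birthdays
  #largest run of identical birthdays; appending 361 closes the last run
  i=max(diff(c((1:360)[!duplicated(s)],361)))
  extreme[i]=extreme[i]+1
}
extreme=extreme/10^5
barplot(extreme,xlim=c(0,30),names.arg=1:360)
```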


whose output is shown on the above graph. (Actually, I must confess I first forgot the sort in the code, which led me to believe that 9 was one of the most likely values and to post it on Cross Validated! The error was eventually picked up by one administrator. I should know better than to trust my own R code!) According to this simulation, observing 9 or more people having the same birthdate has an approximate probability of 0.00032… Indeed, fairly unlikely!
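A rough analytic cross-check of that figure, under the same uniform 365-day assumption: the number of people born on a given day is Binomial(360, 1/365), and the probability that some day gathers 9 or more people is at most 365 times the probability for a single day (a union, or Boole, bound), a bound that is nearly tight for such rare events.

```r
# Union (Boole) bound: P(some day has >= 9 of 360 birthdays)
#   <= 365 * P(a given day has >= 9), the count on one day being Binomial(360, 1/365)
n <- 360; m <- 365
per_day <- pbinom(8, n, 1 / m, lower.tail = FALSE)   # P(count >= 9) for one day
bound <- m * per_day
bound   # roughly 3.4e-4, in line with the simulated 0.00032
```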

Incidentally, this question led me to find out how to print the above on this webpage, and to learn from the X’idated moderator whuber the use of tabulate, which avoids the above loop:

```
> system.time(test(10^5)) #my code above
   user  system elapsed
 26.230   0.028  26.411
> system.time(table(replicate(10^5, max(tabulate(sample(1:365,360,rep=TRUE))))))
   user  system elapsed
  5.708   0.044   5.762
```
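To convince oneself that the loop-free tabulate call computes the same statistic as a run-length construction, here is a small comparison on simulated samples. It is a sketch with hypothetical helper names; note that a diff over the positions of first occurrences only captures the last run of birthdays if a sentinel n + 1 is appended, which `max_diff` does.

```r
# Largest number of people sharing a birthday, computed three ways
max_diff <- function(x) {              # run lengths via first occurrences
  s <- sort(x)
  max(diff(c(which(!duplicated(s)), length(s) + 1)))  # n + 1 closes the last run
}
max_rle <- function(x) max(rle(sort(x))$lengths)      # run-length encoding
max_tab <- function(x) max(tabulate(x))               # counts per calendar day

max_diff(c(1, 3, 3, 3))   # 3; dropping the sentinel would give 1

set.seed(2)
agree <- replicate(1000, {
  x <- sample(1:365, 360, replace = TRUE)
  v <- c(max_diff(x), max_rle(x), max_tab(x))
  all(v == v[1])
})
all(agree)   # TRUE
```

Of the three, `tabulate` needs no sorting at all, which is where the speed-up reported above comes from.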


## Mr Meyrowitz’s glasses

Posted in Statistics, University life on October 23, 2011 by xi'an

Today, I found a site entitled Mr Meyrowitz’s Class that links to my first post on coincidences in lotteries as an example of a “fatal error”. This seems to be part of a student assignment, apparently for the CollegeBoard programme, with 10 minutes allocated to students to find my “fatal error with decimals and probabilities”… As there is no hint, I wonder where my fatal error lies: I could not find it after those 10 minutes of intense searching and recomputing. Maybe Mr Meyrowitz actually needs new glasses to spot the difference between a 1‰ chance and a 1% chance… (Which actually misled a few other readers of the post.)

Question 6) in this assignment also sounds very much inspired by another of my posts on coincidences in lotteries [although not acknowledged in the assignment], since the question refers to the same original France Soir article in French. The question is however rather vague: “do you suspect him of cheating?”, and it shows a lack of knowledge about the French Loto, where cheating is [close to] impossible. It is certainly not recommended as an exercise for beginning students in probability or statistics. [Actually, in my opinion, the whole assignment is poor, being either imprecise, e.g., question 7), useless, as for question 4) “Pick one topic that you understand very well and one that you do not understand well” (!), or plain wrong, as for question 2)…]

## Another coincidence…

Posted in Mountains, pictures, Travel on September 9, 2011 by xi'an

After the coincidence of bumping into Marc Suchard in an Edinburgian Indian restaurant on Tuesday night, I faced another, if much less pleasant, coincidental event: for the third time in a row, my bag went missing on an Air France flight to Scotland… This happened first for the mixture meeting in 2009, costing me an attempt at Tower Ridge on Ben Nevis, then again when I took part in the colloquium celebrating Mike Titterington last May. Since having three independently lost bags on three (or six) trips is very unlikely, there must be a reason for this pattern! Besides a conspiracy theory about the airline pushing me towards other companies because of my grumpy patronage or my unsuccessful requests for Irn Bru, possible reasons are late check-ins (even though this does not apply in the last two cases) and the use of a [small] backpack that is always turned into an “oversized” piece of luggage by the airline ground personnel (this does not apply to the first occurrence), but I do not carry particularly suspicious items, not even haggis on the way back… There must be a better reason than that!

## another lottery coincidence

Posted in R, Statistics on August 30, 2011 by xi'an

Once again, meaningless figures are published about a man who won the French lottery (Le Loto) for the second time. The reported probability of the event is indeed one chance out of 363 (US) trillions (i.e., billions in the long scale used in France, or $10^{12}$)… This number is simply the square of

${49 \choose 5}\times{10 \choose 1} = 19,068,840$

which is the number of possible loto grids. Thus, the probability applies to the event “Mr so-&-so plays a winning grid of Le Loto on May 6, 1995 and a winning grid of Le Loto on July 27, 2011”. But this is not the event that occurred: one of the twice-weekly winners of Le Loto won a second time and this was spotted by Le Loto spokespersons. If we take the specific winner for today’s draw, Mrs such-&-such, who played one single grid twice a week since the creation of Le Loto in 1976, i.e. about 3640 times, the probability that she won earlier is of the order of

$1-\left(1-\frac{1}{{49\choose 5}\times{10\choose 1}}\right)^{3640}=2\cdot 10^{-4}$.
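These figures are easy to reproduce. A small sketch, assuming independent draws, uniformly random grids and the post's count of about 3640 draws since 1976; `repeat_win` is a hypothetical helper for the probability of at least one win when playing a fixed number of distinct grids per draw:

```r
# Number of possible Le Loto grids (5 numbers out of 49, 1 out of 10)
N <- choose(49, 5) * choose(10, 1)
N                      # 19068840
N^2                    # about 3.64e14, the "363 trillions" of the press

# P(at least one win over `draws` draws, playing `grids` distinct grids each time)
repeat_win <- function(grids, draws, N) 1 - (1 - grids / N)^draws
repeat_win(1, 3640, N) # about 1.9e-4, i.e. of order 2e-4
```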

There are thus two chances in 10 thousand to win again for a given (single-grid) winner, not much indeed, but no billion involved either. Now, this is also the probability that, for a given draw (like today’s draw), one of the 3640 previous winners wins again (assuming they all play only one grid, play independently from each other, &tc.). Over a given year, i.e. over 104 draws, the probability that there is no second-time winner is thus approximately

$\left(1-2\cdot10^{-4}\right)^{104} \approx 0.98,$

showing that within a year there is a 2% chance to find an earlier winner. Not so extreme, is it?! Therefore, less bound to make the headlines…
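The yearly figure follows the same pattern, with the per-draw probability $2\cdot 10^{-4}$ from the single-grid computation (itself an approximation) and 104 draws a year:

```r
p_draw <- 2e-4                  # per-draw chance of a second-time winner
no_repeat <- (1 - p_draw)^104   # no second-time winner over 104 draws
no_repeat                       # about 0.979
1 - no_repeat                   # about 0.02
```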

Now, the above are rough and conservative calculations. The newspaper articles about the double winner report that the man is playing about 1000 euros a month (this is roughly the minimum wage!), representing the equivalent of 62 grids per draw (again I am simplifying to get the correct order of magnitude). If we repeat the above computations, assuming this man has played 62 grids per draw from the beginning of the game in 1976 till now, the probability that he wins again conditional on the fact that he won once is

$1-\left(1-\frac{62}{{49 \choose 5}\times{10 \choose 1}}\right)^{3640} = 0.012$,

a small but not impossible event. (And again, we compute the probability only for Mr so-&-so, whereas the event of interest is not restricted to him.) (I wrote this post before Alex pointed out the four-time lottery winner in Texas, whose “luck” seems more related to the imperfections of the lottery process…)
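The 62-grid figure can be checked the same way, assuming the 62 grids played at each draw are distinct (so that the chance of winning a given draw is 62 divided by the number of grids). For such small probabilities, $1-(1-p)^n$ is close to its first-order approximation $np$:

```r
N <- choose(49, 5) * choose(10, 1)   # number of possible grids
grids <- 62; draws <- 3640
exact <- 1 - (1 - grids / N)^draws   # about 0.0118
linear <- draws * grids / N          # first-order approximation, about 0.0118
c(exact = exact, linear = linear)
```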

I also stumbled on this bogus site providing the “probabilities” (based on the binomial distribution, nothing less!) for each number in Le Loto; no need for further comment. (Even the company that runs Le Loto hints at such practices, by providing the number of consecutive draws in which a given number has not appeared, with the sole warning “N’oubliez jamais que le hasard ne se contrôle pas”, i.e. “Always keep in mind that chance cannot be controlled”…!)