## Unusual timing shows how random mass murder can be (or even less)

Posted in Books, R, Statistics, Travel on November 29, 2013 by xi'an

This post follows the original one on the headline of the USA Today I read during my flight to Toronto last month. I remind you that the unusual pattern was about observing four U.S. mass murders happening within four days, “for the first time in at least seven years”. Which means that the difference between the four dates is at most 3, not 4!

I asked my friend Anirban Das Gupta from Purdue University for the exact value of this probability and the first thing he pointed out was that I used a different meaning of “within 4”. He then went into an elaborate calculation to find an upper bound on this probability, an upper bound that was way above my Monte Carlo approximation and the rough calculation of my last post. I rechecked my R code and found it was not achieving the right approximation since one date was within 3 days of three other dates, at least… I thus rewrote the R code as follows:

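T=10^6
four=rep(0,T)
for (t in 1:T){
  day=sort(sample(1:365,30,replace=TRUE)) #30 random days
  day=c(day,day[day>362]-365) #account for toric difference: days 363-365 also sit just before day 1
  tem=outer(day,day,"-") #all pairwise differences between dates
  four[t]=max(apply((tem>-1)&(tem<4),1,sum))>3 #some date closes a 4-day window holding at least 4 dates
}
mean(four)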


[checked it was ok for two dates within 1 day, resulting in the birthday problem probability] and found 0.070214, which is much larger than the earlier value and shows it takes on average 14 years for the “unlikely” event to happen! And the chance that it happens within seven years is 40%.
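
As a hedged sketch of the check mentioned in brackets and of the two figures derived from 0.070214: the helper `cluster_prob` below is hypothetical (it is not the code of the post) and generalises the simulation to any group size and window length. The waiting-time and seven-year figures assume independent years with the same yearly probability.

```r
# Hypothetical helper, not from the original post: Monte Carlo estimate of the
# probability that at least `size` of `n` uniform dates fall within `window`
# consecutive days, on a circular year of `m` days.
cluster_prob <- function(n = 30, size = 4, window = 4, m = 365, nsim = 10^4) {
  hits <- replicate(nsim, {
    day <- sample(1:m, n, replace = TRUE)
    # copy the last (window - 1) days in front of day 1 to close the circle
    day <- c(day, day[day > m - window + 1] - m)
    gap <- outer(day, day, "-")
    # for each date, count the dates in the window that ends on it
    any(rowSums(gap >= 0 & gap < window) >= size)
  })
  mean(hits)
}

set.seed(1)
# sanity check: two dates on the same day is the classical birthday problem
exact_birthday <- 1 - prod((365 - 0:29) / 365)
sim_birthday <- cluster_prob(size = 2, window = 1, nsim = 2000)

p <- 0.070214                 # yearly probability reported in the post
wait_years <- 1 / p           # mean of a geometric waiting time, in years
p_seven <- 1 - (1 - p)^7      # at least one occurrence within seven years
```

Calling `cluster_prob()` with its defaults runs the four-within-four-days setting of the post, with far fewer replications than the 10^6 used there.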

Another coincidence relates to this evaluation, namely the fact that two elderly couples in France committed couple suicide within three days, last week. I however could not find the figures for the number of couple suicides per year. Maybe because it is extremely rare. Or undetected…

## double-yolkers

Posted in Kids, Statistics on November 14, 2013 by xi'an

Last night I was cooking buckwheat pancakes (galettes de sarrasin) from Brittany with an egg-and-ham filling. The first egg I used contained a double yolk, a fairly rare occurrence, at least in my kitchen! Then came the second pancake and, unbelievably!, a second egg with a double yolk! This sounded too unbelievable to be…unbelievable! The experiment stopped there as no one else wanted another galette, but tonight, when making chocolate mousse, I checked whether or not the four remaining eggs also were double-yolkers…and indeed they were. Which does not help when separating yolks from whites, by the way. Esp. with IX fingers. At some stage, during the day, I remembered a talk by Prof of Risk David Spiegelhalter mentioning the issue, even including a picture of an egg-box with the double-yolker guarantee, as in the attached picture. But all I could find first was this explanation on BBC News. Which made sense for my eggs, as those were from a large calibre egg-box (which I usually do not buy)… (And then I typed David Spiegelhalter plus “double-yolker” on Google and all those references came out!)

## Unusual timing shows how random mass murder can be (or not)

Posted in Books, R, Statistics, Travel on November 4, 2013 by xi'an

This was one headline in the USA Today I picked from the hotel lobby on my way to Pittsburgh airport and then Toronto this morning. The unusual pattern was about observing four U.S. mass murders happening within four days, “for the first time in at least seven years”. The article did not explain why this was unusual. And reported one mass murder expert’s opinion instead of a statistician’s…

Now, there are about 30 mass murders in the U.S. each year (!), so the probability of finding at least four of those 30 events within 4 days of one another should be related to von Mises’ birthday problem. For instance, Abramson and Moser derived in 1970 that the probability that at least two people (among n) have birthdays within k days of one another (for an m-day year) is

$p(n,k,m) = 1 - \dfrac{(m-nk-1)!}{m^{n-1}(m-nk-n)!}$

but I did not find an extension to the case of the four (to borrow from Conan Doyle!)… A quick approximation would be to turn the problem into a birthday problem with 364/4=91 days and count the probability that four share the same birthday

${30 \choose 4} \frac{90^{26}}{91^{29}}=0.0273$

which is surprisingly large. So I checked with some R code on the plane:

T=10^5
four=rep(0,T)
for (t in 1:T){
  day=sample(1:365,30,replace=TRUE) #30 random days
  four[t]=(max(apply((abs(outer(day,day,"-"))<4),1,sum))>4) #more than 4 dates within 3 days of one date, itself included
}
mean(four)


and found 0.0278, which means the above approximation is far from terrible! I think it may actually be “exact” in the sense that observing exactly four murders within four days of one another is given by this probability. The cases of five, six, &tc. murders are omitted but they are also highly negligible. And from this number, we can see that there is an 18% probability that the case of the four occurs within seven years. Not so unlikely, then.
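
The arithmetic of this post can be reproduced with a short sketch. The function name `near_birthday` and the guard for overcrowded years are my own assumptions, not part of the post. The factorials are evaluated on the log scale through `lgamma` to avoid overflow, and the seven-year figure assumes independent years.

```r
# Abramson and Moser (1970) formula as quoted in the post:
# p(n,k,m) = 1 - (m-nk-1)! / (m^(n-1) (m-nk-n)!)
near_birthday <- function(n, k, m = 365) {
  # assumption: when m < n(k+1) the n dates cannot all be kept apart
  if (m - n * k - n < 0) return(1)
  # lgamma(x) = log((x-1)!), hence the shifted arguments
  1 - exp(lgamma(m - n * k) - (n - 1) * log(m) - lgamma(m - n * k - n + 1))
}

# k = 0 should give back the classical birthday problem
am_k0 <- near_birthday(23, 0)
classic <- 1 - prod((365 - 0:22) / 365)

# the 91-"day" approximation for four events sharing a block of four days
approx4 <- choose(30, 4) * 90^26 / 91^29

# probability of at least one occurrence within seven years, from 0.0278
seven <- 1 - (1 - 0.0278)^7
```

Printing `approx4` and `seven` gives values that round to the 0.0273 and 18% quoted in the text.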

## the birthday problem [X'idated]

Posted in R, Statistics, University life on February 1, 2012 by xi'an

The birthday problem (i.e. looking at the distribution of the birthdates in a group of n persons, assuming [wrongly] a uniform distribution of the calendar dates of those birthdates) is always a source of puzzlement [for me]! For instance, here is a recent post on Cross Validated:

I have 360 friends on facebook, and, as expected, the distribution of their birthdays is not uniform at all. I have one day with that has 9 friends with the same birthday. So, given that some days are more likely for a birthday, I’m assuming the number of 23 is an upperbound.

The figure 9 sounded unlikely, so I ran the following computation:

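extreme=rep(0,360)
for (t in 1:10^5){
  first=(1:360)[!duplicated(sort(sample(1:365,360,replace=TRUE)))] #where each distinct date starts in the sorted sample
  i=max(diff(c(first,361))) #largest group size, the last group included
  extreme[i]=extreme[i]+1
}
extreme=extreme/10^5
barplot(extreme,xlim=c(0,30),names=1:360)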


whose output is shown in the above graph. (Actually, I must confess I first forgot the sort in the code, which led me to then believe that 9 was one of the most likely values and post it on Cross Validated! The error was eventually picked up by one administrator. I should know better than trust my own R code!) According to this simulation, observing 9 or more people having the same birthdate has an approximate probability of 0.00032… Indeed, fairly unlikely!

Incidentally, this question led me to uncover how to print the above on this webpage. And to learn from the X’idated moderator whuber the use of tabulate. Which avoids the above loop:

> system.time(test(10^5)) #my code above
user  system elapsed
26.230   0.028  26.411
> system.time(table(replicate(10^5, max(tabulate(sample(1:365,360,rep=TRUE))))))
user  system elapsed
5.708   0.044   5.762
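
As a hedged sketch of how the tabulate version can also return the tail probability discussed above (the variable names `nsim` and `most` are mine, and the number of replications is kept small for speed):

```r
set.seed(42)
nsim <- 10^4
# largest number of people sharing one date, for 360 people and 365 dates;
# nbins = 365 keeps one bin per calendar date even when late dates are unused
most <- replicate(nsim,
                  max(tabulate(sample(1:365, 360, replace = TRUE), nbins = 365)))
dist_most <- table(most) / nsim    # simulated distribution of the maximum
tail9 <- mean(most >= 9)           # share of runs with 9 or more on one date
```

With only 10^4 replications the estimate of such a small probability is very noisy, so a run of the size used in the post (10^5) or larger is needed to compare it with the 0.00032 quoted above.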


## Mr Meyrowitz’s glasses

Posted in Statistics, University life on October 23, 2011 by xi'an

Today, I found a site entitled Mr Meyrowitz’s Class that links to my first post on coincidences in lotteries as an example of “fatal error”. This seems to be part of a student’s assignment, apparently for the CollegeBoard programme, with 10 minutes allocated to students to find my “fatal error with decimals and probabilities”… As there is no hint, I wonder where my fatal error stands: I could not find it after those 10 minutes of intense searching and recomputing. Maybe Mr Meyrowitz actually needs new glasses to spot the difference between a 1‰ chance and a 1% chance… (Which actually misled a few other readers of the post.)

Question 6) in this assignment also sounds very much inspired by another of my posts on coincidences in lotteries [although not acknowledged in the assignment] since the question refers to the same original France Soir article in French. The question is however rather vague: “do you suspect him of cheating?” and it shows a lack of knowledge about French loto where cheating is [close to] impossible. It is certainly not recommended as an exercise for beginning students in probability or statistics. [Actually, in my opinion, the whole assignment is poor, being either imprecise, e.g. question 7), useless, as for question 4) "Pick one topic that you understand very well and one that you do not understand well" (!), or plain wrong, as for question 2)...]