More/less incriminating digits from the Iranian election
Following my previous post, where I commented on Roukema’s use of Benford’s Law on the first digits of the counts, I saw on Andrew Gelman’s blog a pointer to a paper in the Washington Post, where the arguments are based instead on the last digit. Those should be uniform, rather than distributed from Benford’s Law. There is no doubt about the uniformity of the last digit, but the claim for “extreme unlikeliness” of the frequencies of those digits made in the paper is not so convincing. Indeed, when I uniformly sampled 116 digits in {0,..,9}, my very first attempt produced a highest frequency of 20.5% and a lowest of 5.9%. If I run a small Monte Carlo experiment with the following R program,
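fre=0
for (t in 1:10^4){
  # bin heights for 116 uniform digits, using hist() with its default breaks
  h=hist(sample(0:9,116,rep=T),plot=F)$density
  # count the samples where both extremes are crossed
  fre=fre+(max(h)>.16)*(min(h)<.05)
}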
the percentage of cases when this happens is 15%, so this is not “extremely unlikely” (unless I made a terrible blunder in the above!!!)… Even moving the constraint to
(max(h)>.169)*(min(h)<.041)
does not produce a very small probability, since it is then 0.0525.
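As a cross-check, the same experiment can be written so that every digit is counted on its own with tabulate(), which removes any dependence on how hist() chooses its bins. This is only a minimal sketch: the helper names digit_freq and prop_extreme, the seed, and the variable names are illustrative choices of mine, not part of the original program.

```r
# Sketch: count each digit separately instead of relying on hist() bins.
set.seed(1)          # arbitrary seed, only for reproducibility
n_digits <- 116      # number of last digits, as in the post
n_sim    <- 10^4     # number of Monte Carlo replications

# Relative frequency of each digit 0..9 in one simulated sample.
digit_freq <- function(n) {
  x <- sample(0:9, n, replace = TRUE)
  tabulate(x + 1, nbins = 10) / n   # shift by 1: tabulate() counts 1..nbins
}

# Proportion of samples whose extreme frequencies cross both thresholds.
prop_extreme <- function(upper, lower) {
  hits <- replicate(n_sim, {
    f <- digit_freq(n_digits)
    max(f) > upper && min(f) < lower
  })
  mean(hits)
}

p_wide   <- prop_extreme(0.16, 0.05)     # first pair of thresholds
p_narrow <- prop_extreme(0.169, 0.041)   # tighter pair of thresholds
```

Comparing p_wide and p_narrow with the figures above gives a quick indication of how sensitive the estimates are to the way the digits are binned.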
The second argument looks at the proportion of last and second-to-last digits that are adjacent, i.e. with a difference of ±1 or ±9. Out of the 116 Iranian results, 62% are made of non-adjacent digits, which leaves 38% of adjacent ones. If I sample two vectors of 116 digits in {0,..,9} and consider this occurrence, I do see an unlikely event. Running the Monte Carlo experiment
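repa=NULL
for (t in 1:10^5){
  # squared difference between last and second-to-last digits
  dife=(sample(0:9,116,rep=T)-sample(0:9,116,rep=T))^2
  # adjacent pairs: difference of ±1 (square 1) or ±9 (square 81)
  repa[t]=sum((dife==1))+sum((dife==81))
}
# turn counts into frequencies
repa=repa/116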
shows that the distribution of repa is centered at .20 (as it should be, since for a given second-to-last digit there are two adjacent last digits), not at .30 as indicated in the paper, and that the probability of a frequency of .38 or more of adjacent digits is estimated as zero by this Monte Carlo experiment. (Note that I took 0 and 9 to be adjacent and that removing this occurrence would further lower the probability.)
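Under the same assumptions as the simulation (independent uniform digits, with 0 and 9 taken as adjacent), each of the 116 pairs is adjacent with probability 2/10, so the number of adjacent pairs is Binomial(116, 0.2) and the tail can be computed exactly instead of being estimated. The following is a sketch along those lines; the variable names are illustrative and not part of the original program.

```r
n_pairs <- 116       # number of results, as in the post
p_adj   <- 2 / 10    # two adjacent last digits for any second-to-last digit

# Smallest count of adjacent pairs whose frequency is at least .38
k_min <- ceiling(0.38 * n_pairs)

# Exact P(X >= k_min) for X ~ Binomial(n_pairs, p_adj)
p_tail <- pbinom(k_min - 1, size = n_pairs, prob = p_adj, lower.tail = FALSE)

# Mean and standard deviation of the adjacent frequency, to compare with repa
freq_mean <- p_adj
freq_sd   <- sqrt(p_adj * (1 - p_adj) / n_pairs)
```

Here (0.38 - freq_mean) / freq_sd is close to five standard deviations, and p_tail gives an exact figure to set beside the Monte Carlo estimate of zero.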
October 8, 2009 at 3:41 pm
My post got garbled. It should of course read
for (t in 1:run)
  fre <- fre + (max(out[,t]) > (.169*k)) * (min(out[,t]) < (.041*k))
I typically get about 6.2% to your 15% and 1.4% to your 5.25%.
Sorry my French is not good enough so I had to post in English!
October 7, 2009 at 10:42 pm
I am tardy in reading about the analyses of the Iranian election. Belated thanks for your posts and the links you provided.
Your simulation results are too high, I think, due to the hist function binning 0 and 1 together by default. For (max(h)>.16)*(min(h)<.05) I got around 6.2%, and for (max(h)>.169)*(min(h)<.041) I got around 1.4% by sampling from a multinomial:
run <- 1e5
k <- 116
out <- rmultinom(n=run,size=k,prob=rep(1,10))
fre <- 0
for (t in 1:run)
  fre <- fre + (max(out[,t]) > (.169*k)) * (min(out[,t]) < (.041*k))
100*fre/run
October 8, 2009 at 7:38 am
Thanks, will look at it! Xi’an
June 22, 2009 at 6:20 pm
[…] Apparently what started this off was a post on the ArXiv by the cosmologist Boudewijn Roukema, but I first heard about it myself via a pingback from another wordpress blog. The same blogger has written a subsequent analysis here. […]