## More/less incriminating digits from the Iranian election

**F**ollowing my previous post, where I commented on Roukema’s use of Benford’s Law on the *first* digits of the counts, I saw on Andrew Gelman’s blog a pointer to a paper in the **Washington Post**, where the arguments are based instead on the *last* digit. Those should be uniform, rather than distributed from Benford’s Law. There is no doubt about the uniformity of the last digit, but the claim for “extreme unlikeliness” of the frequencies of those digits made in the paper is not so convincing. Indeed, when I uniformly sampled 116 digits in {0,..,9}, my very first attempt produced a highest frequency of 20.5% and a lowest of 5.9%. If I run a small Monte Carlo experiment with the following R program,

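    fre=0
    for (t in 1:10^4){
      # relative frequencies of 116 uniform digits, as binned by hist()
      h=hist(sample(0:9,116,rep=T),plot=F)$density
      # count the runs where both extremes occur together
      fre=fre+(max(h)>.16)*(min(h)<.05)
    }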

the percentage of cases when this happens is 15%, so this is not “extremely unlikely” (unless I made a terrible blunder in the above!!!)… Even moving the constraint to

(max(h)>.169)*(min(h)<.041)

does not produce a very small probability, since it is then 0.0525.

**T**he second argument looks at the proportion of *last and second-to-last* digits that are adjacent, i.e. with a difference of ±1 or ±9. Out of the 116 Iranian results, 62% are made of non-adjacent digits. If I sample two vectors of 116 digits in {0,..,9} and if I consider this occurrence, I do see an unlikely event. Running the Monte Carlo experiment

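    repa=NULL
    for (t in 1:10^5){
      # squared difference between two independent vectors of 116 digits
      dife=(sample(0:9,116,rep=T)-sample(0:9,116,rep=T))^2
      # adjacent pairs: difference of ±1 (square 1) or ±9 (square 81)
      repa[t]=sum((dife==1))+sum((dife==81))
    }
    repa=repa/116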

shows that the distribution of **repa** is centered at .20, not .30 as indicated in the paper (as it should be, since for a given *second-to-last* digit there are two adjacent *last* digits), and that the probability of having a frequency of .38 or more of adjacent digits is estimated as zero by this Monte Carlo experiment. (Note that I took 0 and 9 to be adjacent and that removing this occurrence would further lower the probability.)
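The same quantity can also be checked without simulation. A minimal sketch, assuming the last and second-to-last digits are independent and uniform, so that the number of adjacent pairs among 116 results is Binomial(116, 0.2); the names `p_enum`, `k_min` and `p_tail` are introduced here for illustration only:

```r
n <- 116                      # number of results

# Enumerate all 100 ordered digit pairs and flag the adjacent ones,
# counting 0 and 9 as adjacent (squared difference 1 or 81)
pairs  <- expand.grid(a = 0:9, b = 0:9)
d2     <- (pairs$a - pairs$b)^2
p_enum <- mean(d2 == 1 | d2 == 81)    # probability that one pair is adjacent

# Smallest count of adjacent pairs whose frequency is at least .38
k_min <- ceiling(0.38 * n)

# Exact binomial tail P(X >= k_min) for X ~ Binomial(n, p_enum)
p_tail <- pbinom(k_min - 1, size = n, prob = p_enum, lower.tail = FALSE)

p_enum
k_min
p_tail
```

Under these assumptions the enumeration recovers the .20 center, and the exact tail probability gives a value to set against the Monte Carlo estimate of zero.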

October 8, 2009 at 3:41 pm

My post got garbled. It should of course read

    for (t in 1:run)
      fre <- fre + (max(out[,t])>(.169*k))*(min(out[,t])<(.041*k))

I typically get about 6.2% to your 15% and 1.4% to your 5.25%.

Sorry my French is not good enough so I had to post in English!

October 7, 2009 at 10:42 pm

I am tardy in reading about the analyses of the Iranian election. Belated thanks for your posts and the links you provided.

Your simulation results are too high, I think, due to the hist function binning 0 and 1 together by default. For (max(h)>.16)*(min(h)<.05) I got around 6.2%, and for (max(h)>.169)*(min(h)<.041) I got around 1.4%, by sampling from a multinomial:

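    run <- 1e5
    k <- 116
    # each column holds the counts of the 10 digits among k uniform draws
    out <- rmultinom(n=run,size=k,prob=rep(1,10))
    fre <- 0
    for (t in 1:run)
      fre <- fre + (max(out[,t])>(.169*k))*(min(out[,t])<(.041*k))
    # percentage of runs where both extremes occur
    100*fre/run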
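The binning behaviour described in this comment can be checked directly. A small sketch, comparing the default breaks of `hist` with explicit breaks that give each digit its own bin; the vector `x` and the names `h_default` and `h_digits` are made up for illustration:

```r
# A deterministic vector of 116 digits covering 0 to 9
x <- rep(0:9, length.out = 116)

# Default breaks, as in the original simulation
h_default <- hist(x, plot = FALSE)

# Explicit breaks at -0.5, 0.5, ..., 9.5: one bin per digit
h_digits <- hist(x, plot = FALSE, breaks = seq(-0.5, 9.5, by = 1))

h_default$breaks   # bin edges chosen by hist()
h_default$counts   # fewer than 10 bins if some digits share a bin
h_digits$counts    # ten bins, one per digit
table(x)           # direct tabulation, for comparison
```

If the default call returns fewer than ten counts, the maximum and minimum densities used in the original program are not per-digit frequencies, which would explain the gap between the two sets of estimates.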

October 8, 2009 at 7:38 am

Thanks, will look at it! Xi’an

June 22, 2009 at 6:20 pm

[…] Apparently what started this off was a post on the ArXiv by the cosmologist Boudewijn Roukema, but I first heard about it myself via a pingback from another wordpress blog. The same blogger has written a subsequent analysis here. […]