Archive for Benford’s Law

Randomness through computation

Posted in Books, Statistics, University life on June 22, 2011 by xi'an

A few months ago, I received a puzzling advertisement for this book, Randomness through Computation, and I eventually ordered it, despite getting a rather negative impression from reading the chapter written by Tommaso Toffoli… The book as a whole is definitely perplexing (even when correcting for this initial bias) and I would not recommend it to readers interested in simulation, in computational statistics, or even in the philosophy of randomness. My overall feeling is indeed that, while there are genuinely informative and innovative chapters in this book, some chapters read more like newspeak than scientific material (mixing the Second Law of Thermodynamics, Gödel's incompleteness theorem, quantum physics, and NP-completeness within the same sentence) and do not provide a useful entry on the issue of randomness. Hence, the book does not contribute in a significant manner to my understanding of the notion. (This post also appeared on the Statistics Forum.)

Versions of Benford’s Law

Posted in Books, Statistics on May 20, 2010 by xi'an

A new arXived note by Berger and Hill discusses how [my favourite probability introduction] Feller's Introduction to Probability Theory (volume 2) gets Benford's Law "wrong". While my interest in Benford's Law is rather superficial, I find the paper of interest as it exposes a confusion between different folk theorems! My interpretation of Benford's Law is that the first significant digit of a random variable (in a base 10 representation) is distributed as

f(i) = \log_{10}\left(1+\frac{1}{i}\right)

and not that \log(X) \,(\text{mod}\,1) is uniform, which is the presentation given in the arXived note… The former is also the interpretation of William Feller (page 63, Introduction to Probability Theory), contrary to what the arXived note seems to imply on page 2; Feller indeed mentions, as an informal/heuristic argument in favour of Benford's Law, that when the spread of the rv X is large, \log(X) is approximately uniformly distributed. (I would not call this a "fundamental flaw".) The arXived note is then right in pointing out the lack of foundation for Feller's heuristic, even though it muddles the issue by defining several non-equivalent versions of Benford's Law. It is also funny that this arXived note picks at the scale-invariant characterisation of Benford's Law when Terry Tao's entry presents it as a special case of Haar measure!
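Since the nine probabilities are explicit, one can check directly that they define a proper distribution: the terms 1+1/i = (i+1)/i telescope inside the logarithm, so the probabilities sum to \log_{10}10=1. A quick numerical check (a sketch in Python, for self-containedness):

```python
import math

# Benford's Law: P(first digit = i) = log10(1 + 1/i), for i = 1, ..., 9
benford = [math.log10(1 + 1 / i) for i in range(1, 10)]

# the product of the (i+1)/i terms telescopes to 10, hence the sum is 1
print(sum(benford))
```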

More on Benford’s Law

Posted in Statistics on July 10, 2009 by xi'an

In connection with an earlier post on Benford's Law, i.e. the fact that the probability that the first digit of a random variable X is k, 1\le k\le 9, is approximately \log_{10}\{(k+1)/k\}—you can easily check that the sum of those probabilities is 1—I want to signal a recent entry on Terry Tao's impressive blog. Terry points out that Benford's Law is the Haar measure in that setting, but he also highlights a very peculiar absorbing property, which is that, if X follows Benford's Law, then XY also follows Benford's Law for any random variable Y that is independent from X… Now, the funny thing is that, if you take a normal sample x_1,\ldots,x_n and check whether or not Benford's Law applies to this sample, it does not. But if you take a second normal sample y_1,\ldots,y_n and consider the product sample x_1\times y_1,\ldots,x_n\times y_n, then Benford's Law applies almost exactly. If you repeat the process one more time, it is difficult to spot the difference. Here is the [rudimentary—there must be a more elegant way to get the first significant digit!] R code to check this:

# one million absolute normal variates
x=abs(rnorm(10^6))
# exponent of x, corrected for values below 1
b=trunc(log10(x))-(log10(x)<0)
# first-digit frequencies against the Benford probabilities
plot(hist(trunc(x/10^b),breaks=(0:9)+.5,plot=F)$density,log10((2:10)/(1:9)),
    xlab="Frequency",ylab="Benford's Law",pch=19,col="steelblue")
abline(a=0,b=1,col="tomato",lwd=2)
# first product sample
x=abs(rnorm(10^6)*x)
b=trunc(log10(x))-(log10(x)<0)
points(hist(trunc(x/10^b),breaks=(0:9)+.5,plot=F)$density,log10((2:10)/(1:9)),
    pch=19,col="steelblue2")
# second product sample
x=abs(rnorm(10^6)*x)
b=trunc(log10(x))-(log10(x)<0)
points(hist(trunc(x/10^b),breaks=(0:9)+.5,plot=F)$density,log10((2:10)/(1:9)),
    pch=19,col="steelblue3")
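As for the bracketed aside, the first significant digit can indeed be extracted more compactly; here is one possible helper (a Python sketch, the function name being my own), based on the same exponent trick as the R code above:

```python
import math

def first_digit(x):
    # first significant (base 10) digit of a nonzero number
    x = abs(x)
    return int(x / 10 ** math.floor(math.log10(x)))

print(first_digit(0.00321))  # 3
print(first_digit(4721.5))   # 4
```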

Even better, if you change rnorm to another generator like rcauchy or rexp at any of the three stages, the same pattern occurs.
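The convergence of the product samples towards Benford's Law can also be quantified rather than eyeballed; the sketch below (in Python, with a total-variation criterion that is my own choice, not the post's) compares the empirical first-digit frequencies with the Benford probabilities at each stage:

```python
import math, random

random.seed(1)
N = 100_000

def first_digit(x):
    # first significant (base 10) digit of a nonzero number
    x = abs(x)
    return int(x / 10 ** math.floor(math.log10(x)))

# Benford probabilities for digits 1..9
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

def total_variation(sample):
    # total-variation distance between empirical first-digit
    # frequencies and the Benford probabilities
    freq = [0] * 9
    for v in sample:
        freq[first_digit(v) - 1] += 1
    return sum(abs(f / len(sample) - p) for f, p in zip(freq, benford)) / 2

x = [random.gauss(0, 1) for _ in range(N)]
print(total_variation(x))   # single normal sample: visibly off Benford
y = [v * random.gauss(0, 1) for v in x]
print(total_variation(y))   # one product: much closer
z = [v * random.gauss(0, 1) for v in y]
print(total_variation(z))   # two products: hard to tell apart
```

The distance drops sharply with each multiplication, matching the absorbing property described above.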

More/less incriminating digits from the Iranian election

Posted in Statistics on June 21, 2009 by xi'an

Following my previous post where I commented on Roukema's use of Benford's Law on the first digits of the counts, I saw on Andrew Gelman's blog a pointer to a paper in the Washington Post, where the arguments are instead based on the last digit. Those should be uniformly distributed, rather than following Benford's Law. There is no doubt about the uniformity of the last digit, but the claim of "extreme unlikeliness" of the frequencies of those digits made in the paper is not so convincing. Indeed, when I uniformly sampled 116 digits in {0,…,9}, my very first attempt produced a highest frequency of 20.5% and a lowest of 5.9%. If I run a small Monte Carlo experiment with the following R program,

fre=0
for (t in 1:10^4){
   # relative frequencies of the ten digits in a uniform sample of 116 digits
   h=tabulate(sample(0:9,116,rep=T)+1,nbins=10)/116
   fre=fre+(max(h)>.16)*(min(h)<.05)
   }

the percentage of cases when this happens is 15%, so this is not “extremely unlikely” (unless I made a terrible blunder in the above!!!)… Even moving the constraint to

(max(h)>.169)*(min(h)<.041)

does not make the event very unlikely, since its estimated probability is then 0.0525.

The second argument looks at the proportion of last and second-to-last digits that are adjacent, i.e. with a difference of ±1 or ±9. Out of the 116 Iranian results, 62% are made of non-adjacent digits. If I sample two vectors of 116 digits in {0,..,9} and if I consider this occurrence, I do see an unlikely event. Running the Monte Carlo experiment

repa=numeric(10^5)
for (t in 1:10^5){
    # squared differences between two independent uniform digit samples
    dife=(sample(0:9,116,rep=T)-sample(0:9,116,rep=T))^2
    # adjacent pairs have squared difference 1 or 81
    repa[t]=sum(dife==1)+sum(dife==81)
    }
# proportion of adjacent pairs in each replication
repa=repa/116

shows that the distribution of repa is centered at .20 (as it should be, since, for a given second-to-last digit, there are two adjacent last digits out of ten), not at .30 as indicated in the paper, and that the probability of having a frequency of .38 or more of adjacent digits is estimated as zero by this Monte Carlo experiment. (Note that I took 0 and 9 to be adjacent and that removing this convention would further lower the probability.)
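The .20 value can actually be obtained by exact enumeration rather than simulation; a minimal check (sketched in Python, with the same convention that 0 and 9 are adjacent):

```python
from itertools import product

# all 100 ordered pairs of (second-to-last, last) digits
pairs = list(product(range(10), repeat=2))

# adjacent = difference of +/-1 or +/-9, i.e. neighbours modulo 10
adjacent = [(a, b) for a, b in pairs if (a - b) % 10 in (1, 9)]

print(len(adjacent) / len(pairs))  # 0.2
```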

Benford’s Law satisfies Stigler’s Law

Posted in Books, Statistics with tags , on June 18, 2009 by xi'an

Looking around for other entries on Benford’s Law, I found this nice entry that attributes Benford’s Law to the astronomer Simon Newcomb, rather than to Benford (who rediscovered the distribution some fifty years later). This is quite in line with Stigler’s Law of Eponymy, which states that (almost) no scientific law is named after its original discoverer. The post of Peter Coles also covers the connection between Benford’s Law and Jeffreys’ prior for scale parameters, which is discussed in Jim Berger’s Statistical Decision Theory and Bayesian Analysis.
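A one-line sketch of that connection: under Jeffreys' scale prior \pi(x)\propto 1/x, the (unnormalised) mass of the decade starting with digit d is

\int_{d\cdot 10^k}^{(d+1)\cdot 10^k} \frac{\text{d}x}{x} = \log\left(\frac{d+1}{d}\right)

for every scale k, which, once converted to base 10 logarithms and normalised, is exactly Benford's Law.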
