## biased sample!

**A** chance occurrence led me to this thread on R-devel about R’s sample function generating a bias by taking the integer part of the (rescaled) output of the continuous uniform generator… And then to the note by Kellie Ottoboni and Philip Stark analysing the reason, namely the fact that R’s uniform [0,1) pseudo-random generator is not perfectly continuously uniform but discrete, by the nature of numbers on a computer. Knuth (1997) showed that in this case the ratio of the largest to the smallest selection probability is larger than 1, and can be as large as 1.03. As noted in the note, the bias can be avoided by exploiting directly the pseudo-random bits of the pseudo-random generator. Shocking, isn’t it! A fast and bias-free alternative suggested by Lemire is available as `dqsample::sample`.
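To make the mechanism concrete, here is a minimal Python sketch, not R's actual code. It assumes a toy generator that returns `w`-bit integers `k`, so that the "uniform" is `U = k / 2**w`, and it counts exactly how many of the `2**w` equally likely values land on each index under `floor(m * U)`. The second function illustrates one way of using the raw bits directly, with rejection. The function names and the toy sizes (`m = 6`, `w = 4`) are mine, chosen only for illustration.

```python
import random
from collections import Counter


def floor_method_counts(m, w):
    """Exact counts for the floor method with a w-bit generator.

    Each k in {0, ..., 2**w - 1} is equally likely and is mapped to
    floor(m * k / 2**w). Returns how many k values hit each index.
    """
    counts = Counter((m * k) >> w for k in range(2 ** w))
    return [counts[i] for i in range(m)]


def sample_index_by_rejection(m, rng=random):
    """Draw from {0, ..., m-1} using raw bits, rejecting values >= m.

    Every accepted value is equally likely, whatever m is.
    """
    bits = max(1, (m - 1).bit_length())  # just enough bits to cover m - 1
    while True:
        k = rng.getrandbits(bits)
        if k < m:
            return k


counts = floor_method_counts(m=6, w=4)  # 16 generator values, 6 indices
print(counts)                            # [3, 3, 2, 3, 3, 2]
print(max(counts) / min(counts))         # 1.5
```

With only 16 possible values to share among 6 indices, some indices necessarily receive 3 values and others 2, hence a probability ratio of 1.5. With a realistic word size the same pigeonhole effect remains, only smaller. The rejection version wastes some draws but never favours one index over another.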

May 21, 2019 at 4:18 am

I believe the new version of R (which is version 3.6.0 released on 26th April) may have addressed this issue, since one of its new features is:

“The default method for generating from a discrete uniform distribution (used in sample(), for instance) has been changed. This addresses the fact, pointed out by Ottoboni and Stark, that the previous method made sample() noticeably non-uniform on large populations.”

If so, it is good to know that this defect has (finally!) been taken into account.

May 21, 2019 at 8:25 am

Ah great news!

May 21, 2019 at 11:29 am

The relevant entry (https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494) in R’s Bugzilla explicitly mentions the thread quoted in Xian’s blog entry as a discussion of the problem fixed in 3.6.0. The NEWS entry also mentions that the previous algorithm can be selected if backward compatibility/reproducibility is an issue.

So, all in all, good news…