biased sample!

A chance occurrence led me to this thread on R-devel about R's sample function generating a bias by taking the integer part of the continuous uniform generator… And then to the note by Kellie Ottoboni and Philip Stark analysing the reason, namely the fact that R's uniform [0,1) pseudo-random generator is not perfectly continuously uniform but discrete, by the nature of numbers on a computer. Knuth (1997) showed that in this case the range of probabilities is larger than (1,1), the largest range being (1,1.03). As noted in the note, the remedy is to exploit directly the pseudo-random bits of the pseudo-random generator. Shocking, isn’t it! A fast and bias-free alternative suggested by Lemire is available as dqsample::sample.
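To see the mechanism at a toy scale, here is a minimal Python sketch (not R's actual code, and not Lemire's algorithm). It assumes a hypothetical generator with only w = 5 bits, so that U takes the 32 values k/32, and enumerates floor(m U) exactly for m = 6. The function names `floor_method_counts` and `sample_by_bits` are mine, for illustration only. The second function shows the generic fix of drawing just enough raw bits and rejecting values outside the target range.

```python
import random
from collections import Counter

def floor_method_counts(m, w):
    """Exact counts of floor(m * U) over the 2**w equally likely values U = k / 2**w."""
    # (m * k) >> w is floor(m * k / 2**w) in exact integer arithmetic.
    return Counter((m * k) >> w for k in range(2 ** w))

def sample_by_bits(m, rng=random):
    """Draw uniformly from {0, ..., m-1} by rejection on raw random bits."""
    nbits = max(1, (m - 1).bit_length())  # enough bits to cover 0..m-1
    while True:
        k = rng.getrandbits(nbits)
        if k < m:  # keep only draws inside the target range
            return k

# Toy setting (assumption): a 5-bit generator and a population of size 6.
counts = floor_method_counts(m=6, w=5)
ratio = max(counts.values()) / min(counts.values())
print(sorted(counts.items()))  # [(0, 6), (1, 5), (2, 5), (3, 6), (4, 5), (5, 5)]
print(ratio)                   # 1.2
```

Since 32 is not a multiple of 6, two outcomes receive 6 of the 32 grid points and four receive 5, hence the ratio of 1.2. With a realistic number of bits the imbalance is far smaller but only vanishes when m is a power of two, whereas the rejection step is exactly uniform whenever the bits themselves are.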

Update of June 2019: sample is now fixed.

This entry was posted on May 21, 2019 at 12:19 am and is filed under Statistics with tags bias, Donald Knuth, dqsample, integers, PRNG, pseudo-random generator, R, random bit, rounding, sample, The Art of Computer Programming, uniform distribution.

4 Responses to “biased sample!”

CRAN does not validate R packages! | R-bloggers Says:
July 11, 2019 at 8:11 am

[…] should be able to detect any defect pretty fast, although awareness of the incredible failure of sample() reported in an earlier post took a while to […]

The naked statistician Says:
May 21, 2019 at 4:18 am

I believe the new version of R (which is version 3.6.0 released on 26th April) may have addressed this issue, since one of its new features is:

“The default method for generating from a discrete uniform distribution (used in sample(), for instance) has been changed. This addresses the fact, pointed out by Ottoboni and Stark, that the previous method made sample() noticeably non-uniform on large populations.”

If so, it is good to know that this defect has (finally!) been taken into account.

- xi'an Says:
  May 21, 2019 at 8:25 am
  
  Ah great news!
  
- Emmanuel Charpentier Says:
  May 21, 2019 at 11:29 am
  
  The relevant entry (https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494) in R’s Bugzilla explicitly mentions the thread quoted in Xi'an’s blog entry as a discussion of the problem fixed in 3.6.0. The NEWS entry also mentions that the previous algorithm can be selected if backward compatibility/reproducibility is an issue.
  
  So, all in all, good news…
  