Archive for FiveThirtyEight

riddle of the week

Posted in R on April 21, 2022 by xi'an

The Riddler of April 1 offered this simple question:

start with the number 1 and then try to reach a target number through a series of steps. For each step, you can always choose to double the number you currently have. However, if the number happens to be one (1) more than an odd multiple of 3, you can choose to “reduce” — that is, subtract 1 and then divide by 3. What is the smallest positive integer one cannot reach this way?

Which I turned into a few R steps (while waiting for flight AF19 to Paris):

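  # loop while x is one more than an odd multiple of 3, i.e. x %% 6 == 4
  while((!(x-1)%%3)&((x-1)%%6)){
    oor[2*x]=TRUE             # doubling is always allowed: mark 2x as reachable
    oor[x<-(x-1)%/%3]=TRUE}   # reduce x to (x-1)/3 and mark it as reachable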

but running an exhaustive search up to 10⁸ did not spot any missing integer… Maybe an April Fools' joke (as the quick riddle was asking for the simplest representation of (x-a)(x-b)…(x-z), a product that contains the factor (x-x) and hence is zero!)
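Reversing the two moves may explain the empty search. A doubling is undone by halving an even number, and a reduction is undone by x → 3x+1 applied to an odd x (since (y-1)/3 is odd whenever y-1 is an odd multiple of 3), and exactly one of these applies to any integer. Hence n is reachable from 1 exactly when the Collatz iteration started at n hits 1, and whether it always does is the open Collatz conjecture. The following is a minimal R sketch of this backward check, a different route from the forward search above; the function names `reachable` and `path_from_one` and the step cap `max_steps` are hypothetical choices for illustration.

```r
# Hypothetical helper: can n be reached from 1 by doubling and "reducing"?
# Walking backwards from n, the only possible predecessor is n/2 (n even)
# or 3n+1 (n odd), i.e. the Collatz map; n is reachable iff this hits 1.
reachable <- function(n, max_steps = 1e4) {
  steps <- 0
  while (n != 1 && steps < max_steps) {
    n <- if (n %% 2 == 0) n / 2 else 3 * n + 1
    steps <- steps + 1
  }
  n == 1   # FALSE only if the cap was hit before reaching 1
}

# Hypothetical helper: the forward sequence from 1 to n, obtained by
# reversing the backward (Collatz) trajectory of n
path_from_one <- function(n) {
  path <- n
  while (n != 1) {
    n <- if (n %% 2 == 0) n / 2 else 3 * n + 1
    path <- c(n, path)
  }
  path
}

all(vapply(1:10000, reachable, logical(1)))   # TRUE
path_from_one(3)                              # 1 2 4 8 16 5 10 3
```

In `path_from_one(3)`, the steps 16 → 5 and 10 → 3 are reductions (15 and 9 being odd multiples of 3) and all the others are doublings.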

a simpler (?) birthday problem

Posted in Books, Kids, Statistics on April 9, 2022 by xi'an

A monthly birthday problem from the Riddler:

What was the probability that none of the 40 people had birthdays this month? What is the probability that there is at least one month in the year during which none of the 40 people had birthdays (not necessarily this month)?

Assuming the same number of days in all months, the probability that one individual is not born in March is 11/12 and hence the probability that none of 40 (independent!) persons is born in March is (11/12)⁴⁰, about 3%. The second question can be solved by inclusion-exclusion, as in Feller’s chapter on the combination of events (1970, Chapter IV, p.102). The probability that every month sees at least one birthday is

\sum_{i=0}^{12} (-1)^i {12\choose i}(1-i/12)^{40}=0.6732162

which can be checked by a quick R simulation. The complement, 0.327, is thus close to 11 × 0.03 = 0.33!
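A minimal R sketch of both the exact inclusion-exclusion sum and the simulation check, assuming (as above) twelve equally likely birth months and independent birthdays; the seed and the number of replications are arbitrary choices.

```r
# Probability that a given month (say March) has none of the 40 birthdays
p_march <- (11/12)^40                      # about 0.031

# Inclusion-exclusion: probability that all 12 months get at least one birthday
i <- 0:12
p_all <- sum((-1)^i * choose(12, i) * (1 - i/12)^40)

# Monte Carlo check: 40 uniform birth months per replication,
# recording whether all 12 months appear
set.seed(101)
covered <- replicate(2e4, length(unique(sample(12, 40, replace = TRUE))) == 12)

c(march_empty = p_march, exact = p_all, simulated = mean(covered),
  some_month_empty = 1 - p_all)
```

With 2×10⁴ replications, the Monte Carlo standard error on the simulated frequency is about 0.003.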

distracting redistricting?

Posted in Books, Statistics on August 26, 2021 by xi'an

“We at FiveThirtyEight will be tracking the whole redistricting process, from proposed maps to final maps, so watch this space for updates!”

FiveThirtyEight is keeping a tracker on the “redistricting” of U.S. states, namely the decennial redrawing of electoral districts. The process is still at an early stage, with no map yet validated by a state legislature, hence I cannot tell whether or not FiveThirtyEight will analyse gerrymandering in a statistical manner, that is, figure out how extreme a proposed map is within the collection of all possible electoral maps. The States being the States, the rules vary widely between them, from the legislators themselves setting the boundaries (while sometimes being quite open about their intention to favour their own side) to independent commissions being in charge. I did not spot any clear involvement of statisticians in the process.

“The application of differential privacy will bring significant harm to Alabama (…) The Census Bureau has not shown that other disclosure avoidance methods would not satisfy the privacy requirements” Case No. 3:21-cv-00211

While looking at this highly informative webpage maintained by Doug Spencer of the University of Colorado Law School, I came across this federal court challenge by the State of Alabama against the Census Bureau for using differential privacy! A statistical version of “shoot the messenger”?! The State's legal argument rests on “the Fifth Amendment, alleging that differential privacy is a violation of the one-person, one-vote principle and will result in the dilution of their votes.” I however wonder what the genuine (political) reason for this challenge is!

top of the top

Posted in Statistics on August 19, 2021 by xi'an

An easy-peasy riddle from The Riddler about the probability that one of ten iid random variables is the largest of the ten, conditional on the event that it exceeds the upper decile q₉₀ of their common distribution. This writes down easily as

10\int_{q_{90}}^\infty F^9(x) f(x)\,\text{d}x

if F and f are the cdf and pdf, respectively, the factor 10 being 1/P(X>q₉₀). Since 10F⁹f is the derivative of F¹⁰ and F(q₉₀)=0.9, this is equal to 1-0.9¹⁰≈0.651, approximately 1-e⁻¹≈0.632, no matter what (continuous) F is….
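A quick Monte Carlo check in R, here with standard exponential variates. This is an arbitrary choice, since any continuous F should give the same answer, and the names `n_sim`, `above`, and `is_max` are illustrative. The sketch estimates the probability that the first of ten variates is the largest, given that it exceeds the upper decile.

```r
set.seed(42)
n_sim <- 1e5
x <- matrix(rexp(10 * n_sim), ncol = 10)   # each row holds 10 iid Exp(1) draws
q90 <- qexp(0.9)                            # upper decile of Exp(1)
above <- x[, 1] > q90                       # condition on X1 exceeding q90
is_max <- x[, 1] == apply(x, 1, max)        # X1 is the largest of its row
c(simulated = mean(is_max[above]), exact = 1 - 0.9^10)
```

Swapping `rexp`/`qexp` for `runif`/`qunif` or `rnorm`/`qnorm` should leave the estimate around 0.651, up to a Monte Carlo error of about 0.005 (roughly 10⁴ rows survive the conditioning).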

multinomial but unique

Posted in Kids, R, Statistics on July 16, 2021 by xi'an

A quick riddle from the Riddler, where, for 100 items with label counts (n₁,n₂,100-n₁-n₂), the probability of getting three different labels out of three possible ones in three draws (without replacement) is 20%, inducing a single possible value for (n₁,n₂) up to a permutation.

Since this probability is n₁n₂(100-n₁-n₂)/161,700, where 161,700 is the number of ways of choosing 3 items out of 100, there indeed happens to be only one decomposition of 20% × 161,700 = 32,340 as a product of three positive integers summing to 100, namely 21 × 35 × 44. The number of possible values for the probability is actually 796, with potentially large gaps between successive values of n₁n₂(100-n₁-n₂) as shown by the above picture.
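The uniqueness claim can be checked by brute force. Below is a minimal R sketch, assuming all three label counts are at least 1; the names `grid`, `hits`, and `triples` are illustrative. It also prints the number of distinct values of n₁n₂(100-n₁-n₂) under that assumption.

```r
# All label counts (n1, n2, n3) with n1 + n2 + n3 = 100 and each at least 1
grid <- expand.grid(n1 = 1:98, n2 = 1:98)
grid$n3 <- 100 - grid$n1 - grid$n2
grid <- grid[grid$n3 >= 1, ]
grid$prod <- grid$n1 * grid$n2 * grid$n3

# P(three distinct labels in three draws without replacement) = prod / choose(100, 3)
target <- choose(100, 3) / 5                 # 20% of 161,700, i.e. 32,340
hits <- grid[grid$prod == target, c("n1", "n2", "n3")]
triples <- unique(t(apply(hits, 1, sort)))   # one row per unordered triple
triples
length(unique(grid$prod))                    # number of distinct probability values
```

Dividing by 5 rather than multiplying by 0.2 keeps `target` an exact integer, so the equality test on `grid$prod` is safe.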
