Archive for FiveThirtyEight

optimal Gaussian zorbing

Posted in Books, Kids, R, Statistics on August 30, 2022 by xi'an

A zorbing puzzle from the Riddler: place four non-intersecting disks of radius one in the plane so as to reach the highest coverage probability (under the standard bivariate Normal distribution).

As I could not see a simple connection between the disks and the standard Normal, beyond the probability of a disk being given by a non-central chi-square cdf (with two degrees of freedom), I (once again) tried a random search by simulated annealing, which ended up with a configuration like the above, never above 0.777 using a pedestrian R code like

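# assumes the centres A,B,C,D, the current coverage p, the best solution sol,
# the temperature temp and the move scale vemp have been initialised beforehand
for(t in 1:1e6){# move the disk centres
 Ap=A+vemp*rnorm(2)
 Bp=B+vemp*rnorm(2)
 # redraw each proposal until its disk does not intersect the earlier ones
 while(dist(rbind(Ap,Bp))<2)Bp=B+vemp*rnorm(2)
 Cp=C+vemp*rnorm(2)
 while(min(dist(rbind(Ap,Bp,Cp)))<2)Cp=C+vemp*rnorm(2)
 Dp=D+vemp*rnorm(2)
 while(min(dist(rbind(Ap,Bp,Cp,Dp)))<2)Dp=D+vemp*rnorm(2)
 #coverage probability, as a sum of non-central chi-square cdfs
 pp=pchisq(1,df=2,ncp=sum(Ap^2))+pchisq(1,df=2,ncp=sum(Bp^2))+
    pchisq(1,df=2,ncp=sum(Cp^2))+pchisq(1,df=2,ncp=sum(Dp^2))
 #simulated annealing step
 if(log(runif(1))<(pp-p)/sqrt(temp)){
   #accept the move (the labels are rotated, so the order of the moves cycles)
   A=Bp;B=Cp;C=Dp;D=Ap;p=pp
   #keep track of the best configuration so far
   if (sol$val<p) sol=list(val=pp,pos=rbind(A,B,C,D))}
 temp=temp*.9999}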
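The non-central chi-square identity behind the coverage step can be checked by simulation. The following is a minimal sketch rather than part of the search above: the helper name disk_prob, the test centre (1, 0.5) and the four centres at the end are illustrative assumptions (a hypothetical layout with two disks tangent at the origin and two more nestled against them).

```r
# probability of a disk (given centre and radius) under the standard bivariate Normal:
# the squared distance to the centre is non-central chi-square with 2 df
disk_prob <- function(centre, radius = 1)
  pchisq(radius^2, df = 2, ncp = sum(centre^2))

# Monte Carlo check for one arbitrary centre
set.seed(42)
centre <- c(1, 0.5)
x <- matrix(rnorm(2e5), ncol = 2)
mc <- mean((x[, 1] - centre[1])^2 + (x[, 2] - centre[2])^2 <= 1)
c(exact = disk_prob(centre), monte_carlo = mc)

# total coverage of a hypothetical configuration (one centre per row)
centres <- rbind(c(1, 0), c(-1, 0), c(0, sqrt(3)), c(0, -sqrt(3)))
stopifnot(min(dist(centres)) >= 2 - 1e-9)  # disks may touch but not overlap
total <- sum(apply(centres, 1, disk_prob))
total
```

Since the disks do not overlap, the coverage is simply the sum of the four disk probabilities, which is what the pp line of the loop computes.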

I also tried a simpler configuration where all disk centres were equidistant from a reference centre, but this led to a lower “optimal” probability. I was looking forward to the discussion of the puzzle, to discover if anything less brute-force was possible! But there was no deeper argument there beyond the elimination of other “natural” configurations (and missing the non-central χ² connection!). Among these options, having two disks tangent at (0,0) was optimal. But the illustration was much nicer.

a genuine riddle

Posted in Books, Kids, pictures on August 5, 2022 by xi'an

A riddle from The Riddler that was pure (if straightforward) logic rather than brute force computation or mathematical modelling:

Four bags of many marbles are labelled R(ed), B(lue), G(reen) and μ (mixed), except that all labels are wrong. Given the possibility to draw two marbles, one at a time, from any bag, is it possible to select two monochromatic bags?

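Since its label is wrong, bag μ is monochromatic and a first draw from it returns its colour, R say. The remaining bags thus hold the blue, the green and the mixed marbles. Drawing next from bag R will produce either R, in which case bag R is the mixed (polychromatic) one, which means the other bags are monochromatic, or another colour, B say. For this last case, bag R is either monochromatic, in which case bag G (which cannot be green) is mixed and bag B is full of green marbles, or polychromatic, in which case bag G is made of blue marbles and bag B again of green marbles. Bag B is thus monochromatic for either situation, hence to be chosen on top of bag μ. (Drawing G from bag R similarly points at bag G, while a second draw from bag B or G rather than R may leave the three remaining bags undecided.)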
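This kind of argument can be double-checked by brute force. The sketch below enumerates every way of filling the four bags with all labels wrong and assumes a particular strategy, namely a first draw from bag μ followed by a second draw from the bag whose label is the colour just observed. The names perms, all_wrong and sure_mono are illustrative.

```r
labels <- c("R", "B", "G", "mu")

# all orderings of a vector (tiny recursive helper)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}

# contents of the bags labelled R, B, G, mu when every label is wrong
all_wrong <- Filter(function(p) all(p != labels), perms(labels))

# for each possible pair of draws, the bags among R, B, G that are
# monochromatic in every scenario compatible with these draws
sure_mono <- list()
for (p in all_wrong) {
  names(p) <- labels
  first <- p[["mu"]]  # bag mu holds a single colour
  # a mixed bag can return any colour, a monochromatic bag only its own
  second <- if (p[[first]] == "mu") c("R", "B", "G") else p[[first]]
  mono <- setdiff(labels[p != "mu"], "mu")  # monochromatic bags besides bag mu
  for (s in second) {
    key <- paste(first, s)
    prev <- sure_mono[[key]]
    sure_mono[[key]] <- if (is.null(prev)) mono else intersect(prev, mono)
  }
}
always_solvable <- all(lengths(sure_mono) > 0)
always_solvable
```

A non-empty entry in sure_mono for every pair of draws means that a second monochromatic bag can always be named with certainty under this strategy.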

riddle of the week

Posted in R on April 21, 2022 by xi'an

The Riddler of April 1 offered this simple question:

start with the number 1 and then try to reach a target number through a series of steps. For each step, you can always choose to double the number you currently have. However, if the number happens to be one (1) more than an odd multiple of 3, you can choose to “reduce” — that is, subtract 1 and then divide by 3. What is the smallest positive integer one cannot reach this way?

Which I turned into R steps (while waiting for flight AF19 to Paris)

  # while x is one more than an odd multiple of 3: flag its double as reached, then reduce x
  while((!(x-1)%%3)&((x-1)%%6)){
    oor[2*x]=TRUE
    oor[x<-(x-1)%/%3]=TRUE}

but running an exhaustive search till 10⁸ did not spot any missing integer… Maybe an April Fools' joke (as the quick riddle was asking for the simplest representation of (x-a)(x-b)…(x-z)…!)
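One way to see why such a search comes up empty: undoing a doubling is a halving, and undoing a “reduce” maps an odd k to 3k+1, so walking backwards from a target n is exactly iterating the Collatz map. A target is therefore reachable from 1 if and only if its Collatz orbit hits 1. The sketch below uses this reformulation (it is not the code of the fragment above) to recover an explicit sequence of steps; path_to and its max_steps cap are illustrative names and choices.

```r
# sequence of doubling / reducing steps leading from 1 to n, obtained by
# reversing the Collatz orbit of n; NULL if 1 is not hit within max_steps
path_to <- function(n, max_steps = 1e4) {
  orbit <- n
  while (n != 1 && length(orbit) <= max_steps) {
    n <- if (n %% 2 == 0) n / 2 else 3 * n + 1
    orbit <- c(orbit, n)
  }
  if (n != 1) return(NULL)
  rev(orbit)
}

path_to(3)  # 1 2 4 8 16 5 10 3

# targets below 1000 for which no path is found
unreached <- Filter(function(n) is.null(path_to(n)), 1:1000)
length(unreached)
```

Under this reading, a smallest unreachable integer would be a counterexample to the Collatz conjecture, which fits the April 1 date.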

a simpler (?) birthday problem

Posted in Books, Kids, Statistics on April 9, 2022 by xi'an

A monthly birthday problem from the Riddler:

What was the probability that none of the 40 people had birthdays this month? What is the probability that there is at least one month in the year during which none of the 40 people had birthdays (not necessarily this month)?

Assuming the same number of days in all months, the probability that one individual is not born in March is 11/12 and hence the probability that none of 40 (independent!) persons is born in March is (11/12)⁴⁰, about 3%. The second question can be solved by reading Feller’s chapter on the combination of events (1970, Chapter IV, p.102). The probability that all months are seeing at least one birthday is

\sum_{i=0}^{12} (-1)^i {12\choose i}(1-i/12)^{40}=0.6732162

which can be checked by a quick R simulation. The complement 0.327 is thus close to 11 × 0.03!
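A minimal sketch of such a check, assuming twelve equally likely months and independent birthdays as above (the seed and the number of replications are arbitrary choices):

```r
# probability that none of the 40 persons is born in a given month
p_none <- (11 / 12)^40
p_none

# inclusion-exclusion probability that all twelve months are covered
i <- 0:12
p_all <- sum((-1)^i * choose(12, i) * (1 - i / 12)^40)
p_all

# simulation: draw 40 birth months and check that all 12 values appear
set.seed(1)
sims <- replicate(1e5, length(unique(sample(12, 40, replace = TRUE))) == 12)
mean(sims)

# probability of at least one month without any birthday
1 - p_all
```

The simulated frequency should agree with p_all up to Monte Carlo error, which is of order 0.0015 for 10⁵ replications.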

distracting redistricting?

Posted in Books, Statistics on August 26, 2021 by xi'an

“We at FiveThirtyEight will be tracking the whole redistricting process, from proposed maps to final maps, so watch this space for updates!”

FiveThirtyEight is keeping a tracker on the “redistricting” of U.S. states, namely the decennial redrawing of electoral districts. This is still an early stage, when no map has been validated by the state legislature, and hence I cannot tell whether or not FiveThirtyEight will be analysing gerrymandering in a statistical manner, to figure out how extreme the map is within the collection of all electoral maps. The States being the States, the rules vary widely between them, from the legislators themselves setting the boundaries (while sometimes being very open about their intentions to favour their own side) to independent commissions being in charge. I did not spot any clear involvement of statisticians in the process.

“The application of differential privacy will bring significant harm to Alabama (…) The Census Bureau has not shown that other disclosure avoidance methods would not satisfy the privacy requirements” Case No. 3:21-cv-00211

While looking at this highly informative webpage maintained by Doug Spencer of the University of Colorado Law School, I came across this federal court challenge by the State of Alabama against the Census Bureau for using differential privacy! A statistical version of “shoot the messenger”?! The legal argument of the State is “the Fifth Amendment, alleging that differential privacy is a violation of the one-person, one-vote principle and will result in the dilution of their votes.” I however wonder what the genuine (political) reason for this challenge is!
