Archive for FiveThirtyEight

continental divide

Posted in Books, Kids, pictures, R with tags , , , , on May 19, 2017 by xi'an

While the Riddler puzzle this week was anticlimactic,  as it meant filling all digits in the above division towards a null remainder, it came as an interesting illustration of how different division is taught in the US versus France: when I saw the picture above, I had to go and check an American primary school on-line introduction to division, since the way I was taught in France is something like that

with the solution being that 12128316 = 124 x 97809… Solved by a dumb R exploration of all constraints:

for (y in 111:143)
for (z4 in 8:9)
for (oz in 0:999){
  if ((digz[2]==0)&(x>=1e7)&(x<1e8)){ 
   if ((digz[5]*y>=1e3)&(digz[4]*y<1e4) &(r1>9)&(r1<100)){ 
    if ((7*y>=1e2)&(7*y<1e3)&(r2>=1e2)&(r2<1e3)){     
     if ((digz[3]*y>=1e2)&(digz[3]*y<1e3)&(r3>9)&(r3<1e2)){
       if (r4<y) solz=rbind(solz,c(y,z,x))

Looking for a computer-free resolution, the constraints on z exhibited by the picture are that (a) the second digit is 0 and the fourth digit is 7.  Moreover, the first and fifth digits are larger than 7 since y times these digits is a four-digit number. Better, since the second subtraction from a three-digit number by 7y returns a three-digit number and the third subtraction from a four-digit number by ny returns a two-digit number, n is larger than 7 but less than the first and fifth digits. Ergo, z is necessarily 97809! Furthermore, 8y<10³ and 9y≥10³, which means 111<y<125. Plus the constraint that 1000-8y≤99 implies y≥112. Nothing gained there! This leaves 12 values of y to study, unless there is another restriction I missed…

“In short, the French presidential election is a mess”

Posted in Statistics with tags , , , , , , on April 23, 2017 by xi'an

Harry Enten (and not Nate Silver as reported by Le Monde) published yesterday a post on Five-Thirty-Eight about the unpredictability of the French elections. Which essentially states the obvious, namely that the four major candidates all stand a chance to make it to the runoff. (The post classifies Macron as a former left-wing socialist, which shows a glaring misunderstanding of the candidate or a massive divergence of what left-wing means between France and the USA.) The tribune states both that the polls could exhibit a bigger mistake than in the previous elections and that Le Pen score is unlikely to be underestimated, because voters are no longer shy to acknowledge they vote for a fascist candidate. One argument for the error in the polls is attributed to pollsters “herding” their results, i.e., shrinking the raw figures towards the global average taken over previous polls. A [rather reasonable] correction dismissed by Le Monde and French pollsters. While Enten argues that the variability of the percentages over fifty polls is too small to be plausible, assuming a Normal distribution that may not hold because French pollsters use quotas to build their polling population. In any case, this analysis, while cautious and reasonably so!, does not elaborate on the largest question mark, the elephant in the room, namely the percentage of abstentions today and their distribution among the political spectrum, which may eventually make the difference tonight. Indeed, “the bottom line is that we don’t know what’s going to happen on Sunday.” And it is definitely frightening!

optimultiplication [a riddle]

Posted in Books, Kids, R, Statistics with tags , , , , , on April 14, 2017 by xi'an

The riddle of this week is about an optimisation of positioning the four digits of a multiplication of two numbers with two digits each and is open to a coding resolution:

Four digits are drawn without replacement from {0,1,…,9}, one at a time. What is the optimal strategy to position those four digits, two digits per row, as they are drawn, toward minimising the average product?

Although the problem can be solved algebraically by computing E[X⁴|x¹,..] and E[X⁴X³|x¹,..]  I wrote three R codes to “optimise” the location of the first three digits: the first digit ends up as a unit if it is 5 or more and a multiple of ten otherwise, on the first row. For the second draw, it is slightly more variable: with this R code,

for (t in 1:N)
if (i<5){ #10*i

the resulting digit again ends up as a unit if it is 5 (except when x¹=7,8,9, where it is 4) or more and a multiple of ten otherwise, but on the second row. Except when x¹=0, x²=1,2,3,4, when they end up on the first row together, 0 obviously in front.

For the third and last open draw, there is only one remaining random draw, which mean that the decision only depends on x¹,x²,x³ and E[X⁴|x¹,x²,x³]=(45-x¹-x²-x³)/7. Attaching x³ to x² or x¹ will then vary monotonically in x³, depending on whether x¹>x² or x¹<x²:

if ((i<1)&(j<5)){ pos=c(10*comean+k,comean+10*k)}
if ((i<5)&(j>4)){ pos=c(100*i*comean+k*j,j*comean+100*i*k)}
if ((i>0)&(i<5)&(j<5)){ pos=c(i*comean+k*j,j*comean+i*k)} 
if ((i<7)&(i>4)&(j<5)){ pos=c(i*comean+100*k*j,j*comean+100*i*k)} 
if ((i<7)&(i>4)&(j>4)){ pos=c(i*comean+k*j,j*comean+i*k)}
if ((i>6)&(j<4)){ pos=c(i*comean+100*k*j,j*comean+100*i*k)} 
if ((i>6)&(j>3)){ pos=c(i*comean+k*j,j*comean+i*k)}

Running this R code for all combinations of x¹,x² shows that, except for the cases x¹≥5 and x²=0, for which x³ invariably remains in front of x¹, there are always values of x³ associated with each position.

an express riddle

Posted in Books, Kids, R with tags , , on January 20, 2017 by xi'an

A quick puzzle on The Riddler this week that enjoys a quick solution once one writes it out. The core of the puzzle is about finding the average number of draws one need to empty a population of size T if each draw is uniform over the remaining number of individuals between one and the number that remain. It is indeed easy to see that this average satisfies

\epsilon^T=1+\frac{1}{T}\sum_{i=1}^{T-1} \epsilon^i

since all draws but one require an extra draw. A recursion then leads by elimination to deduce that


which is the beginning of the (divergent) harmonic series. In the case T=30, the solution is (almost) equal to 4.

> sum(1/(1:30))*1e10
[1] 39949871309

A second riddle the same week reminded me of a result in Devroye’s Non-Uniform Random Variate Generation, namely to find the average number of draws from a Uniform until the sequence goes down. Actually, the real riddle operates with a finite support Uniform, but I find the solution with the continuous Uniform more elegant. And it only took a few metro stops to solve. The solution goes as follows: the probability to stop after two Uniform draws is 1/2, after n uniform draws, it is (n-1)/n!, which does sum up to 1:

\sum_{n=2}^\infty \frac{n-1}{n!} = \sum_{n=2}^\infty \frac{n}{n!} - \sum_{n=2}^\infty \frac{1}{n!} = \sum_{n=1}^\infty \frac{1}{n!} - \sum_{n=2}^\infty \frac{1}{n!}=1

and the expectation of this distribution is e-1 by a very similar argument, as can be checked by a rudimentary Monte Carlo experiment

> over(1e7) #my implementation of the puzzle
[1] 1.7185152

another riddle

Posted in Books, Kids with tags , , , on June 22, 2016 by xi'an

A riddle on The Riddler that pertains to game theory [and does not truly require coding]:

Ten pirates have ten gold pieces to divide and each have a unique rank, from the captain on down. The captain puts forth the first plan to divide up the gold, whereupon everyone votes. If at least half the pirates vote for the plan, it is enacted, and the gold is distributed accordingly. If the plan gets fewer than half the votes, however, the captain is killed, the second-in-command is promoted, and the process starts over.

Pirates always vote by the following rules, with the earliest rule taking precedence in a conflict:

  1. Self-preservation: A pirate values his life above all else.
  2. Greed: A pirate seeks as much gold as possible.
  3. Bloodthirst: Failing a threat to his life or bounty, a pirate always votes to kill.

Under this system, how do the pirate divide up their gold?

An obviously final configuration is when there are only two pirates left since whatever division the first pirate proposes, including the one where he gets all the coins, the second (and last) one cannot oppose. Hence the last pirate must avoid reaching a two person configuration. On the opposite the one-to-last pirate should aim at this solution, but he is the only one! This means that for any configuration but one the last and second to last pirates will outvote the one to last. Meaning that the optimum for the second to one pirate is a partition of (9,0,1). Moving to the general case, when there remains 2p crew members, the pirate in command must secure a positive vote from  (p-1) other pirates below him, which means those (p-1) pirates must get one coin more than what they would get on the next round. This amounts to picking the (p-1) pirates with no reward on the next round and giving them one coin each, leading to a reward of 11-p for the current leader. When there remains 2p-1 crew members, the optimal solution is identical, namely to award one coin each for the (p-1) pirates with no reward on the next round. With a total of 10 pirates, the optimal configuration for the captain is to separate the 10 coins into (6,0,1,0,1,0,1,0,1,0).

another riddle with a stopping rule

Posted in Books, Kids, R with tags , , , on May 27, 2016 by xi'an

A puzzle on The Riddler last week that is rather similar to an earlier one. Given the probability (1/2,1/3,1/6) on {1,2,3}, what is the mean of the number N of draws to see all possible outcomes and what is the average number of 1’s in those draws? The second question is straightforward, as the proportions of 1’s, 2’s and 3’s in the sequence till all values are observed remain 3/6, 2/6 and 1/6. The first question follows from the representation of the average

\mathbb{E}[N]=\sum_{n=3}^\infty \mathbb{P}(N>n) + 3

as the probability to exceed n is the probability that at least one value is not observed by the n-th draw, namely


which leads to an easy summation for the expectation, namely


Continue reading

an integer programming riddle

Posted in Books, Kids, R with tags , , , , on April 21, 2016 by xi'an

A puzzle on The Riddler this week that ends up as a standard integer programming problem. Removing the little story around the question, it boils down to optimise


under the constraints

400a+400b+150c+50d≤1000, b≤a, a≤1, c≤8, d≤4,

and (a,b,c,d) all non-negative integers. My first attempt was a brute force R code since there are only 3 x 9 x 5 = 135 cases:


for (a in 0:1)
for (b in 0:a)
for (k in 0:8)
for (d in 0:4){
  if (max(cost)<=0){ gain=f.obj%*%c(a,b,k,d)
if (gain>sol){

which returns the value:

> sol
[1,]  425
> argu
[1] 1 0 3 3

This is confirmed by a call to an integer programming code like lpSolve:

> lp("max",f.obj,f.con,f.dir,f.rhs,
Success: the objective function is 425
> lp("max",f.obj,f.con,f.dir,f.rhs,$sol
[1] 1 0 3 3

which provides the same solution.