Archive for FiveThirtyEight

one-way random walks

Posted in Kids, R, Statistics with tags , , , on May 2, 2021 by xi'an

A rather puzzling riddle from The Riddler on an 3×3 directed grid and the probability to get from the North-West to the South-East nodes following the arrows. Puzzling because while the solution could be reasonably computed with an R code like

for(i in 1:2^12){
  for(j in 1:12)sol=max(sol,

where paz is the list of the 12 possible paths from North-West to South-East (excluding loops!), leading to a probability of 1135/2¹², I could not find a logical reasoning to reach this number. The paths of length 4, 6, 8 are valid in 2⁸, 2⁶, 2⁴ of the cases, respectively and logically!, but this does not help as they are dependent.

Take thrice thy money, bid me tear the bond

Posted in Books, Kids with tags , , , , , on April 21, 2021 by xi'an

A rather fun riddle for which the pen&paper approach proved easier than coding in R, for once. It goes as follows: starting with one Euro, one sequentially predicts a sequence of four random bits, betting an arbitrary fraction of one’s money at each round. When winning, the bet is doubled, otherwise, it is lost. Under the information that the same bit cannot occur thrice in a row, what is the optimal minimax gain?

Three simplifications: (i) each bet is a fraction ε of the current fortune of the player, which appears as a product of (1±ε) the previous bets (ii) when the outcome is 0 or 1, this fraction ε can thus be chosen in (-1,1), (iii) while the optimal choice is ε=1 when the outcome is known, i.e., when both previous are identical. The final fortune of the player is thus of the form


if the outcome is alternating (e.g., 0101 or 0100), while it is of the form


if there are two identical successive bits in the first three results (e.g., 1101 or 0110). When choosing each of the fractions ε, the minimum final gain must be maximised. This implies that ε=0 for the bet on the final bit  when the outcome is uncertain (and ε=1 otherwise). In case of an alternating début, like 01, the minimal gain is


which is maximised by ε”=1/3, taking the objective value 4(1±ε)(1±ε’)/3. Leading to the gain after the first bit being


which is maximised by ε’=1/5, for the objective value 8(1±ε)/5. By symmetry, the optimal choice is ε=0. Which ends up with a minimax gain of 3/5. [The quote is from Shakespeare, in the Merchant of Venice.]

stack overload

Posted in Books, Kids, R with tags , , , , , on March 3, 2021 by xi'an

The Riddle this week is rather straightforward to explain: stacking identical objects (bars of length and mass two, say) on top of one another so that the center of each new bar is uniformly distributed along the previous bar, what is the distribution of the number of bars when the stack collapses? If I am not confused, the stack collapses the first time the centre of gravity of an upper stack leaves the interval represented by the bar just below. Namely

\left|\frac{1}{N-j} \sum_{i=j+1}^N x_i -x_j\right|>1

when the xi are the bar centres, or equivalently

\max_{2\le j\le N-1} \left|\frac{1}{N-j} \sum_{i=j+1}^N \sum_{k=j+1}^i\epsilon_i \right|>1

where the ε_i‘s are U(-1,1). Which is straightforward to code in R by looking at means of cumulated sums.

new order

Posted in Books, Kids, R, Statistics with tags , , , , on February 5, 2021 by xi'an

The latest riddle from The Riddler was both straightforward: given four iid Normal variates, X¹,X²,X³,X⁴, what is the probability that X¹+X²<X³+X⁴ given that X¹<X³ ? The answer is ¾ and it actually does not depend on the distribution of the variates. The bonus question is however much harder: what is this probability when there are 2N iid Normal variates?

I posted the question on math.stackexchange, then on X validated, but received no hint at a possible simplification of the probability. And then erased the questions. Given the shape of the domain where the bivariate Normal density is integrated, it sounds most likely there is no closed-form expression. (None was proposed by the Riddler.) The probability decreases roughly in N³ when computing this probability by simulation and fitting a regression.

> summary(lm(log(p)~log(r)))

      Min        1Q    Median        3Q       Max 
-0.013283 -0.010362 -0.000606  0.007835  0.039915 

             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.111235   0.008577  -12.97 4.11e-13 ***
log(r)      -0.311361   0.003212  -96.94  < 2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01226 on 27 degrees of freedom
Multiple R-squared:  0.9971,	Adjusted R-squared:  0.997 
F-statistic:  9397 on 1 and 27 DF,  p-value: < 2.2e-16

around the table

Posted in Books, pictures, R, Statistics with tags , , , , , , , , , , on December 2, 2020 by xi'an

Monty Python and the Holy Grail: the round table in CamelotThe Riddler has a variant on the classical (discrete) random walk around a circle where every state (but the starting point) has the same probability 1/(n-1) to be visited last. Surprising result that stems almost immediately from the property that, leaving from 0, state a is visited couterclockwise before state b>a is visited clockwise is b/a+b. The variant includes (or seems to include) the starting state 0 as counting for the last visit (as a return to the origin). In that case, all n states, including the origin, but the two neighbours of 0, 1, and n-1, have the same probability to be last. This can also be seen on an R code that approximates (inner loop) the probability that a given state is last visited and record how often this probability is largest (outer loop):

w=0*(1:N)#frequency of most likely last
for(t in 1:1e6){
 o=0*w#probabilities of being last
 for(v in 1:1e6)#sample order of visits

However, upon (jogging) reflection, the double loop is a waste of energy and

for(v in 1:1e8)

should be enough to check that all n positions but both neighbours have the same probability of being last visited. Removing the remaining loop should be feasible by considering all subchains starting at one of the 0’s, since this is a renewal state, but I cannot fathom how to code it succinctly. A more detailed coverage of the original problem (that is, omitting the starting point) was published the Monday after publication of the riddle on R bloggers, following a blog post by David Robinson on Variance Explained.

R codegolf challenge: is there a way to shorten the above R for loop in a single line command?!