coronavirus counts do not count

Posted in Books, pictures, Statistics with tags , , , , , , , , on April 8, 2020 by xi'an

Somewhat by chance I came across Nate Silver‘s tribune on FiveThirtyEight about the meaninglessness of COVID-19 case counts. As it reflects on sampling efforts and available resources rather than actual cases, furthermore sampling efforts from at least a fortnight.

“The data, at best, is highly incomplete, and often the tip of the iceberg for much larger problems. And data on tests and the number of reported cases is highly nonrandom. In many parts of the world today, health authorities are still trying to triage the situation with a limited number of tests available. Their goal in testing is often to allocate scarce medical care to the patients who most need it — rather than to create a comprehensive dataset for epidemiologists and statisticians to study.”

This article runs four different scenarios, with the same actual parameters for the epidemics, and highly different and mostly misleading perceptions based on the testing strategies. This is a highly relevant warning but I am surprised Nate Silver does not move to the rather obvious conclusion that some form of official survey or another, for instance based on capture-recapture and representative samples, testing for present and past infections, should be implemented on a very regular basis, even with a limited number of tested persons to get a much more reliable vision of the status of the epidemics. Here, the French official institute of statistics, INSEE, would be most suited to implement such a scheme.

one or two?

Posted in Books, Kids, R with tags , , , , , , on March 12, 2020 by xi'an

A superposition of two random walks from The Riddler:

Starting from zero, a random walk is produced by choosing moves between ±1 and ±2 at each step. If the choice between both is made towards maximising the probability of ending up positive after 100 steps, what is this probability?

Although the optimal path is not necessarily made of moves that optimise the probability of ending up positive after the remaining steps, I chose to follow a dynamic programming approach by picking between ±1 and ±2 at each step based on that probability:

bs=matrix(0,405,101) #best stategy with value i-203 at time j-1
bs[204:405,101]=1
for (t in 100:1){
tt=2*t
bs[203+(-tt:tt),t]=.5*apply(cbind(
bs[204+(-tt:tt),t+1]+bs[202+(-tt:tt),t+1],
bs[201+(-tt:tt),t+1]+bs[205+(-tt:tt),t+1]),1,max)}


resulting in the probability

> bs[203,1]
[1] 0.6403174


Just checking that a simple strategy of picking ±1 above zero and ±2 below leads to the same value

ga=rep(0,T)
for(v in 1:100) ga=ga+(1+(ga<1))*sample(c(-1,1),T,rep=TRUE)


or sort of

> mean(ga>0)
[1] 0.6403494


With highly similar probabilities when switching at ga<2

> mean(ga>0)
[1] 0.6403183


or ga<0

> mean(ga>0)
[1] 0.6403008


and too little difference to spot a significant improvement between the three boundaries.

multiplying the bars

Posted in Kids, R with tags , , , , , , , on February 25, 2020 by xi'an

The latest Riddler makes the remark that the expression

|-1|-2|-3|

has no unique meaning (and hence value) since it could be

| -1x|-2|-3 | = 5   or   |-1| – 2x|-3| = -5

depending on the position of the multiplication sign and asks for all the possible values of

|-1|-2|…|-9|

which can be explored by a recursive R function for computing |-i|-(i+1)|…|-(i+2j)|

vol<-function(i,j){x=i
if(j){x=c(i-(i+1)*vol(i+2,j-1),abs(i*vol(i+1,j-1)+i+2*j))
if(j>1){for(k in 1:(j-2))
x=c(x,vol(i,k)-(i+2*k+1)*vol(i+2*k+2,j-k-1))}}
return(x)}


producing 40 different values for the ill-defined expression. However, this is incorrect as the product(s) hidden in the expression only involve a single term in vol(i,j)… I had another try with the decomposition of the expression vol(i,j) into a first part and a second part

prod<-function(a,b) a*b[,1]+b[,2]

val<-function(i,j){
x=matrix(c(i,0),ncol=2)
if(j){x=rbind(cbind(i,prod(-(i+1),val(i+2,j-1))),
cbind(abs(prod(-i,val(i+1,j-1))-i-2*j),0))
if(j-1){for(k in 2:(j-1)){
pon=val(i,k-1)
for(m in 1:dim(pon)[1])
x=rbind(x,cbind(pon[m,1],pon[m,2]+prod(-(i+2*k-1),val(i+2*k,j-k))))}}}
return(x)}


but it still fails to produce the right version.

a non-riddle

Posted in Books, Kids, R with tags , , on July 12, 2019 by xi'an

Unless I missed a point in the last riddle from the Riddler, there is very little to say about it:

Given N ocre balls, N aquamarine balls, and two urns, what is the optimal way to allocate the balls to the urns towards drawing an ocre ball with no urn being empty?

Both my reasoning and a two line exploration code led to having one urn with only one ocre ball (and no acquamarine ball) and all the other balls in the second urn.

odz<-function(n,m,t) 2*m/n+(t-2*m)/(t-n)
probz=matrix(0,trunc(N/2)-1,N-1)
for (n in 1:(N-1))
for (m in 1:(trunc(N/2)-1))
probz[m,n]=odz(n,m,N)


riddles on Egyptian fractions and Bernoulli factories

Posted in Books, Kids, R with tags , , , , , , , , , , , , , on June 11, 2019 by xi'an

Two fairy different riddles on the weekend Riddler. The first one is (in fine) about Egyptian fractions: I understand the first one as

Find the Egyptian fraction decomposition of 2 into 11 distinct unit fractions that maximises the smallest fraction.

And which I cannot solve despite perusing this amazing webpage on Egyptian fractions and making some attempts at brute force  random exploration. Using Fibonacci’s greedy algorithm. I managed to find such decompositions

2 = 1 +1/2 +1/6 +1/12 +1/16 +1/20 +1/24 +1/30 +1/42 +1/48 +1/56

after seeing in this short note

2 = 1 +1/3 +1/5 +1/7 +1/9 +1/42 +1/15 +1/18 +1/30 +1/45 +1/90

And then Robin came with the following:

2 = 1 +1/4 +1/5 +1/8 +1/10 +1/12 +1/15 +1/20 +1/21 +1/24 +1/28

which may prove to be the winner! But there is even better:

2 = 1 +1/5 +1/6 +1/8 +1/9 +1/10 +1/12 +1/15 +1/18 +1/20 +1/24

The second riddle is a more straightforward Bernoulli factory problem:

Given a coin with a free-to-choose probability p of head, design an experiment with a fixed number k of draws that returns three outcomes with equal probabilities.

For which I tried a brute-force search of all possible 3-partitions of the 2-to-the-k events for a range of values of p from .01 to .5 and for k equal to 3,4,… But never getting an exact balance between the three groups. Reading later the solution on the Riddler, I saw that there was an exact solution for 4 draws when

$p=\frac{3-\sqrt{3(4\sqrt{9}-6)}}{6}$

Augmenting the precision of my solver (by multiplying all terms by 100), I indeed found a difference of

> solver((3-sqrt(3*(4*sqrt(6)-9)))/6,ba=1e5)[1]
[1] 8.940697e-08

which means an error of 9 x 100⁻⁴ x 10⁻⁸, ie roughly 10⁻¹⁵.