## easy riddle

Posted in Books, Kids, R with tags , , , , , on July 12, 2017 by xi'an

From the current Riddler, a problem that only requires a few lines of code and a few seconds of reasoning. Or not.

N households each stole the earnings from one of the (N-1) other households, one at a time. What is the probability that a given household is not burglarised? And what are the expected final earnings of each household in the list, assuming they all start with \$1?

The first question is close to Feller’s enveloppe problem in that

$\left(1-\frac{1}{N-1}\right)^{N-1}$

is close to exp(-1) for N large. The second question can easily be solved by an R code like

N=1e3;M=1e6
fina=rep(1,N)
for (v in 1:M){
ordre=sample(1:N)
vole=sample(1:N,N,rep=TRUE)
while (min(abs(vole-(1:N)))==0)
vole[abs(vole-(1:N))==0]=sample(1:N,
sum(vole-(1:N)==0))
cash=rep(1,N)
for (t in 1:N){
cash[ordre[t]]=cash[ordre[t]]+cash[vole[t]];cash[vole[t]]=0}
fina=fina+cash[ordre]}


which returns a pretty regular exponential-like curve, although I cannot figure the exact curve beyond the third burglary. The published solution gives the curve

${\frac{N-2}{N-1}}^{999}\times 2+{\frac{1}{N-1}}^{t-1}\times{\frac{N-1}{N}}^{N-t}\times\frac{N}{N-1}$

corresponding to the probability of never being robbed (and getting on average an extra unit from the robbery) and of being robbed only before robbing someone else (with average wealth N/(N-1)).

## [un]solved riddles

Posted in Books, Kids, R with tags , , on July 4, 2017 by xi'an

On the Riddler of last week, first a birthday puzzle:

Given a group of 23 persons, what is the probability of observing three pairs of identical birthdays?

which can be found by a quick simulation as

ave=0
for (t in 1:1e6){
dupz=dates[duplicated(sample(1:365,23,rep=TRUE))]
ave=ave+as.integer((length(dupz)==3)&
(length(unique(dupz))==3))}}
ave/M


returning a value of 0.0183, but which combinatoric resolution I could not fully fathom without a little help from a friend (-ly blog). I had the multinomial coefficient

${23\choose 2\ 2\ 2}$

for the allocation of the 23 persons to one of the three pairs or none, as well as the probability

$\dfrac{365\times\cdots\times(365-(23-4)+1)}{365^{23}}$

but I had forgotten the 3! in the denominator for the permutations of the three pairs, which leads again to

${23 \choose 2\ 2\ 2}\dfrac{365\times\cdots\times(365-(23-4)+1)}{3!\times365^{23}} = 0.0183$

A question that also led to an unsolved question: in the even this probability was much smaller, what is an easy way into constructing a more efficient importance sampler?

The second riddle was just as easy to code in R:

A game of tag goes by the following rules: (i) anyone untagged can tag anyone untagged; (ii) anyone tagged by a player tagged gets untagged; (iii) the winner is the last untagged player. What is the expected number of runs for N players?

The outcome of

game=function(N=12){
a=rep(0,N);T=0
while (sum(a==0)>1){
ij=sample((1:N)[a==0],2)
a[ij[2]]=ij[1];a[a==ij[2]]=0
T=T+1}
return(T)}


leads to an average value of

$2^{N-1}-1$

but I had no clear quick explanation for the doubling phenomenon. Until I picked a pen and a sheet of paper and drew the last steps of the game: to decrease down to 1, the size of the untagged survivors has to get through …,3,2 and each time the eliminated player needs to have tagged no other player since otherwise the population grows again. This has to apply all the way to the second round, where N-1 players remain and the one tagged needs to be anyone but the one who tagged the first one. And so on…

## non-Bayesian decision riddle

Posted in Statistics with tags , , , on June 22, 2017 by xi'an

As a continuation of the Bayesian resolution of last week riddle, I looked at a numeric resolution of the four secretaries problem, while in the train back from Rouen (and trying to block the chatter of my neighbours, a nuisance I find myself more and more sensitive to!). The target function is defined as

gainz=function(b,c,T=1e4,type="raw"){
x=matrix(runif(4*T),ncol=4)
maz=t(apply(x,1,cummax))
zam=t(apply(x[,4:1],1,cummax))
if (type=="raw"){return(mean(
((x[,2]>b*x[,1])*x[,2]+
(x[,2]<b*x[,1])*((x[,3]>c*maz[,2])*x[,3]+
(x[,3]<c*maz[,2])*x[,4]))/maz[,4]))}
if (type=="global"){return(mean(
((x[,2]>b*x[,1])*(x[,2]==maz[,4])+
(x[,2]<b*x[,1])*((x[,3]>c*maz[,2])*(x[,3]==maz[,4])+
(x[,3]<c*maz[,2])*(x[,4]==maz[,4])))))}
if (type=="remain"){return(mean(
((x[,2]>b*x[,1])*(x[,2]==zam[,3])+
(x[,2]<b*x[,1])*((x[,3]>c*maz[,2])*(x[,3]==zam[,2])+
(x[,3]<c*maz[,2])*(x[,4]==zam[,2])))))}}


where the data is generated from a U(0,1) distribution as the loss functions are made scaled free by deciding to always sacrifice the first draw, x¹. This function is to be optimised in (b,c) and hence I used a plain vanilla simulated annealing R code:

avemale=function(T=3e4,type){
b=c=.5
maxtar=targe=gainz(b,c,T=1e4,type)
temp=0.1
for (t in 1:T){
bp=b+runif(1,-temp,temp)
cp=c+runif(1,-temp,temp)
parge=(bp>0)*(cp>0)*gainz(bp,cp,T=1e4,type)
if (parge>maxtar){
b=bs=bp;c=cs=cp;maxtar=targe=parge}else{
if (runif(1)<exp((parge-targe)/temp)){
b=bp;c=cp;targe=parge}}
temp=.9999*temp}
return(list(bs=bs,cs=cs,max=maxtar))}


with outcomes

• b=1, c=.5, and optimum 0.8 for the raw type
• b=c=1 and optimum 0.45 for the global type
• b undefined, c=2/3 and optimum 0.75 for the remain type

## Bayesian decision riddle

Posted in Kids, Statistics, Books with tags , , , , on June 15, 2017 by xi'an

The current puzzle on The Riddler is a version of the secretary problem with an interesting (?) Bayesian solution.

Given four positive numbers x¹, x², x³, x⁴, observed sequentially, the associated utility is the value of x at the stopping time. What is the optimal stopping rule?

While nothing is mentioned about the distribution of the x’s, I made the assumption that they were iid and uniformly distributed over (0,M), with M unknown and tried a Bayesian resolution with the non-informative prior π(M)=1/M. And failed. The reason for this failure is that the expected utility is infinite at the first step: while the posterior expected utility is finite with three and two observations, meaning I can compare stopping and continuing at the second and third steps, the predicted expected reward for continuing after observing x¹ does not exist because the expected value of max(x¹,x²) given x¹ does not exist. As the predictive density of x² is max(x¹,x²)⁻²…  Several alternatives are possible to bypass this impossible resolution, from changing the utility function to picking another reference prior.

For instance, using a prior like π(M)=1/M² l(and the same monetary return utility) leads to a proper optimal solution, namely

1. always wait for the second observation x²
2. stop at x² if x²>11x¹/12, else wait for x³
3. stop at x³ if x³>23 max(x¹,x²)/24, else observe x⁴

obtained analytically on a bar table in Rouen (and checked numerically later).

Another approach is to try to optimise the probability to pick the largest amount of the four x’s, but this is not leading to an interesting solution, since it corresponds to picking the first maximum after x¹, while picking the largest among remaining ones leads to a somewhat convoluted solution I have no patience to produce here! Plus this is not a really pertinent loss function as it does not discriminate enough against waiting…

## continental divide

Posted in Books, Kids, pictures, R with tags , , , , on May 19, 2017 by xi'an

While the Riddler puzzle this week was anticlimactic,  as it meant filling all digits in the above division towards a null remainder, it came as an interesting illustration of how different division is taught in the US versus France: when I saw the picture above, I had to go and check an American primary school on-line introduction to division, since the way I was taught in France is something like that

with the solution being that 12128316 = 124 x 97809… Solved by a dumb R exploration of all constraints:

for (y in 111:143)
for (z4 in 8:9)
for (oz in 0:999){
z=oz+7e3+z4*1e4
x=y*z
digx=digits(x)
digz=digits(z)
if ((digz[2]==0)&(x>=1e7)&(x<1e8)){
r1=trunc(x/1e4)-digz[5]*y
if ((digz[5]*y>=1e3)&(digz[4]*y<1e4) &(r1>9)&(r1<100)){
r2=10*r1+digx[4]-7*y
if ((7*y>=1e2)&(7*y<1e3)&(r2>=1e2)&(r2<1e3)){
r3=10*r2+digx[3]-digz[3]*y
if ((digz[3]*y>=1e2)&(digz[3]*y<1e3)&(r3>9)&(r3<1e2)){
r4=10*r3+digx[2]
if (r4<y) solz=rbind(solz,c(y,z,x))
}}}}


Looking for a computer-free resolution, the constraints on z exhibited by the picture are that (a) the second digit is 0 and the fourth digit is 7.  Moreover, the first and fifth digits are larger than 7 since y times these digits is a four-digit number. Better, since the second subtraction from a three-digit number by 7y returns a three-digit number and the third subtraction from a four-digit number by ny returns a two-digit number, n is larger than 7 but less than the first and fifth digits. Ergo, z is necessarily 97809! Furthermore, 8y<10³ and 9y≥10³, which means 111<y<125. Plus the constraint that 1000-8y≤99 implies y≥112. Nothing gained there! This leaves 12 values of y to study, unless there is another restriction I missed…

## a secretary problem with maximum ability

Posted in Kids, R with tags , , , , on April 28, 2017 by xi'an

The Riddler of today has a secretary problem, where one measures sequentially N random variables until one deems the current variable to be the largest of the whole sample. The classical secretary problem has a counter-intuitive solution where one first measures N/e random variables without taking any decision and then and only then picks the first next outcome larger than the largest in the first group. (For large values of N.) The added information in the current riddle is that the distribution of those iid random variables is set to be uniform on {1,…,M}, which begs for a modification in the algorithm. As for instance when observing M on the current draw.

The approach I devised is certainly suboptimal, as I decided to pick the currently observed value if the (conditional) probability it is the largest is larger than the probability subsequent draws. This translates into the following R code:

M=100 #maximum value
N=10  #total number of draws
hyprob=function(m){
# m is sequence of draws so far
n=length(m);mmax=max(m)
if ((m[n]<mmax)||(mmax-n<N-n)){prob=0
}else{
prob=prod(sort((1:mmax)[-m],dec=TRUE)
[1:(N-n)]/((M-n):(M-N+1))}
return(prob)}

decision=function(draz=sample(1:M,N)){
i=0
keepgoin=TRUE
while ((keepgoin)&(i<N)){
i=i+1
keepgoin=(hyprob(draz[1:i])<0.5)}
return(c(i,draz[i],(draz[i]<max(draz))))}


which produces a winning rate of around 62% when N=10 and M=100, hence much better than the expected performances of the (asymptotic) secretary algorithm, with a winning frequency of 1/e. (For N=10 and M=100, the winning frequency is only 27%.)

## optimultiplication [a riddle]

Posted in Books, Kids, R, Statistics with tags , , , , , on April 14, 2017 by xi'an

The riddle of this week is about an optimisation of positioning the four digits of a multiplication of two numbers with two digits each and is open to a coding resolution:

Four digits are drawn without replacement from {0,1,…,9}, one at a time. What is the optimal strategy to position those four digits, two digits per row, as they are drawn, toward minimising the average product?

Although the problem can be solved algebraically by computing E[X⁴|x¹,..] and E[X⁴X³|x¹,..]  I wrote three R codes to “optimise” the location of the first three digits: the first digit ends up as a unit if it is 5 or more and a multiple of ten otherwise, on the first row. For the second draw, it is slightly more variable: with this R code,

second<-function(i,j,N=1e5){draw
drew=matrix(0,N,2)
for (t in 1:N)
drew[t,]=sample((0:9)[-c(i+1,j+1)],2)
conmean=(45-i-j)/8
conprod=mean(drew[,1]*drew[,2])
if (i<5){ #10*i
pos=c((110*i+11*j)*conmean,
100*i*j+10*(i+j)*conmean+conprod,
(100*i+j)*conmean+10*i*j+10*conprod)}else{
pos=c((110*j+11*i)*conmean,
10*i*j+(100*j+i)*conmean+10*conprod,
10*(i+j)*conmean+i*j+100*conprod)
return(order(pos)[1])}


the resulting digit again ends up as a unit if it is 5 (except when x¹=7,8,9, where it is 4) or more and a multiple of ten otherwise, but on the second row. Except when x¹=0, x²=1,2,3,4, when they end up on the first row together, 0 obviously in front.

For the third and last open draw, there is only one remaining random draw, which mean that the decision only depends on x¹,x²,x³ and E[X⁴|x¹,x²,x³]=(45-x¹-x²-x³)/7. Attaching x³ to x² or x¹ will then vary monotonically in x³, depending on whether x¹>x² or x¹<x²:

fourth=function(i,j,k){
comean=(45-i-j-k)/7
if ((i<1)&(j<5)){ pos=c(10*comean+k,comean+10*k)}
if ((i<5)&(j>4)){ pos=c(100*i*comean+k*j,j*comean+100*i*k)}
if ((i>0)&(i<5)&(j<5)){ pos=c(i*comean+k*j,j*comean+i*k)}
if ((i<7)&(i>4)&(j<5)){ pos=c(i*comean+100*k*j,j*comean+100*i*k)}
if ((i<7)&(i>4)&(j>4)){ pos=c(i*comean+k*j,j*comean+i*k)}
if ((i>6)&(j<4)){ pos=c(i*comean+100*k*j,j*comean+100*i*k)}
if ((i>6)&(j>3)){ pos=c(i*comean+k*j,j*comean+i*k)}
return(order(pos)[1])}

Running this R code for all combinations of x¹,x² shows that, except for the cases x¹≥5 and x²=0, for which x³ invariably remains in front of x¹, there are always values of x³ associated with each position.