Archive for dynamic programming

Take thrice thy money, bid me tear the bond

Posted in Books, Kids with tags , , , , , on April 21, 2021 by xi'an

A rather fun riddle for which the pen&paper approach proved easier than coding in R, for once. It goes as follows: starting with one Euro, one sequentially predicts a sequence of four random bits, betting an arbitrary fraction of one’s money at each round. When winning, the bet is doubled, otherwise, it is lost. Under the information that the same bit cannot occur thrice in a row, what is the optimal minimax gain?

Three simplifications: (i) each bet is a fraction ε of the current fortune of the player, which appears as a product of (1±ε) the previous bets (ii) when the outcome is 0 or 1, this fraction ε can thus be chosen in (-1,1), (iii) while the optimal choice is ε=1 when the outcome is known, i.e., when both previous are identical. The final fortune of the player is thus of the form

(1±ε)(1±ε’)(1±ε”)(1±ε”’)

if the outcome is alternating (e.g., 0101 or 0100), while it is of the form

2(1±ε)(1±ε’)(1±ε”)

if there are two identical successive bits in the first three results (e.g., 1101 or 0110). When choosing each of the fractions ε, the minimum final gain must be maximised. This implies that ε=0 for the bet on the final bit  when the outcome is uncertain (and ε=1 otherwise). In case of an alternating début, like 01, the minimal gain is

min{(1±ε)(1±ε’)(1+ε”),2(1±ε)(1±ε’)(1-ε”)}

which is maximised by ε”=1/3, taking the objective value 4(1±ε)(1±ε’)/3. Leading to the gain after the first bit being

min{4(1±ε)(1+ε’)/3,2(1±ε)(1-ε’)}

which is maximised by ε’=1/5, for the objective value 8(1±ε)/5. By symmetry, the optimal choice is ε=0. Which ends up with a minimax gain of 3/5. [The quote is from Shakespeare, in the Merchant of Venice.]

asymmetric information

Posted in Kids, R with tags , , , , , on November 4, 2020 by xi'an

The Riddler of 16 October had the following puzzle:

Take a real number θ uniformly distributed over (0,100). Among three players, the winner is whoever guessed the closest price without going over θ. In the event all guesses exceeded θ, the contestant with the lowest (and therefore closest) guess is declared the winner. The second player knows the first player’s guess and the third player knows both other guesses. What is the optimal guess for the first player, assuming all players maximise their probability of winning?

Looking at the optimal solution z for the third player leads to six possible choices, depending on the connection between the other guesses, x and y. Which translates in the R code

topz=function(x,y){
  if((2*y>=x)&(y>=1-x))  z=y-.001
  if(max(4*y,1+y)<=2*x)  z=y+.001
  if((2*x<=1+y)&(x<=1-y))z=x+.001
  z}
  
third=function(x,y) ifelse(y<x,topz(x,y),topz(y,x))

For there, the optimal choice y for the second player follows and happens on a boundary of one of the six regions, which itself returns that the optimal choice for the first player is x=2/3, leading to equal chances of winning (although there is some uncertainty on the boundaries). It is thus feasible to beat the asymmetric information. The picture above was my attempt at representing the probabilities of gain for all three players, some of the six regions being clearly visible, with first axis being x and second being y [and z is one of x⁻,x⁺,y⁻,y⁺]. The R code is too pedestrian to be reproduced!

Le Monde puzzle [#1155]

Posted in Books, Kids, R with tags , , , , , , , on September 26, 2020 by xi'an

The weekly puzzle from Le Monde is another Sudoku challenge:

Anahera and Wiremu play a game for T rounds. They successively pick a digit between 1 and 3, never repeating the previous one, and sum these digits. The last to play wins if the sum is a multiple of 3. Who is the winner for an optimal strategy?

By a simple dynamic programming of the optimal strategy at each step

N=3
A=matrix(-1,20,N)
A[1,1:N]=1:N
for (T in 2:20)
for (i in 1:N) A[T,i]=i+ifelse(!T%%2, #parity check
max((i+A[T-1,-i])%%3), #avoid zero
min((i+A[T-1,-i])%%3)) #seek zero

the first to play can always win the game. Not fun!

Le Monde puzzle [#1146]

Posted in Books, Kids, R with tags , , , , , , , , on June 5, 2020 by xi'an

The weekly puzzle from Le Monde is once more disappointing.

Everyday of the month, take 0, 1 or 2 units. If one unit taken past day, next day none can be taken. If two units taken two day ago, none can be taken the current day. What is the strategy maximising the number of units  for February? March? April? Generalise to 0,..,3 units taken each day with 0 compulsory three days after taking 3 units.

as taking 2-1-0 (or 3-2-1-0) sounds like the optimal strategy. Except at the final step when 2, 2-2, 2-2-0, and 1-0-2-2 are best. But then why would one distinguish between the three months..?! Because the outcome differ, as 30, 32, and 33, resp. (or 45, 48 and 51). With an average increase of 1 in the first case and 1.5 in the second.

Another puzzle took too much of my time, namely a code golf challenge to devise a code taking as input a matrix of moves over an n x x grid and returning as input the number of nodes and transient tributaries for each loop (or irreducible set) of the moves. Which I solved by running Markov chains from each starting point long enough to reach stationarity. Entering the moves as

n=3;M=matrix(sample(1:4,n^2,rep=T),n)

and returning the result via 390 bytes

j=cbind;l=sum;a=apply
m=l(!!M);n=m^.5
g=function(A,r=M[A])A+c((r<2)*(1-n*(A[,1]==n))-(r==2)*(1-n*(A[,1]<2)),
(r==3)*(1-n*(A[,2]==n))-(r>3)*(1-n*(A[,2]<2)))
I=c()
for(i in d<-1:n)I=rbind(I,j(i*d/d,d))
for(t in b<-1:m)I=g(I)
p=function(i)i[,1]+n*i[,2]-n-1
K=matrix(0,m,m)
for(t in b)K[b+m*p(I<-g(I))]=1
s=o=a(u<-unique(K),1,l)
for(k in 1:l(!!s))s[k]=l(!a(!!sweep(K,2,u[k,],'-'),1,l))
j(o,s-o)

which could be much shorter if only matrices could be indexed from 0 as in C.

one or two?

Posted in Books, Kids, R with tags , , , , , , on March 12, 2020 by xi'an

A superposition of two random walks from The Riddler:

Starting from zero, a random walk is produced by choosing moves between ±1 and ±2 at each step. If the choice between both is made towards maximising the probability of ending up positive after 100 steps, what is this probability?

Although the optimal path is not necessarily made of moves that optimise the probability of ending up positive after the remaining steps, I chose to follow a dynamic programming approach by picking between ±1 and ±2 at each step based on that probability:

bs=matrix(0,405,101) #best stategy with value i-203 at time j-1
bs[204:405,101]=1
for (t in 100:1){
  tt=2*t
  bs[203+(-tt:tt),t]=.5*apply(cbind(
     bs[204+(-tt:tt),t+1]+bs[202+(-tt:tt),t+1],
     bs[201+(-tt:tt),t+1]+bs[205+(-tt:tt),t+1]),1,max)}

resulting in the probability

> bs[203,1]
[1] 0.6403174

Just checking that a simple strategy of picking ±1 above zero and ±2 below leads to the same value

ga=rep(0,T)
for(v in 1:100) ga=ga+(1+(ga<1))*sample(c(-1,1),T,rep=TRUE)

or sort of

> mean(ga>0)
[1] 0.6403494

With highly similar probabilities when switching at ga<2

> mean(ga>0)
[1] 0.6403183

or ga<0

> mean(ga>0)
[1] 0.6403008

and too little difference to spot a significant improvement between the three boundaries.

%d bloggers like this: