Archive for dynamic programming

Le Monde puzzle [#1070]

Posted in Books, Kids, R, University life on October 11, 2018 by xi'an

A rewording of the fifth competition problem of the Le Monde mathematical puzzle:

For the 3×3 tables below, what is the minimal number of steps to move from the left configuration to the right one, when a yellow token can only be moved to an empty location surrounded by two other tokens?

In the 4×4 table below, there are 6 green tokens. How many steps from left to right?

Painful and moderately mathematical, once more… For the first question, a brute force simulation of random valid moves of length less than 100 returns solutions in 4 steps:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    1    1    1    0    0    1    0    0    1
[2,]    1    0    1    0    1    1    0    0    1
[3,]    0    0    1    1    1    1    0    0    1
[4,]    0    0    1    1    0    1    0    1    1
[5,]    0    0    1    0    0    1    1    1    1

But this is not an acceptable sequence of moves, because of the “other” constraint. Imposing this constraint leads to a solution in 9 steps, but is this the lower bound?! It actually took me most of the weekend (apart from a long drive to and from a short half-marathon!) to figure out a better strategy than brute-force random exploration: the trick I eventually found is to start from the finishing (rightmost) value F of the grid and look at the values reachable from it in 1, 2, … steps. This is not exactly dynamic programming, but it keeps the running time under control provided a solution exists for the starting (leftmost) value S. (Robin proceeded the reverse way, which in retrospect is presumably equivalent, if faster!) The 3×3 grid has 9 choose 5, i.e. 126, possible configurations with 5 tokens, so the number of cases remains under control, and so does the 4×4 grid with 6 tokens and its 8008 configurations. This led to a 9 step solution for n=3 and the proposed starting grid in yellow:

[1] 1 1 1 0 0 1 0 0 1
[1] 1 1 0 0 1 1 0 0 1
[1] 1 1 0 1 1 0 0 0 1
[1] 0 1 0 1 1 1 0 0 1
[1] 0 1 1 1 0 1 0 0 1
[1] 1 1 1 1 0 0 0 0 1
[1] 0 1 1 1 1 0 0 0 1
[1] 0 0 1 1 1 0 0 1 1
[1] 0 0 1 1 0 0 1 1 1
[1] 0 0 1 0 0 1 1 1 1

and a 19 step solution for n=4:

[1] 1 0 0 0 0 1 0 0 0 1 1 1 0 0 0 1
[1] 1 0 0 0 0 1 0 0 0 0 1 1 0 0 1 1
[1] 1 0 0 0 0 1 1 0 0 0 1 1 0 0 1 0
[1] 1 1 0 0 0 1 1 0 0 0 0 1 0 0 1 0
[1] 1 1 0 0 0 0 1 1 0 0 0 1 0 0 1 0
[1] 1 1 1 0 0 0 1 1 0 0 0 0 0 0 1 0
[1] 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 0
[1] 1 1 1 1 0 0 1 0 0 0 0 0 0 0 1 0
[1] 1 1 0 1 0 1 1 0 0 0 0 0 0 0 1 0
[1] 1 0 0 1 1 1 1 0 0 0 0 0 0 0 1 0
[1] 0 0 0 1 1 1 1 0 0 0 1 0 0 0 1 0
[1] 0 0 0 1 1 1 0 0 0 1 1 0 0 0 1 0
[1] 0 0 0 1 1 1 0 0 1 1 0 0 0 0 1 0
[1] 0 0 0 1 0 1 0 0 1 1 0 0 0 1 1 0
[1] 0 0 0 1 0 1 0 0 1 0 0 0 1 1 1 0
[1] 0 0 0 1 1 1 0 0 1 0 0 0 1 0 1 0
[1] 0 0 0 1 0 1 0 0 1 1 0 0 1 0 1 0
[1] 0 0 0 1 0 1 0 0 0 1 1 0 1 0 1 0
[1] 0 0 0 1 0 1 1 0 0 1 1 0 1 0 0 0

The first resolution takes less than a minute and the second one just a few minutes (or less than my short morning run!). Surprisingly, this approach does not require more work for the larger grid, which makes me wonder what solution the Le Monde journalists will propose. Given the (misguided) effort my resolution took, it is no wonder this puzzle was granted a larger number of points.

riddles on a line

Posted in Books, Kids, R on August 22, 2018 by xi'an

In the Riddler of August 18, two riddles are connected with the integer set Ð={2,3,…,10}:

  1. Given a permutation of Ð, what is the probability that the most likely variation (up or down) occurs at each term?
  2. Given three players choosing three different integers in Ð sequentially, and rewards in Ð allocated to the closest of the three players (with splits in case of equal distance), what is the reward for each given an optimal strategy?

For the first question, a simple code returns 0.17…

winofail <- function(permz){
  if (length(permz) > 1){
    lenoperm = length(permz[-1]) / 2
    # does the direction of the first step (up or down) agree with the
    # majority direction of the remaining terms relative to permz[1]?
    win = (permz[1] < permz[2]) * (sum(permz[-1] > permz[1]) > lenoperm) +
      (permz[1] > permz[2]) * (sum(permz[-1] < permz[1]) > lenoperm) +
      (runif(1) < .5) * (sum(permz[-1] > permz[1]) == lenoperm)  # break ties at random
    win = win && winofail(permz[-1])  # and recurse over the remaining terms
  } else win = TRUE
  return(win)}

(but the analytic solution exhibits a cool Pascal-triangle relation!) and for the second question, a quick recursion or dynamic programming produces 20, 19, 15 as the rewards (and 5, 9, 8 as the locations):

gainz <- function(seqz){
  # rewards collected by each of the three players for the picks in seqz
  difz = t(abs(outer(2:10, seqz, "-")))  # distance from each integer to each pick
  cloz = apply(difz, 2, rank)            # rank 1 = closest, 1.5 = two-way tie
  return(apply(rbind(2:10, 2:10, 2:10) *
    ((cloz == 1) + .5 * (cloz == 1.5)), 1, sum))}

timeline <- function(prev){
  if (length(prev) == 0){  # first player maximises her own reward
    sol = timeline(10); bez = gainz(sol)[1]
    for (i in 2:9){
      bol = timeline(i); comp = gainz(bol)[1]
      if (comp > bez){
        bez = comp; sol = bol}}}
  if (length(prev) == 1){  # second player, given the first pick
    bez = -10
    for (i in (2:10)[-(prev - 1)]){
      bol = timeline(c(prev, i))
      comp = gainz(bol)[2]
      if (comp > bez){
        bez = comp; sol = bol}}}
  if (length(prev) == 2){  # third player, given the first two picks
    bez = -10
    for (i in (2:10)[-(prev - 1)]){
      bol = c(prev, i)
      comp = gainz(bol)[3]
      if (comp > bez){
        bez = comp; sol = bol}}}
  return(sol)}

After reading the solution on the Riddler, I realised I had misunderstood the line as starting at 2 when it actually starts from 1. Correcting for this leads to the same picks at 5, 9, 8, with rewards of 21, 19, 15.
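After the correction, the backward recursion is easy to reproduce. Here is a hypothetical Python sketch of the same logic (names and structure mine, not the R code above), with the line now running from 1 to 10:

```python
def gains(picks, points=range(1, 11)):
    # reward of each player: every integer goes to the closest pick,
    # split equally in case of equal distance
    g = [0.0] * len(picks)
    for v in points:
        d = [abs(v - p) for p in picks]
        winners = [i for i, di in enumerate(d) if di == min(d)]
        for i in winners:
            g[i] += v / len(winners)
    return g

def best(prev, pool):
    # each player in turn maximises his or her own final reward,
    # knowing that the later players will do the same
    if len(prev) == 3:
        return prev
    sol, top = None, -1.0
    for c in pool:
        cand = best(prev + [c], [p for p in pool if p != c])
        val = gains(cand)[len(prev)]
        if val > top:
            top, sol = val, cand
    return sol
```

With picks at 5, 9, and 8, `gains([5, 9, 8])` indeed returns the rewards 21, 19, 15.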

Le Monde puzzle [#1053]

Posted in Books, Kids, R on June 21, 2018 by xi'an

An easy arithmetic Le Monde mathematical puzzle again:

  1. If coins come in units of 1, x, and y, what is the optimal value of (x,y) that minimises the number of coins representing an arbitrary price between 1 and 149?
  2.  If the number of units is now four, what is the optimal choice?

The first question is fairly easy to code

coinz <- function(x, y){
  # greedy representation: as many large coins as possible, then small, then 1s
  z = (1:149)
  if (y < x){ xx = x; x = y; y = xx }  # ensure x < y
  ny = z %/% y          # coins of the larger unit
  nx = (z %% y) %/% x   # coins of the smaller unit on the remainder
  no = z - ny*y - nx*x  # remaining 1s
  return(max(no + nx + ny))
}

and returns M=12 as the maximal number of coins, corresponding to x=4 and y=22, attained for a price tag of 129. For the second question, one unit is necessarily 1 (!) and there is just an extra loop to the above, which returns M=8, with the other units taking several possible values:

[1] 40 11  3
[1] 41 11  3
[1] 55 15  4
[1] 56 15  4

A quick search revealed that this problem (or a variant) is solved in many places, from stackexchange (for the average number of coins rather than the maximal one, although I fail to see why an average makes sense with real prices), to a paper by Shalit calling for the 18¢ coin, to Freakonomics, to Wikipedia, although the latter page is about finding the minimum number of coins summing to a given value with fixed currency denominations (a knapsack problem). This Wikipedia page made me realise that my solution is not necessarily optimal: my code uses the remainders from the larger denominations, while more efficient divisions may exist. For instance, running the following dynamic programming code

coz = function(x, y){
  # minco[i] = minimal number of coins for the price i
  minco = 1:149  # worst case: i coins of 1
  if (x < y){ xx = x; x = y; y = xx }
  for (i in 2:149){
    if (i %% x == 0)
      minco[i] = i %/% x
    if (i %% y == 0)
      minco[i] = min(minco[i], i %/% y)
    # split i into two smaller prices and combine their optimal solutions
    for (j in 1:max(1, trunc((i + 1)/2)))
      minco[i] = min(minco[i], minco[j] + minco[i - j])
  }
  return(max(minco))}

returns the lower value of M=11 (with x=7,y=23) in the first case and M=7 in the second one.
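The same dynamic programming idea fits in a few lines of Python. This is a hypothetical counterpart (not the code used above), using the standard one-coin-at-a-time recursion rather than the split of a price into two sub-prices, followed by an exhaustive scan over the two free denominations:

```python
def max_coins(denoms, top=149):
    # best[i] = minimal number of coins adding up to the price i
    best = [0] + [float("inf")] * top
    for i in range(1, top + 1):
        for d in denoms:
            if d <= i and best[i - d] + 1 < best[i]:
                best[i] = best[i - d] + 1
    return max(best[1:])  # the worst price tag between 1 and top

# exhaustive scan over (x, y), the unit 1 being always included
M = min(max_coins([1, x, y]) for x in range(2, 150) for y in range(x + 1, 150))
```

The scan returns M=11, attained for instance by (x, y)=(7, 23), in agreement with the dynamic programming code above.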

Le Monde puzzle [#1044]

Posted in Books, Kids on March 12, 2018 by xi'an

A dynamic programming Le Monde mathematical puzzle:

Bob and Alice play a game where Alice fills a one-litre bottle from a water fountain and distributes its content among four buckets. Bob then empties two of the four buckets. Alice fills her bottle again and distributes it again among the buckets, and so on. Alice wins if she manages to fill one bucket after a finite number of steps. What is the maximum capacity of a bucket for which Alice has a winning strategy?

The question sounded too complex to solve by an R code, so I somewhat simplified it by deciding that Alice could not allocate an arbitrary portion of the litre to a bucket, but only 0, ¼, ⅓, ½, ⅔, ¾, or 1. I then looked at a finite horizon to see how much she could fill a bucket when Bob was trying to minimise this amount: a crude R code just took too long for a horizon of 6 steps, hence I tried to reduce the number of calls to my recursive function

solfrak = c(0, .25, .333, .5, .667, .75, 1)

petifil = function(buck = rep(0, 4), hor = 3){
  # Alice's move: enumerate every split of one litre over the four buckets
  albukz = NULL
  for (a in solfrak)
  for (b in solfrak[!(solfrak + a > 1)])
  for (c in solfrak[!(solfrak + a + b > 1)]){
    if (a + b + c <= 1){
      albukz = rbind(albukz, c(a, b, c, 1 - a - b - c))}}
  # eliminate duplicates up to a permutation of the buckets
  albukz = t(apply(buck + albukz, 1, sort))
  albukz = albukz[!duplicated(albukz),]
  if (is.matrix(albukz)){
    bezt = max(apply(albukz, 1, petimpty, hor = hor - 1))
  }else{
    bezt = petimpty(albukz, hor - 1)}
  return(bezt)}

petimpty = function(buck, hor){
  # Bob's move: empty two of the four buckets
  if (hor > 1){
    albukz = NULL
    for (i in 1:3)
    for (j in (i + 1):4)
      albukz = rbind(albukz, c(buck[-c(i, j)], 0, 0))
    albukz = t(apply(albukz, 1, sort))
    albukz = albukz[!duplicated(albukz),]
    if (is.matrix(albukz)){
      bezt = min(apply(albukz, 1, petifil, hor))
    }else{
      bezt = petifil(albukz, hor)}
  }else{
    # horizon over: Bob empties the two fullest buckets, and Alice pours
    # her whole litre into the fuller of the two remaining ones
    bezt = sort(buck)[2] + 1}
  return(bezt)}

which led to a most surprising outcome:

> petifil(hor=2)
[1] 1.333
> petifil(hor=3)
[1] 1.5
> petifil(hor=4)
[1] 1.5
> petifil(hor=5)
[1] 1.5

that is, no feasible strategy beats the value of 1.5 litres. This stands way below the two litres given as the maximum bucket capacity in the published solution!
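For cross-checking, here is a hypothetical Python version of the same min-max recursion, using exact fractions instead of the rounded .333 and .667 values above:

```python
from fractions import Fraction as F
from functools import lru_cache
from itertools import combinations, permutations

# Alice's allowed shares of the litre, kept exact
FRACS = (F(0), F(1, 4), F(1, 3), F(1, 2), F(2, 3), F(3, 4), F(1))

# every ordered way of splitting one litre into four such shares
# (the fourth share is the leftover, as in the R code)
SPLITS = {p
          for a in FRACS for b in FRACS for c in FRACS
          if a + b + c <= 1
          for p in permutations((a, b, c, 1 - a - b - c))}

@lru_cache(maxsize=None)
def fill(buck, hor):
    # Alice adds one litre, split among the buckets, maximising the outcome
    return max(empty(tuple(sorted(b + s for b, s in zip(buck, split))), hor - 1)
               for split in SPLITS)

@lru_cache(maxsize=None)
def empty(buck, hor):
    if hor <= 1:
        # horizon over: Bob removes the two fullest buckets and Alice
        # pours her whole litre into the fuller of the two remaining ones
        return sorted(buck)[1] + 1
    # Bob empties two buckets so as to minimise Alice's continuation
    return min(fill(tuple(F(0) if k in (i, j) else b
                          for k, b in enumerate(buck)), hor)
               for i, j in combinations(range(4), 2))
```

`fill((F(0),)*4, 2)` returns exactly 4/3, matching the 1.333 above.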

controlled SMC

Posted in Books, pictures, Statistics, University life on December 18, 2017 by xi'an

At the end of [last] August, Jeremy Heng, Adrian Bishop, George Deligiannidis and Arnaud Doucet arXived a paper on controlled sequential Monte Carlo (SMC), which we read today at the BiPs reading group in Paris-Saclay, where I took these notes. The setting is classical SMC, but with a twist in that the proposals at each time iteration are modified by an importance function. (I was quite surprised to discover that this was completely new, as I was under the false impression that it had been tried ages ago!) This importance sampling setting can be interpreted as a change of measures on both the hidden Markov chain and on its observed version, so that the overall normalising constant remains the same. Being in an importance sampling setting, there exists an optimal choice of importance functions, which unsurprisingly results in a zero-variance estimate of the normalising constant. And this optimal solution is actually the backward filter familiar to SMC users.

A large part of the paper actually concentrates on deriving an implementable version of this optimal solution, using dynamic programming, along with a projection of each local generator onto a simple linear space spanned by Gaussian kernels (aka Gaussian mixtures). This becomes feasible thanks to the particle systems generated at the earlier iterations of the said dynamic programming.

The paper is massive, both in terms of theoretical results and of the range of simulations, and we could not get through it within the 90 minutes Sylvain LeCorff spent on presenting it. I can only wonder at this stage how much Rao-Blackwellisation or AMIS could improve the performances of the algorithm. (A point I find quite amazing in Proposition 1 is that the normalising constant Z of the filtering distribution does not change along observations when using the optimal importance function, which translates into the estimates being nearly constant after a few iterations.)
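To fix ideas on why the choice of importance function matters, here is a toy, entirely hypothetical illustration (not the authors' algorithm): a plain bootstrap particle filter, whose normalising-constant estimate is what controlled SMC seeks to stabilise, checked against the exact Kalman value in a linear-Gaussian model.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, sx, sy, T, N = 0.9, 1.0, 0.5, 20, 5000

# simulate a linear-Gaussian hidden Markov model (arbitrary parameters)
x = np.zeros(T)
y = np.zeros(T)
x[0] = rng.normal(0.0, sx)
y[0] = x[0] + rng.normal(0.0, sy)
for t in range(1, T):
    x[t] = rho * x[t - 1] + rng.normal(0.0, sx)
    y[t] = x[t] + rng.normal(0.0, sy)

def kalman_loglik(y):
    # exact log normalising constant log Z via the Kalman filter
    ll, m, P = 0.0, 0.0, sx**2
    for yt in y:
        S = P + sy**2                       # predictive variance of y_t
        ll += -0.5 * (np.log(2 * np.pi * S) + (yt - m)**2 / S)
        K = P / S                           # Kalman gain
        m, P = m + K * (yt - m), (1 - K) * P
        m, P = rho * m, rho**2 * P + sx**2  # predict the next state
    return ll

def bootstrap_loglik(y, N=N):
    # bootstrap filter: proposals from the prior dynamics; controlled SMC
    # would instead twist these proposals to cut the variance of ll
    parts = rng.normal(0.0, sx, N)
    ll = 0.0
    for yt in y:
        logw = -0.5 * (np.log(2 * np.pi * sy**2) + (yt - parts)**2 / sy**2)
        mx = logw.max()
        ll += mx + np.log(np.mean(np.exp(logw - mx)))  # incremental constant
        w = np.exp(logw - mx)
        parts = parts[rng.choice(N, N, p=w / w.sum())]  # multinomial resampling
        parts = rho * parts + rng.normal(0.0, sx, N)    # propagate
    return ll
```

With the optimal (twisted) proposals of the paper, the estimate would have zero variance; the bootstrap version above only fluctuates around the Kalman value.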

Sean Meyn in Paris

Posted in Books, Statistics, Travel on November 23, 2013 by xi'an

My friend Sean Meyn (from the University of Florida, Gainesville) will give a talk in Paris next week (and I will be away in Coventry at the time…). Here are the details:

Tuesday, November 26, 2013, at 2:00 pm
Salle de Conseil, 4th floor (LINCS), 23 avenue d'Italie, 75013 Paris

Title of the talk: Feature Selection for Neuro-Dynamic Programming

Neuro-Dynamic Programming encompasses techniques from both reinforcement learning and approximate dynamic programming. Feature selection refers to the choice of basis that defines the function class that is required in the application of these techniques. This talk reviews two popular approaches to neuro-dynamic programming, TD-learning and Q-learning. The main goal of this work is to demonstrate how insight from idealized models can be used as a guide for feature selection for these algorithms. Several approaches are surveyed, including fluid and diffusion models, and the application of idealized models arising from mean-field game approximations. The theory is illustrated with several examples.
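As a concrete toy instance of the TD-learning mentioned in the abstract (my own illustration, not taken from the talk): tabular TD(0) on a five-state random walk, where "feature selection" reduces to the trivial choice of one indicator feature per state.

```python
import random

random.seed(1)
n, alpha = 5, 0.05
v = [0.5] * n  # value estimates, one indicator feature per state

for episode in range(5000):
    s = n // 2  # start each episode in the middle state
    while True:
        s2 = s + random.choice([-1, 1])
        if s2 < 0:    # left terminal state, reward 0
            v[s] += alpha * (0 - v[s]); break
        if s2 >= n:   # right terminal state, reward 1
            v[s] += alpha * (1 - v[s]); break
        # TD(0) update towards the bootstrapped one-step target
        v[s] += alpha * (v[s2] - v[s])
        s = s2
```

The true values of this classical example are i/6 for states i=1,…,5, and the estimates approach them; a coarser basis would trade accuracy for fewer parameters, which is the trade-off the talk addresses.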