Archive for R

a secretary problem with maximum ability

Posted in Kids, R with tags , , , , on April 28, 2017 by xi'an

The Riddler of today has a secretary problem, where one measures sequentially N random variables until one deems the current variable to be the largest of the whole sample. The classical secretary problem has a counter-intuitive solution where one first measures N/e random variables without taking any decision and then and only then picks the first next outcome larger than the largest in the first group. (For large values of N.) The added information in the current riddle is that the distribution of those iid random variables is set to be uniform on {1,…,M}, which begs for a modification in the algorithm. As for instance when observing M on the current draw.

The approach I devised is certainly suboptimal, as I decided to pick the currently observed value if the (conditional) probability it is the largest is larger than the probability subsequent draws. This translates into the following R code:

M=100 #maximum value
N=10  #total number of draws
hyprob=function(m){
# m is sequence of draws so far
n=length(m);mmax=max(m)
if ((m[n]<mmax)||(mmax-n<N-n)){prob=0
  }else{
  prob=prod(sort((1:mmax)[-m],dec=TRUE)
   [1:(N-n)]/((M-n):(M-N+1))}
return(prob)}

decision=function(draz=sample(1:M,N)){
  i=0
  keepgoin=TRUE
  while ((keepgoin)&(i<N)){
   i=i+1
   keepgoin=(hyprob(draz[1:i])<0.5)}
  return(c(i,draz[i],(draz[i]<max(draz))))}

which produces a winning rate of around 62% when N=10 and M=100, hence much better than the expected performances of the (asymptotic) secretary algorithm, with a winning frequency of 1/e. (For N=10 and M=100, the winning frequency is only 27%.)

Le Monde puzzle [#1003]

Posted in Kids, R with tags , , , on April 18, 2017 by xi'an

A purely arithmetic Le Monde mathematical puzzle:

Find the four integers w, x, y, z such that the four smallest pairwise sums among the six pairwise sums are 59, 65, 66, and 69. Similarly, find the four smallest of the five integers v, x, y, z such that the five smallest pairwise sums among the ten pairwise sums are 56, 64 , 66, 69 and 70.

The first question is rather straightforward since there are only two possible orderings when x≤y≤z≤w :

x+y≤x+z≤x+w≤y+z≤y+w≤z+w and x+y≤x+z≤y+z≤x+w≤y+w≤z+w

which means

x+y=59, x+z=65, x+w=66, y+z=69, and x+y=59, x+z=65, y+z=66, x+w=69

but since the first system does not allow for an integer solution, the only possibility is the second system, with solution (x,y,z,w)=(29,30,36,40). And the second question is of the same complexity, with, when x≤y≤z≤w≤v :

x+y=56, x+z=64, y+z=66, x+w=69, x+v=70 or x+y=56, x+z=64, x+w=66, x+v=69, y+z=70

with solutions (x,y,z,w,v)=(27,29,37,42,43) and (x,y,z,w,v)=(25,31,39,41,44).

optimultiplication [a riddle]

Posted in Books, Kids, R, Statistics with tags , , , , , on April 14, 2017 by xi'an

The riddle of this week is about an optimisation of positioning the four digits of a multiplication of two numbers with two digits each and is open to a coding resolution:

Four digits are drawn without replacement from {0,1,…,9}, one at a time. What is the optimal strategy to position those four digits, two digits per row, as they are drawn, toward minimising the average product?

Although the problem can be solved algebraically by computing E[X⁴|x¹,..] and E[X⁴X³|x¹,..]  I wrote three R codes to “optimise” the location of the first three digits: the first digit ends up as a unit if it is 5 or more and a multiple of ten otherwise, on the first row. For the second draw, it is slightly more variable: with this R code,

second<-function(i,j,N=1e5){draw
drew=matrix(0,N,2)
for (t in 1:N)
  drew[t,]=sample((0:9)[-c(i+1,j+1)],2)
conmean=(45-i-j)/8
conprod=mean(drew[,1]*drew[,2])
if (i<5){ #10*i
 pos=c((110*i+11*j)*conmean,
       100*i*j+10*(i+j)*conmean+conprod,
       (100*i+j)*conmean+10*i*j+10*conprod)}else{
 pos=c((110*j+11*i)*conmean,
       10*i*j+(100*j+i)*conmean+10*conprod,
       10*(i+j)*conmean+i*j+100*conprod)
 return(order(pos)[1])}

the resulting digit again ends up as a unit if it is 5 (except when x¹=7,8,9, where it is 4) or more and a multiple of ten otherwise, but on the second row. Except when x¹=0, x²=1,2,3,4, when they end up on the first row together, 0 obviously in front.

For the third and last open draw, there is only one remaining random draw, which mean that the decision only depends on x¹,x²,x³ and E[X⁴|x¹,x²,x³]=(45-x¹-x²-x³)/7. Attaching x³ to x² or x¹ will then vary monotonically in x³, depending on whether x¹>x² or x¹<x²:

fourth=function(i,j,k){
comean=(45-i-j-k)/7
if ((i<1)&(j<5)){ pos=c(10*comean+k,comean+10*k)}
if ((i<5)&(j>4)){ pos=c(100*i*comean+k*j,j*comean+100*i*k)}
if ((i>0)&(i<5)&(j<5)){ pos=c(i*comean+k*j,j*comean+i*k)} 
if ((i<7)&(i>4)&(j<5)){ pos=c(i*comean+100*k*j,j*comean+100*i*k)} 
if ((i<7)&(i>4)&(j>4)){ pos=c(i*comean+k*j,j*comean+i*k)}
if ((i>6)&(j<4)){ pos=c(i*comean+100*k*j,j*comean+100*i*k)} 
if ((i>6)&(j>3)){ pos=c(i*comean+k*j,j*comean+i*k)}
return(order(pos)[1])}

Running this R code for all combinations of x¹,x² shows that, except for the cases x¹≥5 and x²=0, for which x³ invariably remains in front of x¹, there are always values of x³ associated with each position.

Statlearn17, Lyon

Posted in Kids, pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , on April 6, 2017 by xi'an

Today and tomorrow, I am attending the Statlearn17 conference in Lyon, France. Which is a workshop with one-hour talks on statistics and machine learning. And which makes for the second workshop on machine learning in two weeks! Yesterday there were two tutorials in R, but I only took the train to Lyon this morning: it will be a pleasant opportunity to run tomorrow through a city I have not truly ever visited, if X’ed so many times driving to the Alps. Interestingly, the trip started in Paris with me sitting in the train next to another speaker at the conference, despite having switched seat and carriage with another passenger! Speaker whom I did not know beforehand and could only identify him by his running R codes at 300km/h.

Le Monde puzzle [#1002]

Posted in Kids, R with tags , , , , , , , on April 4, 2017 by xi'an

For once and only because it is part of this competition, a geometric Le Monde mathematical puzzle:

Given both diagonals of lengths p=105 and q=116, what is the parallelogram with the largest area? and when the perimeter is furthermore constrained to be L=290?

This made me jump right away to the quadrilateral page on Wikipedia, which reminds us that the largest area occurs when the diagonals are orthogonal, in which case it is A=½pq. Only the angle between the diagonals matters. Imposing the perimeter 2s in addition is not solved there, so I wrote an R code looking at all the integer solutions, based on one of the numerous formulae for the area, like ½pq sin(θ), where θ is the angle between both diagonals, and discretising in terms of the fractions of both diagonals at the intersection, and of the angle θ:

p=105
q=116
s=145
for (alpha in (1:500)/1000){
 ap=alpha*p;ap2=ap^2;omap=p-ap;omap2=omap^2
 for (beta in (1:999)/1000){
  bq=beta*q;bq2=bq^2;ombq=q-bq;ombq2=ombq^2
  for (teta in (1:9999)*pi/10000){
   d=sqrt(ap2+bq2-2*ap*bq*cos(teta))
   a=sqrt(ap2+ombq2+2*ap*ombq*cos(teta))
   b=sqrt(omap2+ombq2-2*omap*ombq*cos(teta))
   c=sqrt(omap2+bq2+2*omap*bq*cos(teta))
   if (abs(a+b+c+d-2*s)<.01){
     if (p*q*sin(teta)<2*maxur){
       maxur=p*q*sin(teta)/2
       sole=c(a,b,c,d,alpha,beta,teta)}}}}

This code returned an area of 4350, to compare with the optimal 6090 (which is recovered by the above R code when the diagonal lengths are identical and the perimeter is the one of the associated square). (As Jean-Louis Foulley pointed out to me, this area can be found directly by assuming the quadrilateral is a parallelogram and maximising in the length of one side.)

Le Monde puzzle [#1000…1025]

Posted in Kids, R with tags , , , , , , on March 28, 2017 by xi'an

Le Monde mathematical puzzle launched a competition to celebrate its 1000th puzzle! A fairly long-term competition as it runs over the 25 coming puzzles (and hence weeks). Starting with puzzle #1001. Here is the 1000th puzzle, not part of the competition:

Alice & Bob spend five (identical) vouchers in five different shops, each time buying the maximum number of items to get close to the voucher value. In these five shops, they buy sofas at 421 euros each, beds at 347 euros each, kitchen appliances at 289 euros each, tables at 251 euros each and bikes at 211 euros each, respectively. Once the buying frenzy is over, they realise that within a single shop, they would have spent exactly four vouchers for the same products. What is the value of a voucher?

Le Monde puzzle [#1001]

Posted in Kids, R with tags , , , , on March 27, 2017 by xi'an

After a long lag (due to my missing the free copies distributed at Paris-Dauphine!), here is a Sudoku-like Le Monde mathematical puzzle:

A grid of size (n,n) holds integer values such that any entry larger than 1 is the sum of one term in the same column and one term in the same row. What is the maximal possible value observed in such a grid when n=3,4?

This can be solved in R by a random exploration of such possible grids in a simulated annealing spirit:

mat=matrix(1,N,N)
goal=1

targ=function(mat){ #check constraints
  d=0
  for (i in (1:(N*N))[mat>1]){
    r=(i-1)%%N+1;c=(i-1)%/%N+1
    d=d+(min(abs(mat[i]-outer(mat[-r,c],mat[r,-c],"+")))>0)} 
  return(d)}

cur=0
for (t in 1:1e6){
  i=sample(1:(N*N),1);prop=mat
  prop[i]=sample(1:(2*goal),1)
  d=targ(prop)
  if (10*log(runif(1))/t<cur-d){ 
      mat=prop;cur=d} 
  if ((d==0)&(max(prop)>goal)){
     goal=max(prop);maxx=prop}}

returning a value of 8 for n=3 and 37 for n=4. However, the method is quite myopic and I tried instead a random filling of the grid, using each time the maximum possible sum for empty cells:

goal=1
for (v in 1:1e6){
  mat=matrix(0,N,N)
  #one 1 per row/col
  for (i in 1:N) mat[i,sample(1:N,1)]=1
  for (i in 1:N) if (max(mat[,i])==0) mat[sample(1:N,1),i]=1
  while (min(mat)==0){
   parm=sample(1:(N*N)) #random order
   for (i in parm[mat[parm]==0]){
    r=(i-1)%%N+1;c=(i-1)%/%N+1
    if ((max(mat[-r,c])>0)&(max(mat[r,-c])>0)){
      mat[i]=max(mat[-r,c])+max(mat[r,-c])
      break()}}}
    if (goal<max(mat)){
    goal=max(mat);maxx=mat}}

which recovered a maximum of 8 for n=3, but reached 48 for n=4. And 211 for n=5, 647 for n=6… For instance, here is the solution for n=4:

[1,]    1    5   11   10
[2,]    2    4    1    5
[3,]   48    2   24    1
[4,]   24    1   22   11

Continue reading