Archive for R

splitting a field by annealing

Posted in Kids, pictures, R, Statistics with tags , , , , , , , , on October 18, 2017 by xi'an

A recent riddle [from The Riddle] that I pondered about during a [long!] drive to Luxembourg last weekend was about splitting a square field into three lots of identical surface for a minimal length of separating wire… While this led me to conclude that the best solution was a T like separation, I ran a simulated annealing R code on my train trip to AutransValence, seemingly in agreement with this conclusion.I discretised the square into n² units and explored configurations by switching two units with different colours, according to a simulated annealing pattern (although unable to impose connectivity on the three regions!):

partz=matrix(1,n,n)
partz[,1:(n/3)]=2;partz[((n/2)+1):n,((n/3)+1):n]=3
#counting adjacent units of same colour 
nood=hood=matrix(4,n,n)
for (v in 1:n2) hood[v]=bourz(v,partz)
minz=el=sum(4-hood)
for (t in 1:T){
  colz=sample(1:3,2) #picks colours
  a=sample((1:n2)[(partz==colz[1])&(hood<4)],1)
  b=sample((1:n2)[(partz==colz[2])&(hood<4)],1) 
  partt=partz;partt[b]=colz[1];partt[a]=colz[2] 
#collection of squares impacted by switch 
  nood=hood 
  voiz=unique(c(a,a-1,a+1,a+n,a-n,b-1,b,b+1,b+n,b-n)) 
  voiz=voiz[(voiz>0)&(voiz<n2)] 
  for (v in voiz) nood[v]=bourz(v,partt) 
  if (nood[a]*nood[b]>0){
    difz=sum(nood)-sum(hood)
    if (log(runif(1))<difz^3/(n^3)*(1+log(10*rep*t)^3)){
      el=el-difz;partz=partt;hood=nood     
      if (el<minz){ minz=el;cool=partz}
  }}}

(where bourz computes the number of neighbours), which produces completely random patterns at high temperatures (low t) and which returns to the T configuration (more or less):if not always, as shown below:Once the (a?) solution was posted on The Riddler, it appeared that one triangular (Y) version proved better than the T one [if not started from corners], with a gain of 3% and that a curved separation was even better with an extra gain less than 1% [solution that I find quite surprising as straight lines should improve upon curved ones…]

Astrostatistics school

Posted in Mountains, pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , on October 17, 2017 by xi'an

What a wonderful week at the Astrostat [Indian] summer school in Autrans! The setting was superb, on the high Vercors plateau overlooking both Grenoble [north] and Valence [west], with the colours of the Fall at their brightest on the foliage of the forests rising on both sides of the valley and a perfect green on the fields at the centre, with sun all along, sharp mornings and warm afternoons worthy of a late Indian summer, too many running trails [turning into X country ski trails in the Winter] to contemplate for a single week [even with three hours of running over two days], many climbing sites on the numerous chalk cliffs all around [but a single afternoon for that, more later in another post!]. And of course a group of participants eager to learn about Bayesian methodology and computational algorithms, from diverse [astronomy, cosmology and more] backgrounds, trainings and countries. I was surprised at the dedication of the participants travelling all the way from Chile, Péru, and Hong Kong for the sole purpose of attending the school. David van Dyk gave the first part of the school on Bayesian concepts and MCMC methods, Roberto Trotta the second part on Bayesian model choice and hierarchical models, and myself a third part on, surprise, surprise!, approximate Bayesian computation. Plus practicals on R.

As it happens Roberto had to cancel his participation and I turned for a session into Christian Roberto, presenting his slides in the most objective possible fashion!, as a significant part covered nested sampling and Savage-Dickey ratios, not exactly my favourites for estimating constants. David joked that he was considering postponing his flight to see me talk about these, but I hope I refrained from engaging into controversy and criticisms… If anything because this was not of interest for the participants. Indeed when I started presenting ABC through what I thought was a pedestrian example, namely Rasmus Baath’s socks, I found that the main concern was not running an MCMC sampler or a substitute ABC algorithm but rather an healthy questioning of the construction of the informative prior in that artificial setting, which made me quite glad I had planned to cover this example rather than an advanced model [as, e.g., one of those covered in the packages abc, abctools, or abcrf]. Because it generated those questions about the prior [why a Negative Binomial? why these hyperparameters? &tc.] and showed how programming ABC turned into a difficult exercise even in this toy setting. And while I wanted to give my usual warning about ABC model choice and argue for random forests as a summary selection tool, I feel I should have focussed instead on another example, as this exercise brings out so clearly the conceptual difficulties with what is taught. Making me quite sorry I had to leave one day earlier. [As did missing an extra run!] Coming back by train through the sunny and grape-covered slopes of Burgundy hills was an extra reward [and no one in the train commented about the local cheese travelling in my bag!]

 

Le Monde puzzle [#1024]

Posted in Books, Kids with tags , , , , , , , on October 10, 2017 by xi'an

The penultimate and appropriately somewhat Monty Hallesque Le Monde mathematical puzzle of the competition!

A dresser with 5×5 drawers contains a single object in one of the 25 drawers. A player opens a drawer at random and, after each choice, the object moves at random to a drawer adjacent to its current location and the drawer chosen by the player remains open. What is the maximum number of drawers one need to open to find the object?

In a dresser with 9 drawers in a line, containing again a single object, the player opens drawers one at a time, after which the open drawer is closed and the object moves to one of the drawers adjacent to its current location. What is the maximum number of drawers one need to open to find the object?

For the first question, setting a pattern of exploration and, given this pattern, simulating a random walk trying to avoid the said pattern as long as possible is feasible, returning a maximum number of steps over many random walks [and hence a lower bound on the true maximum]. As in the following code

sefavyd=function(pater=seq(1,49,2)%%25+1){
  fild=matrix(0,5,5)
  m=pater[1];i=fild[m]=1
  t=sample((1:25)[-m],1)
  nomove=FALSE
  while (!nomove){
   i=i+1
   m=pater[i];fild[m]=1
   if (t==m){ nomove=TRUE}else{
   muv=NULL
   if ((t-1)%%5>0) muv=c(muv,t-1)
   if (t%%5>0) muv=c(muv,t+1)
   if ((t-1)%/%5>0) muv=c(muv,t-5)
   if (t%/%5<4) muv=c(muv,t+5)
   muv=muv[fild[muv]==0]
   nomove=(length(muv)==0)
   if (!nomove) t=sample(rep(muv,2),1)}
  }
  return(i)}

But a direct reasoning starts from the observation that, while two adjacent drawers are not opened, a random walk can, with non-zero probability, switch indefinitely between both drawers. Hence, a sure recovery of the object requires opening one drawer out of two. The minimal number of drawers to open on a 5×5 dresser is 2+3+2+3+2=12. Since in 12 steps, those drawers are all open, spotting the object may require up to 13 steps.

For the second case, unless I [again!] misread the question, whatever pattern one picks for the exploration, there is always a non-zero probability to avoid discovery after an arbitrary number of steps. The [wrong!] answer is thus infinity. To cross-check this reasoning, I wrote the following R code that mimics a random pattern of exploration, associated by an opportunistic random walk that avoids discovery whenever possible (even with very low probability) bu pushing the object towards the centre,

drawl=function(){
  i=1;t=5;nomove=FALSE
  m=sample((1:9)[-t],1)
  while (!nomove){
    nextm=sample((1:9),1)
    muv=c(t-1,t+1)
    muv=muv[(muv>0)&(muv<10)&(muv!=nextm)] 
    nomove=(length(muv)==0)||(i>1e6)
    if (!nomove) t=sample(rep(muv,2),1,
              prob=1/(5.5-rep(muv,2))^4)
    i=i+1}
  return(i)}

which returns unlimited values on repeated runs. However, I was wrong and the R code unable to dismiss my a priori!, as later discussions with Robin and Julien at Paris-Dauphine exhibited ways of terminating the random walk in 18, then 15, then 14 steps! The idea was to push the target to one of the endpoints because it would then have no option but turning back: an opening pattern like 2, 3, 4, 5, 6, 7, 8, 8 would take care of a hidden object starting in an even drawer, while the following 7, 6, 5, 4, 3, 2 openings would terminate any random path starting from an odd drawer. To double check:

grawl=function(){
  len=0;muvz=c(3:8,8:1)
  for (t in 1:9){
    i=1;m=muvz[i];nomove=(t==m)
    while (!nomove){
     i=i+1;m=muvz[i];muv=c(t-1,t+1)
     muv=muv[(muv>0)&(muv<10)&(muv!=m)]
     nomove=(length(muv)==0)
     if (!nomove)
      t=sample(rep(muv,2),1)}
    len=max(len,i)}
  return(len)}

produces the value 14.

[summer Astrostat school] room with a view [jatp]

Posted in Mountains, pictures, R, Running, Statistics, Travel, University life with tags , , , , , , , , , , on October 9, 2017 by xi'an

I just arrived in Autrans, on the Plateau du Vercors overlooking Grenoble and the view is fabulistic! Trees have started to turn red and yellow, the weather is very mild, and my duties are restricted to teaching ABC to a group of enthusiastic astronomers and cosmologists..! Second advanced course on ABC in the mountains this year, hard to beat (except by a third course). The surroundings are so serene and peaceful that I even conceded to install RStudio for my course! Instead of sticking to my favourite vim editor and line commands.

Le Monde puzzle [#1021]

Posted in Books, Kids, R with tags , , , , , on September 17, 2017 by xi'an

A puzzling Le Monde mathematical puzzle for which I could find no answer in the allotted time!:

A most democratic electoral system allows every voter to have at least one representative by having each of the N voters picking exactly m candidates among the M running candidates and setting the size n of the representative council towards this goal, prior to the votes. If there are M=25 candidates, m=10 choices made by the voters, and n=10 representatives, what is the maximal possible value of N? And if N=55,555 and M=33, what is the minimum value of n for which m=n is always possible?

I tried a brute force approach by simulating votes from N voters at random and attempting to find the minimal number of councillors for this vote, which only provides an upper bound of the minimum [for one vote], and a lower bound in the end [over all votes]. Something like

for (i in 1:N) votz[i,]=sample(1:M,n)
#exploration by majority
  remz=1:N;conz=NULL
  while (length(remz)>0){
    seatz=order(-hist(votz[remz,],
    breaks=(0:M)+0.5,plot=FALSE)$density)[1]
    conz=c(conz,seatz);nuremz=NULL
    for (v in remz)
      if (!(seatz%in%votz[v,])) nuremz=c(nuremz,v)
    remz=nuremz}
  solz=length(conz)
#exploration at random
   kandz=matrix(0,N,M)
   for (i in 1:N) kandz[i,votz[i,]]=1
   for (t in 1:1e3){
#random choice of councillors
    zz=sample(c(0,1),M,rep=TRUE)
    while (min(kandz%*%zz)!=1)
      zz=sample(c(0,1),M,rep=TRUE)
    solz=min(solz,sum(zz))
#random choice of remaining councillor per voter
    remz=1:N;conz=NULL
    while (length(remz)>0){
      seatz=sample(votz[remz[1],],1)
      conz=c(conz,seatz);nuremz=NULL
      for (i in remz)
        if (!(seatz%in%votz[i,])) nuremz=c(nuremz,i)
      remz=nuremz}
    solz=min(solz,length(conz))}
maxz=max(solz,maxz)}

which leads to a value near N=4050 for the first question, with 0% confidence… Obviously, the problem can be rephrased as a binary integer linear programming problem of the form

n= \max_A \min_{c;\,\min Ac=1}\mathbf{1}^\text{T}c

where A is the NxM matrix of votes and c is the vector of selected councillors. But I do not see a quick way to fix it!

Le Monde puzzle [#1019]

Posted in Books, Kids with tags , , , , , , on September 7, 2017 by xi'an

A gamey (and verbose) Le Monde mathematical puzzle:

A two-player game involves n+2 cards in a row, blue on one side and red on the other. Each player can pick any blue card among the n first ones and flip it plus both following ones. The game stops when no blue card is left to turn. The gain for the last player turning cards is 20-t, where t is the number of times cards were flipped, with gain t for its opponent. Both players aim at maximising their gain.

1. When n=4 and all cards are blue, can the first player win? If not, what is the best score for this player?

2. Among all 16 configurations at start, how many lead to the first player to win?

3. When n=10 and all cards are blue, how many cards are flipped an odd number of times for the winning configuration?

The first two questions can easily be processed by an R code like the following recursive function:

liplop <- function(x,n,i){
  if (max(x[1:n])==0){
    return(i)
  }else{
    sol=NULL
    for (j in (1:n)[x[1:n]==1]){
      y=x;y[j:(j+2)]=1-y[j:(j+2)]
      sol=c(sol,20-liplop(y,n,i+1))}
    return(max(sol))}}

Returning

> liplop(rep(1,6),4,0)
[1] 6

Meaning the first player cannot win, by running at most six rounds. Calling the same function for all 4⁴=16 possible configurations leads to 8 winning ones:

[1] 0 0 0 1
[1] 0 0 1 1
[1] 0 1 0 1
[1] 0 1 1 1
[1] 1 0 0 0
[1] 1 0 1 0
[1] 1 1 0 0
[1] 1 1 1 0

Solving the same problem with n=10 is not feasible with this function. (Even n=6 seems out of reach!)

Le Monde puzzle [#1018]

Posted in Books, Kids, R with tags , , , , , on August 29, 2017 by xi'an

An arithmetic Le Monde mathematical puzzle (that first did not seem to involve R programming because of the large number of digits in the quantity involved):

An integer x with less than 100 digits is such that adding the digit 1 on both sides of x produces the integer 99x.  What are the last nine digits of x? And what are the possible numbers of digits of x?

The integer x satisfies the identity

10^{\omega+2}+10x+1=99x

where ω is the number of digits of x. This amounts to

10….01 = 89 x,

where there are ω zeros. Working with long integers in R could bring an immediate solution, but I went for a pedestrian version, handling each digit at a time and starting from the final one which is necessarily 9:

#multiply by 9
rap=0;row=NULL
for (i in length(x):1){
prud=rap+x[i]*9
row=c(prud%%10,row)
rap=prud%/%10}
row=c(rap,row)
#multiply by 80
rep=raw=0
for (i in length(x):1){
prud=rep+x[i]*8
raw=c(prud%%10,raw)
rep=prud%/%10}
#find next digit
y=(row[1]+raw[1]+(length(x)>1))%%10

returning

7 9 7 7 5 2 8 0 9

as the (only) last digits of x. The same code can be exploited to check that the complete multiplication produces a number of the form 10….01, hence to deduce that the length of x is either 21 or 65, with solutions

[1] 1 1 2 3 5 9 5 5 0 5 6 1 7 9 7 7 5 2 8 0 9
[1] 1 1 2 3 5 9 5 5 0 5 6 1 7 9 7 7 5 2 8 0 8 9 8 8 7 6 4 0 4 4 9 4 3 8 2 0 2 2
[39] 4 7 1 9 1 0 1 1 2 3 5 9 5 5 0 5 6 1 7 9 7 7 5 2 8 0 9

The maths question behind is to figure out the powers k of 10 such that

10^k\equiv -1 \text{ mod } (89)

For instance, 10²≡11 mod (89) and 11¹¹≡88 mod (89) leads to the first solution ω=21. And then, since 10⁴⁴≡1 mod (89), ω=21+44=65 is another solution…