Archive for simulated annealing

Le Monde puzzle [#1127]

Posted in Books, Kids, R, Statistics on January 17, 2020 by xi'an

A permutation challenge as the current Le Monde weekly mathematical puzzle:

When considering all games between 20 teams, of which 3 games have not yet been played, wins bring 3 points, losses 0 points, and draws 1 point (each). If the sum of all points over all teams and all games is 516, what is the largest possible number of teams with no draw in any of the games they played?

The run of a brute-force R simulation of 187 purely random games did not produce enough acceptable tables in a reasonable time. So I instead considered that a sum of 516 points over 187 games means solving 3a+2b=516 and a+b=187 (a win contributing 3 points to the total and a draw 2), leading to 142 3's and 45 1's to allocate. Meaning for instance this realisation of an acceptable table of game results:

games=matrix(1,20,20);diag(games)=0
#remove 3 unplayed games, i.e., 3 distinct pairings
while(sum(games*t(games))!=374){
  games=matrix(1,20,20);diag(games)=0
  games[sample((1:20^2)[games==1],3)]=0}
games=games*t(games)
#allocate the 45 draws and 142 wins, with a random sign for the winner
games[lower.tri(games)&games]=games[lower.tri(games)&games]*
    sample(c(rep(1,45),rep(3,142)))* #1's and 3's
    (1-2*(runif(20*19/2-3)<.5)) #sign
games[upper.tri(games)]=-t(games)[upper.tri(games)]
games[games==-3]=0;games=abs(games) #3-0 for wins, 1-1 for draws
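A quick check that such a realisation meets the required point total:

sum(games) #returns 516 (=3*142+2*45)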

Running 10⁶ random realisations of such matrices with no constraint whatsoever produced 915,524 tables in which every team had at least one draw, 81,851 tables with 19 teams with some draws (hence one team with none), 2,592 tables with 18 such teams, and 3 tables with 17. However, given that 10x9/2=45, the 45 draws can exactly fill all games between the same 10 teams, with the 142 3's allocated at random among the remaining games, so it seems to me that the maximum number should be 10. I thus reran a simulated annealing version (what else?!), reaching a maximum of 6 teams with no draws. Nowhere near 10, though!
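Here is a sketch of such an annealing step, a simplified reconstruction of my own that anneals only over which of the 187 games are drawn (rather than over full tables), hence not the exact code behind the above figure of 6:

pairs=t(combn(20,2))[sample(190,187),] #the 187 played pairings
draw=c(rep(TRUE,45),rep(FALSE,142)) #which games end in draws
nodraw=function(draw) sum(!(1:20)%in%pairs[draw,]) #draw-free teams
cur=nodraw(draw);temp=1
for (t in 1:1e5){
  prop=draw #move: exchange a drawn game with a decisive one
  swap=c(sample(which(draw),1),sample(which(!draw),1))
  prop[swap]=!prop[swap]
  new=nodraw(prop)
  if (log(runif(1))<temp*(new-cur)){draw=prop;cur=new}
  temp=temp+1e-4} #increasing inverse temperature
cur #number of teams with no draw in any of their games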

riddle on a circle

Posted in Books, Kids, R, Travel on December 22, 2019 by xi'an

The Riddler’s riddle this week provides another opportunity to resort to brute-force simulated annealing!

Given a Markov chain defined on the torus {1,2,…,100}, whose only moves are a one-step drift to the right (modulo 100) and a uniformly random jump, find the optimal transition matrix to reach 42 in a minimum (average) number of moves.

Which I coded on my plane to Seattle, under the assumption that there is nothing to do when the chain is already at 42, and the reasoning that there is no gain (on average) in randomising the choice between right shift and random jump.

dure=pmin(c(41:0,99:42),50) #initial guesses at the expected durations
temp=.01
for (t in 1:1e6){
  i=sample((1:100)[-42],1)
  dura=1+mean(dure) #expected duration when jumping at random
  if (temp*log(runif(1))<dure[i]-dura) dure[i]=dura
  #versus expected duration when drifting right (modulo 100)
  if (temp*log(runif(1))<dure[i]-(dura<-1+dure[i*(i<100)+1]))
    dure[i]=dura
  temp=temp/(1+.1e-4*(runif(1)>.99))} #occasional cooling

In all instances, the solution is to jump at random from every position except those between 29 and 41, where drifting right is better, for an average of 13.64286 steps to reach 42 from the positions where jumping is optimal.
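As a sanity check of my own, the expected durations under this fixed policy solve a small system of linear equations, which a fixed-point iteration recovers:

dure=rep(0,100) #expected hitting times of 42 under the fixed policy
for (t in 1:1e3){ #fixed-point iteration of the policy equations
  for (i in (1:100)[-42])
    dure[i]=ifelse(i%in%(29:41),1+dure[i+1],1+mean(dure))}
dure[1] #13.64286, matching the annealing output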

Galton’s board all askew

Posted in Books, Kids, R on November 19, 2019 by xi'an

Galton's quincunx has fascinated me since the (early) days when I saw a model of it as a teenager in an industry museum near Birmingham, so I jumped on the challenge to build an uneven nail version where the probabilities to end up in one of the boxes were not the Binomial ones. For instance, producing a uniform distribution with the maximum number of nails having probability ½ of sending the ball to the right. And I obviously chose to try simulated annealing to figure out the probabilities, facing as usual the unpleasant task of setting the objective function, calibrating the moves and the temperature schedule. Plus, less usually, a choice of the space where the optimisation takes place, i.e., deciding on a common denominator for the (rational) probabilities. Should it be 2⁷?! Or more (since the solution with two levels also involves 1/3)? Using the functions

evol<-function(P){
  #distribution Q[i,] of the ball position after each row of nails,
  #given the 7x7 (lower triangular) matrix P of nail probabilities
  Q=matrix(0,7,8)
  Q[1,1]=P[1,1];Q[1,2]=1-P[1,1]
  for (i in 2:7){
    Q[i,1]=Q[i-1,1]*P[i,1]
    for (j in 2:i)
      Q[i,j]=Q[i-1,j-1]*(1-P[i,j-1])+Q[i-1,j]*P[i,j]
    Q[i,i+1]=Q[i-1,i]*(1-P[i,i])
    Q[i,]=Q[i,]/sum(Q[i,])}
  return(Q)}
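As a sanity check, the fair version with all nail probabilities equal to ½ returns the Binomial weights:

evol(matrix(.5,7,7))[7,]*2^7 #returns choose(7,0:7), i.e., 1 7 21 35 35 21 7 1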

and

temper<-function(T=1e3){
  bestar=tarP=targ(P<-matrix(1/2,7,7))
  temp=.01
  while (sum(abs(8*evol(R<-P)[7,]-1))>.01){
    for (i in 2:7) #random move on one nail probability per row
      R[i,sample(rep(1:i,2),1)]=sample(0:deno,1)/deno #deno: global denominator
    if (log(runif(1))/temp<tarP-(tarR<-targ(R))){P=R;tarP=tarR}
    for (i in 2:7) R[i,1:i]=(P[i,1:i]+P[i,i:1])/2 #symmetrising move
    if (log(runif(1))/temp<tarP-(tarR<-targ(R))){P=R;tarP=tarR}
    if (runif(1)<1e-4) temp=temp+log(T)/T}
  return(P)}

I first tried running my simulated annealing code with a target function like

targ<-function(P)(1+.1*sum(!(2*P==1)))*sum(abs(8*evol(P)[7,]-1))

where P is the 7×7 lower triangular matrix of nail probabilities, all with denominator 2⁷=128, reaching

 60
126  35
107  81  20
104  71  22   0
126  44  26  69  14
 61 123 113  92  91  38
109  60   7  19  44  74  50

for 128P, with four entries close to 64, i.e., ½'s. Reducing the denominator to 16 once produced

 8
12  1
13 11  3
16  7  6  2
14 13 16 15  0
15 15  2  7  7  4
 8  0  8  9  8 16  8

as 16P, with five ½'s (the 8's). But none of these solutions had exactly a uniform probability of 1/8 of reaching each endpoint. Success (with exact 1/8's and a denominator of 4) was met with the new target

(1+.1*sum(!(2*P==1)))*(.01+sum(!(8*evol(P)[7,]==1)))

imposing precisely 1/8 on the final line. With one solution exhibiting 11 ½'s

0.5
1.0 0.0
1.0 0.0 0.0
1.0 0.5 1.0 0.5
0.5 0.5 1.0 0.0 0.0
1.0 0.0 0.5 0.0 0.5 0.0
0.5 0.5 0.5 1.0 1.0 1.0 0.5

and another one with 12 ½’s:

0.5
1.0 0.0
1.0 0.375 0.0
1.0 1.0 0.625 0.5
0.5 0.5 0.5 0.5 0.0
1.0 0.0 0.5 0.5 0.0 0.5
0.5 1.0 0.5 0.0 1.0 0.5 0.0
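As a quick check, feeding the first of the two solutions above back into evol (with P filled row by row) does return the uniform distribution over the eight boxes:

P=matrix(0,7,7)
vals=c(.5,1,0,1,0,0,1,.5,1,.5,.5,.5,1,0,0,1,0,.5,0,.5,0,.5,.5,.5,1,1,1,.5)
k=0
for (i in 1:7) for (j in 1:i){k=k+1;P[i,j]=vals[k]}
evol(P)[7,] #returns rep(1/8,8)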

Incidentally, Michael Proschan and my good friend Jeff Rosenthal have a 2009 American Statistician paper on another modification of the quincunx they call the uncunx! Playing a wee bit further with the annealing, using a denominator of 840 led to a 60P with 13 ½'s out of 28 entries

30
60  0
60  1  0
30 30 30  0
30 30 30 30 30
60 60 60  0 60  0
60 30  0 30 30 60 30

stochastic magnetic bits, simulated annealing and Gibbs sampling

Posted in Statistics on October 17, 2019 by xi'an

A paper by Borders et al. in the 19 September issue of Nature offers an interesting mix of computing, electronics, and optimisation. With two preparatory tribunes! One [rather overdone] on Feynman's quest, on creating probabilistic bits as a possible alternative to quantum computers, and on making machine learning (as an optimisation program) more efficient. And another one explaining more clearly what is in the paper, as well as the practical advantage of the approach over quantum computing. As for the paper itself, the part I understood, about factorising an integer F via minimising the squared difference between a product of two integers and F by simulated annealing, sounded rather easy, while the part I did not, about constructing a semi-conductor implementing this stochastic search, sounded too technical (especially in the métro during rush hour), even after checking the on-line supplementary material. Interestingly, the paper claims higher efficiency, thanks to asynchronicity, than a regular Gibbs simulation of Boltzmann machines, quoting Roberts and Sahu (1997) without further explanation and possibly out of context (as the latter is not concerned with optimisation).
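As a toy illustration of this factorisation-by-annealing principle, here is a minimal R sketch of my own, with an assumed target N=10403 and arbitrary calibration, nothing like the hardware version of the paper:

N=10403 #=101*103, an assumed toy target
xy=sample(2:N,2,replace=TRUE) #a random pair of integer factors
cur=(xy[1]*xy[2]-N)^2
temp=1e6
for (t in 1:1e6){
  prop=xy
  k=sample(1:2,1) #local integer move on one of the two factors
  prop[k]=max(2,prop[k]+sample(-10:10,1))
  new=(prop[1]*prop[2]-N)^2
  if (temp*log(runif(1))<cur-new){xy=prop;cur=new}
  if (cur==0) break
  temp=temp*(1-1e-5)} #geometric cooling
xy #hopefully 101 and 103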

ABC-SAEM

Posted in Books, Statistics, University life on October 8, 2019 by xi'an

In connection with the recent PhD thesis defence of Juliette Chevallier, in which I took a somewhat virtual part for being physically in Warwick, I read a paper she wrote with Stéphanie Allassonnière on stochastic approximation versions of the EM algorithm. Computing the MAP estimator can be done via versions of EM adapted to simulated annealing, possibly using MCMC, as for instance in the Monolix software and its MCMC-SAEM algorithm. Where SA stands sometimes for stochastic approximation and sometimes for simulated annealing, originally developed by Gilles Celeux and Jean Diebolt, then reframed by Marc Lavielle and Eric Moulines [friends and coauthors]. With an MCMC step because the simulation of the latent variables involves an intractable normalising constant. (Contrary to this paper, Umberto Picchini and Adeline Samson proposed in 2015 a genuine ABC version of this approach, a paper that I thought I had missed, although I now remember discussing it with Adeline at JSM in Seattle, where ABC is used as a substitute for the conditional distribution of the latent variables given data and parameter, to be plugged into the Q step of the (SA)EM algorithm. One more approximation step and one more simulation step and we would reach a form of ABC-Gibbs!) In this version, very few assumptions are made on the approximating sequence, except that it converges with the iteration index to the true distribution (for a fixed observed sample) if convergence of ABC-SAEM is to happen. The paper takes as an illustrative sequence a collection of tempered versions of the true conditionals, but this is quite formal as I cannot fathom a feasible simulation from the tempered version and not from the untempered one. It is thus much more a version of tempered SAEM than one truly connected with ABC (although a genuine ABC-EM version could be envisioned).
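For intuition, here is a minimal toy sketch of my own of SAEM with an ABC substitute for the latent-variable conditional, on an assumed Gaussian toy model, not the algorithm of the paper:

#toy model: latent z_i~N(theta,1), observed y_i~N(z_i,1),
#with complete-data MLE of theta equal to mean(z)
y=rnorm(50,2)+rnorm(50) #data generated with true theta=2
theta=s=0;eps=.3
for (k in 1:1e3){
  z=rep(0,50)
  for (i in 1:50) repeat{ #ABC substitute for p(z_i|y_i,theta)
    zz=rnorm(1,theta)
    if (abs(rnorm(1,zz)-y[i])<eps){z[i]=zz;break}}
  s=s+(mean(z)-s)/k #stochastic approximation step
  theta=s} #M step, the complete-data MLE
theta #close to mean(y), the genuine MLE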

Le Monde puzzle [#1092]

Posted in Statistics on April 18, 2019 by xi'an

A Latin square Le Monde mathematical puzzle that I found rather dreary:

A hidden 3×3 board contains all numbers from 1 to 9. Anselm wants to guess the board and makes two proposals. Berenicke tells him how many entries are in the right rows and columns for each proposal, along with the information that no entry is at the right location. Anselm deduces the right board.

Which I solved by brute force and not even simulated annealing, first defining a target

#the two proposals, stored as (row,column) positions of the digits 1 to 9
ordoku1=ordoku2=matrix(1,9,2)
ordoku1[,1]=c(1,1,1,2,2,2,3,3,3)
ordoku1[,2]=rep(1:3,3)
ordoku2[,1]=c(3,2,3,1,2,3,2,1,1)
ordoku2[,2]=c(2,2,3,2,3,1,1,3,1)
#number of Berenicke's indications satisfied by a candidate ordo
fitz=function(ordo){
  (sum(ordo[c(1,4,7),2]==1)==1)+(sum(ordo[c(2,5,8),2]==2)==1)+
  (sum(ordo[c(3,6,9),2]==3)==0)+(sum(ordo[c(1,2,3),1]==1)==1)+
  (sum(ordo[c(4,5,6),1]==2)==1)+(sum(ordo[c(7,8,9),1]==3)==2)+
  (sum(ordo[c(6,7,9),2]==1)==2)+(sum(ordo[c(1,2,4),2]==2)==1)+
  (sum(ordo[c(3,5,8),2]==3)==2)+(sum(ordo[c(4,8,9),1]==1)==1)+
  (sum(ordo[c(7,2,5),1]==2)==1)+(sum(ordo[c(1,3,6),1]==3)==0)+
  (!(0%in%apply((ordo-ordoku1)^2,1,sum)))+(!(0%in%apply((ordo-ordoku2)^2,1,sum)))
}

evaluated on a 9×2 matrix ordo of (row, column) positions, reproducing all items of information given by Berenicke. If all constraints are met, the function returns 14. And then searched for a solution at random:

randw=function(ordo){
  for (t in 1:1e6){
    chlg=sample(1:9,2) #swap the positions of two digits
    buff=ordo[chlg[1],]
    ordo[chlg[1],]=ordo[chlg[2],]
    ordo[chlg[2],]=buff
    if (fitz(ordo)==14){
      print(ordo);break}}}
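For instance, starting from a random allocation of the digits to the nine cells (an assumed initialisation, as any permutation will do):

randw(ordoku1[sample(1:9),])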

which produces the correct board

4 3 5
6 7 1
9 2 8

Le Monde puzzle [#1088]

Posted in Books, Kids, R on March 29, 2019 by xi'an

A board (Ising!) Le Monde mathematical puzzle in the optimisation mode, again:

On a 7×7 board, what is the maximal number of locations that one can occupy when imposing at least two empty neighbours for each occupied location?

Which I tried to solve by brute force and simulated annealing (what else?!), first defining a target

targ=function(tabz){
  sum(tabz[-c(1,9),-c(1,9)]-1.2*(tabz[-c(1,9),-c(1,9)]*tabz[-c(8,9),-c(1,9)]
      +tabz[-c(1,9),-c(1,9)]*tabz[-c(1,2),-c(1,9)]
      +tabz[-c(1,9),-c(1,9)]*tabz[-c(1,9),-c(8,9)]
      +tabz[-c(1,9),-c(1,9)]*tabz[-c(1,9),-c(1,2)]>2))}

on a 9×9 board where I penalise each prohibited configuration by a factor of 1.2 (a wee bit more than the unit gain of an occupied node). The perimeter of the 9×9 board is filled with ones and never updated, which conveniently imposes the two-empty-neighbour constraint on the boundary cells of the actual 7×7 board as well. (In the above convoluted products, the goal is to count how many neighbours of the entries equal to one are also equal to one. More than 2 is penalised.) The simulated annealing move is then updating the 9×9 grid gridz:

gridz=matrix(1,9,9) #ones on the perimeter; assumed random interior start
gridz[2:8,2:8]=(runif(49)<.5)
temp=1
maxarg=curarg=targ(gridz)
for (t in 1:1e3){
  for (v in 1:1e4){
    i=sample(2:8,1);j=sample(2:8,1) #flip a single interior entry
    newgrid=gridz;newgrid[i,j]=1-gridz[i,j]
    newarg=targ(newgrid)
    if (log(runif(1))<temp*(newarg-curarg)){
      gridz=newgrid;curarg=newarg}}
  temp=temp+.01} #increasing inverse temperature

and calls to the procedure always return 28 entries as the optimum, as in

     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    0    1    0    1    0    1
[2,]    0    1    1    0    1    1    0
[3,]    1    1    0    1    0    1    1
[4,]    0    0    1    0    1    0    0
[5,]    1    1    0    1    0    1    1
[6,]    0    1    1    0    1    1    0
[7,]    1    0    1    0    1    0    1

As it happens, I had misread the wording of the original puzzle, which considered a dynamic placement of the units on the board, one at a time, with the two free neighbours imposed at the time of placement.