Archive for simulated annealing

Le Monde puzzle [#1066]

Posted in Books, Kids, R on September 13, 2018 by xi'an

Recalling the first competition problem of the Le Monde mathematical puzzle:

For the X table below, what is the minimal number of lights that are on (green) to reach the minimal and maximal possible numbers of entries (P) with an even (P for pair) number of neighbours with lights on? In the illustration below, there are 16 lights on (green) and 31 entries with an even number of on-neighbours.

As suggested last week, this was amenable to an R resolution by low-tech simulated annealing, although the number of configurations was not that large when accounting for symmetries. The R code is a wee bit long for full reproduction here, but it works by moving away from a random filling of this cross with 0's and 1's, towards minimising or maximising the number of P's, with this simulated annealing loop inserted in another loop recording the minimal number of 1's in both cases. A first round produced 1 and 44 for the minimal and maximal numbers of P's, respectively, associated with at least 16 and 3 1's, respectively, but a longer run exhibited 45 for 6 1's crossing one of the diagonals of the X, made of two aligned diagonals of the outer 3×3 tables. (This was confirmed by both Amic and Robin in Dauphine!) The next [trigonometry] puzzle is on!
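The full code being too long to reproduce, here is a minimal sketch of the same low-tech simulated annealing, run on a plain 5×5 grid of lights rather than the actual X-shaped mask of the puzzle (which is not reproduced here); evenz() counts the entries with an even number of on-neighbours:

```r
# count entries with an even number of on-neighbours (4-neighbourhood),
# on a plain rectangular grid instead of the X shape of the puzzle
evenz=function(g){
  n=nrow(g);m=ncol(g)
  pad=matrix(0,n+2,m+2);pad[1+(1:n),1+(1:m)]=g # zero-padded copy
  nb=pad[1:n,1+(1:m)]+pad[2+(1:n),1+(1:m)]+    # neighbours above and below
     pad[1+(1:n),1:m]+pad[1+(1:n),2+(1:m)]     # neighbours left and right
  return(sum(nb%%2==0))}

# simulated annealing maximising evenz, flipping one random light at a time
sa=function(g,T=1e4,temp=1,beta=.999){
  obj=bez=evenz(g);best=g
  for (t in 1:T){
    h=g;i=sample(1:length(g),1);h[i]=1-h[i] # flip one light
    jbo=evenz(h)
    if (jbo>bez){bez=jbo;best=h}            # record the best configuration
    if (log(runif(1))<(jbo-obj)/temp){g=h;obj=jbo} # Metropolis acceptance
    temp=temp*beta}                         # geometric cooling
  return(list(top=bez,sol=best))}
```

On the plain grid the all-off configuration already makes every neighbour count even, so the sketch is only meant to show the annealing mechanics, not the actual X-table answer.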

Le Monde puzzle [#1065]

Posted in Books, Kids, R on September 5, 2018 by xi'an

A penultimate Le Monde mathematical puzzle before the new competition starts [again!]:

For a game opposing 40 players over 12 questions, anyone answering a question correctly gets as reward the number of players who failed to answer it. Alice is the single winner: what is her minimal score? In another round, Bob is alone with the lowest score: what is his maximal score?

For each player, the score S is the sum δ₁s₁+…+δ₁₂s₁₂, where δᵢ is the indicator of a correct answer to question i and sᵢ is the sum over all other players of the complementary indicators, i.e., the number of players failing question i, which can be replaced with the sum over all players since δᵢ(1−δᵢ)=0. This leads to the vector of scores

# ansz: 40x12 binary matrix, ansz[i,j]=1 if player i answers question j correctly
worz <- function(ansz){
  scor=apply(1-ansz,2,sum)           # number of players failing each question
  return(apply(t(ansz)*scor,2,sum))} # score of each player

Now, running by brute force a massive number of simulations confirmed my intuition that the minimal winning score is 39, the number of players minus one [achieved by Alice giving a single good answer and the others none at all], while the maximum losing score appeared to be 34, for which I had much less of an intuition! I would have rather guessed something in the vicinity of 80 (being half of the answers replied correctly by half of the players)… Indeed, while in Singapore, I ran in the wee hours a quick simulated annealing code from this solution and moved to 77.
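As a sanity check on the score function above, the Alice configuration (a single correct answer, all other players blank) does return the claimed 39:

```r
# score function from above: ansz is a 40x12 binary matrix of correct answers
worz <- function(ansz){
  scor=apply(1-ansz,2,sum)           # players failing each question
  return(apply(t(ansz)*scor,2,sum))} # resulting score of each player

ansz=matrix(0,40,12)
ansz[1,1]=1   # Alice (player 1) answers one single question correctly
worz(ansz)[1] # 39, the number of players who failed that question
```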

And the 2018 version of the Le Monde maths puzzle competition starts today!, for a total of eight double questions, starting with an optimisation problem where the adjacent X table is filled with zeros and ones, trying to optimise (max and min) the number of positive entries [out of 45] for which an even number of neighbours is equal to one. On the represented configuration, green stands for one (16 ones) and P for the positive entries (31 of them). This should be amenable to an R resolution by, once again!, simulated annealing. Deadline for the reply on the competition website is next Tuesday, midnight [UTC+1].

Le Monde puzzle [#1061]

Posted in Books, Kids, R on July 20, 2018 by xi'an

A griddy Le Monde mathematical puzzle:

  1. On a 4×5 regular grid, find how many nodes need to be turned on so that each of the 3×4 unit squares keeps at least one active corner even when one arbitrary node fails.
  2. Repeat for a 7×9 grid.

The question is open to simulated annealing, as in the following R code:

n=3;m=4;np=n+1;mp=m+1 # 3x4 unit squares, hence a (n+1)x(m+1) grid of nodes

# cost of a configuration: 99 if failing one active node (or having none at
# all) leaves some unit square with no active corner, plus the number of
# active nodes
cvr=function(grue){
  grud=grue
  obj=(max(grue)==0)
  for (i in (1:length(grue))[grue==1]){
   grud[i]=0 # fail node i
   # a unit square is uncovered when its four corners are all off
   obj=max(obj,max((1-grud[-1,-1])*(1-grud[-np,-mp])*
       (1-grud[-np,-1])*(1-grud[-1,-mp])))
   grud[i]=1}
  obj=99*obj+sum(grue)
  return(obj)}

# low-tech simulated annealing: propose removing one active node, or adding
# np+mp random nodes when none is active, under geometric cooling
dumban=function(grid,T=1e3,temp=1,beta=.99){
   obj=bez=cvr(grid)
   sprk=grid # best solution so far
   for (t in 1:T){
     grue=grid
     # rep(.,2) guards against sample()'s scalar behaviour for a single node
     if (max(grue)==1){ grue[sample(rep((1:length(grid))[grid==1],2),1)]=0
      }else{ grue[sample(1:(np*mp),np+mp)]=1}
     jbo=cvr(grue)
     if (bez>jbo){ bez=jbo;sprk=grue}    # record the best value
     if (log(runif(1))<(obj-jbo)/temp){  # Metropolis acceptance
        grid=grue;obj=cvr(grid)}
     temp=temp*beta                      # cooling
     }
   return(list(top=bez,sol=sprk))}

leading to

>  dumban(grid,T=1e6,temp=100,beta=.9999)
$top
[1] 8
$sol
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    1    0    1    0
[2,]    0    1    0    1    0
[3,]    0    1    0    1    0
[4,]    0    1    0    1    0

which sounds like a potential winner.

interdependent Gibbs samplers

Posted in Books, Statistics, University life on April 27, 2018 by xi'an

Mark Kozdoba and Shie Mannor just arXived a paper on an approach to accelerate a Gibbs sampler, entitled “interdependent Gibbs samplers”. In fact, it presents rather strong similarities with our SAME algorithm. More of the same, as Adam Johansen (Warwick) entitled one of his papers! The paper indeed suggests multiplying replicas of latent variables (e.g., a hidden path for an HMM) in an artificial model. And as in our 2002 paper with Arnaud Doucet and Simon Godsill, the focus here is on maximum likelihood estimation (of the genuine parameters, not of the latent variables). Along with the argument that the resulting pseudo-posterior is akin to a posterior with a powered likelihood. And a link with the EM algorithm. And an HMM application.

“The generative model consist of simply sampling the parameters ,  and then sampling m independent copies of the paths”

If anything, this proposal is less appealing than SAME because it aims directly at the powered likelihood, rather than utilising an annealed sequence of powers that allows for a primary exploration of the whole parameter space before entering the trapping vicinity of a mode. This makes me fail to catch the authors' argument that the approach improves Gibbs sampling, as a more acute mode has, on the contrary, the dangerous feature of preventing visits to other modes. Hence the relevance of resorting to some form of annealing.

As already mused upon in earlier posts, I find it most amazing that this technique has been re-discovered so many times, both in statistics and in adjacent fields. The idea of powering the likelihood with independent copies of the latent variables is obviously natural (since a version pops up every other year, always under a different name), but earlier versions should eventually saturate the market!

X entropy for optimisation

Posted in Books, pictures, Statistics, Travel, University life on March 29, 2018 by xi'an

At Gregynog, with mounds of snow still visible in the surrounding hills, not to be confused with the many sheep dotting the fields(!), Owen Jones gave a three hour lecture on simulation for optimisation, a less travelled path when compared with simulation for integration. His second lecture covered cross entropy for optimisation purposes. (I had forgotten that Reuven Rubinstein and Dirk Kroese had put forward this aspect of their technique in the very title of their book, “A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning”.) The X entropy approach pushes for simulations restricted to top values of the target function, iterating to find the best parameter in the parametric family used for the simulation. (Best to be understood in the Kullback sense.) Now, this is a wee bit like simulated annealing, where lots of artificial entities have to be calibrated in the algorithm, due to the original problem being unrelated to a specific stochastic framework. X entropy facilitates concentration on the highest values of the target, but requires a family of probability distributions that puts weight on the top region. This may be a damning issue in large dimensions. Owen illustrated the approach in the case of the travelling salesman problem, where the parameterised distribution is a Markov chain on the state space of city sequences. Further, if the optimal value of the target is unknown, avoiding getting stuck in a local optimum may be tricky. (Owen presented a proof of convergence for a temperature going to zero slowly enough, which is equivalent to a sure exploration of the entire state space in a discrete setting and hence does not provide much reassurance in this respect, as the corresponding algorithm cannot be implemented.) This method falls into the range of doubly stochastic methods, in that they rely on Monte Carlo approximations at each iteration of the exploration algorithm.
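The elite-sample-then-refit iteration can be sketched in a few lines of R, here with a Gaussian sampling family on a toy one-dimensional target (the function xent, the N(μ,σ) family and all tuning constants are illustrative choices, not Owen's travelling salesman example):

```r
# minimal cross-entropy method for maximising a (vectorised) function f:
# simulate from the current member of the family, keep the elite top-rho
# fraction, and refit the family on the elite sample
xent=function(f,mu=0,sig=10,N=1e3,rho=.1,T=50){
  for (t in 1:T){
    x=rnorm(N,mu,sig)                  # simulate from the current N(mu,sig)
    top=x[f(x)>=quantile(f(x),1-rho)]  # elite sample: top-rho values of f
    mu=mean(top);sig=sd(top)}          # moment refit = Kullback projection
  return(mu)}
```

For instance, xent(function(x) -(x-3)^2) concentrates on the maximiser 3, the sampling distribution shrinking around it as iterations proceed.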

During a later talk, I tried to recycle one of my earlier R codes on simulated annealing for sudokus, but could not find a useful family of proposal distributions to reach the (unique) solution. Using a mere product of distributions on each of the free positions in the sudoku grid only led me to a penalty of 13 errors…

1    2    8    5    9    7    4    9    3
7    3    5    1    2    4    6    2    8
4    6    9    6    3    8    5    7    1
2    7    5    3    1    6    9    4    8
8    1    4    7    8    9    7    6    2
6    9    3    8    4    2    1    3    5
3    8    6    4    7    5    2    1    9
1    4    2    9    6    3    8    5    7
9    5    7    2    1    8    3    4    6

It is hard to consider a distribution on the space of permutations, 𝔖₈₁.
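For the record, the kind of penalty invoked above can be coded as the number of duplicated entries over rows, columns, and 3×3 blocks (a sketch, not the actual code from the talk):

```r
# number of sudoku errors in a 9x9 grid s: for each row, column, and 3x3
# block, count how many entries are duplicates of an earlier value
errz=function(s){
  e=0
  for (i in 1:9)
    e=e+(9-length(unique(s[i,])))+(9-length(unique(s[,i])))
  for (i in 0:2) for (j in 0:2)
    e=e+9-length(unique(c(s[3*i+(1:3),3*j+(1:3)])))
  return(e)}
```

A valid completed grid returns a penalty of zero, and the simulated annealing attempt described above would try to drive this count down from its starting value.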

Le Monde puzzle [#1037]

Posted in Books, Kids, R on January 24, 2018 by xi'an

A purely geometric Le Monde mathematical puzzle this (or two independent ones, rather):

Find whether or not there exist inscribed and circumscribed circles to a convex polygon with 2018 sides of lengths 1, 2, …, 2018.

In the first (or rather second) case, the circle of radius R circumscribing the polygon, going through all vertices (assuming such a circle exists), is such that a side of length L and its corresponding central angle θ satisfy

L² = 2R²(1−cos θ)

leading to the following R code

R=3.2e5  # initial radius
step=1e2 # initial step size
anglz=sum(acos(1-(1:2018)^2/(2*R^2))) # sum of the 2018 central angles
while (abs(anglz-2*pi)>1e-4){
  # increase R when the angles sum to more than 2*pi, decrease it otherwise
  R=R-step+2*step*(anglz>2*pi)*(R>step)
  anglz=sum(acos(1-(1:2018)^2/(2*R^2)))
  step=step/1.01} # shrink the step

and the result is

> R=324221
> sum(acos(1-(1:2018)^2/(2*R^2)))-2*pi
[1] 9.754153e-05

(which is very close to the solution of Le Monde when replacing sin(α) with α!). The authors of the quoted paper do not seem to consider existence an issue.
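Alternatively, the same chord equation means the radius solves "sum of central angles = 2π", which R's uniroot handles directly (the bracketing interval below is my own guess around the above solution):

```r
# excess of the sum of the 2018 central angles over 2*pi, as a function of
# the circumscribed radius R
angtot=function(R) sum(acos(1-(1:2018)^2/(2*R^2)))-2*pi

# root-find on an interval bracketing the sign change
R=uniroot(angtot,c(3e5,3.5e5),tol=1e-6)$root
```

The root lands in the vicinity of the 324,221 value found by the step-halving loop above.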

In the second case, there is a theorem stating that if the equations

x₁+x₂=L₁, …, x₂₀₁₈+x₁=L₂₀₁₈

have a solution in ℝ₊, then there exists a circle to which the polygon is tangential. Quite interestingly, if the number n of sides is even, there is an infinitude of tangential polygons, if any. Now, and rather obviously, the matrix A associated with the above system of linear equations is singular, with a kernel spanned by the vector (1,−1,…,1,−1). Hence the collection of the sides must satisfy

L₁−L₂+…+L₂₀₁₇−L₂₀₁₈ = 0

which puts a constraint on the sequence of sides, namely that they can be divided into two groups with equal sum, here 2018×2019/4, which is not an integer. Hence, a conclusion of impossibility! [Thanks to my office neighbours François and Julien for discussing the puzzle with me.]
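The arithmetic behind the impossibility takes one line to check: the total perimeter is odd, so no split into two groups with equal integer sums exists.

```r
# each alternating group of sides would need to sum to half the perimeter
sum(1:2018)           # 2037171, equal to 2018*2019/2, an odd total
sum(1:2018) %% 2 == 1 # TRUE: half the perimeter is not an integer
```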

Le Monde puzzle [#1036]

Posted in Books, Kids, R on January 4, 2018 by xi'an

An arithmetic Le Monde mathematical puzzle to conclude 2017:

Find (a₁,…,a₁₃), a permutation of (1,…,13), such that

a₁/(a₂+a₃) = (a₂+a₃)/(a₃+a₄+a₅) = b₁ < 1
a₆/(a₆+a₇) = (a₆+a₇)/(a₇+a₈+a₉) = (a₇+a₈+a₉)/(a₅+a₉+a₁₀) = b₂ < 1
(a₁₁+a₁₂)/(a₁₂+a₁₃) = (a₁₂+a₁₃)/(a₁₃+a₁₀) = b₃ < 1

The question can be solved by brute-force simulation, checking all possible permutations of (1,…,13). But 13! is above six billion, a wee bit too many cases. Despite the problem being made of only four constraints, and hence the target function taking only five possible values, a simulated annealing algorithm returned a solution within a few calls:

(a₁,…,a₁₃)=(6,1,11,3,10,8,4,9,5,12,7,2,13)
(b₁,b₂,b₃)=(1/2,2/3,3/5)

using the following code:

checka=function(a){ #target to maximise: number of satisfied constraints
 return(1*(a[1]/sum(a[2:3])==sum(a[2:3])/sum(a[3:5]))+
  1*(sum(a[6:7])/sum(a[7:9])==a[6]/sum(a[6:7]))+
  1*(sum(a[7:9])/(a[5]+sum(a[9:10]))==a[6]/sum(a[6:7]))+
  1*(sum(a[11:12])/sum(a[12:13])==sum(a[12:13])/
    (a[10]+a[13])))}
parm=sample(1:13) #random starting permutation
cheka=checka(parm)
beta=1 #inverse temperature
for (t in 1:1e6){
  qarm=parm
  idx=sample(1:13,sample(2:12,1)) #pick a random subset of positions
  qarm[idx]=sample(qarm[idx])     #and shuffle it
  chekb=checka(qarm)
  if (log(runif(1))<beta*(chekb-cheka)){ #Metropolis acceptance
     cheka=chekb;parm=qarm}
  beta=beta*(1+log(1.00001)) #slowly increasing inverse temperature
  if (cheka==4) break()} #stop once all four constraints are met