Archive for simulated annealing

interdependent Gibbs samplers

Posted in Books, Statistics, University life on April 27, 2018 by xi'an

Mark Kozdoba and Shie Mannor just arXived a paper on an approach to accelerate a Gibbs sampler, entitled "interdependent Gibbs samplers". In fact, it presents rather strong similarities with our SAME algorithm. More of the same, as Adam Johansen (Warwick) entitled one of his papers! The paper indeed suggests multiplying replicas of the latent variables (e.g., a hidden path for an HMM) in an artificial model. And, as in our 2002 paper with Arnaud Doucet and Simon Godsill, the focus here is on maximum likelihood estimation (of the genuine parameters, not of the latent variables). Along with an argument that the resulting pseudo-posterior is akin to a posterior with a powered likelihood. And a link with the EM algorithm. And an HMM application.

"The generative model consists of simply sampling the parameters and then sampling m independent copies of the paths"
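In symbols (my notation, not the authors'): with m independent replicas z⁽¹⁾,…,z⁽ᵐ⁾ of the latent variables, the artificial model is

π(θ,z⁽¹⁾,…,z⁽ᵐ⁾|x) ∝ π(θ) p(x,z⁽¹⁾|θ)⋯p(x,z⁽ᵐ⁾|θ)

whose θ marginal is proportional to π(θ)L(θ)ᵐ, that is, a posterior associated with the likelihood raised to the power m, which concentrates around the maximum likelihood estimate as m grows.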

If anything, this proposal is less appealing than SAME because it aims directly at the powered likelihood, rather than utilising an annealed sequence of powers that allows for a primary exploration of the whole parameter space before entering the trapping vicinity of a mode. This makes me fail to catch the authors' argument that the approach improves Gibbs sampling, as a more acute mode has, on the contrary, the dangerous feature of preventing visits to other modes. Hence the relevance of resorting to some form of annealing.

As already mused upon in earlier posts, I find it most amazing that this technique has been re-discovered so many times, both in statistics and in adjacent fields. The idea of powering the likelihood with independent copies of the latent variables is obviously natural (since a version pops up every other year, always under a different name), but earlier versions should eventually saturate the market!

X entropy for optimisation

Posted in Books, pictures, Statistics, Travel, University life on March 29, 2018 by xi'an

At Gregynog, with mounds of snow still visible on the surrounding hills, not to be confused with the many sheep dotting the fields(!), Owen Jones gave a three-hour lecture on simulation for optimisation, which is a less travelled path when compared with simulation for integration. His second lecture covered cross entropy for optimisation purposes. (I had forgotten that Reuven Rubinstein and Dirk Kroese had put forward this aspect of their technique in the very title of their book, "A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning".) The X entropy approach pushes for simulations restricted to the top values of the target function, iterating to find the best parameter in the parametric family used for the simulation. (Best being understood in the Kullback sense.) Now, this is a wee bit like simulated annealing, where lots of artificial entities have to be calibrated in the algorithm, due to the original problem being unrelated to a specific stochastic framework. X entropy facilitates concentration on the highest values of the target, but requires a family of probability distributions that puts weight on the top region. This may be a damning issue in large dimensions. Owen illustrated the approach in the case of the travelling salesman problem, where the parameterised distribution is a Markov chain on the state space of city sequences. Further, if the optimal value of the target is unknown, avoiding getting stuck in a local optimum may be tricky. (Owen presented a proof of convergence for a temperature going to zero slowly enough, which is equivalent to a sure exploration of the entire state space in a discrete setting, and hence does not provide reassurance in this respect, as the corresponding algorithm cannot be implemented.) This method falls into the range of methods that are doubly stochastic, in that they rely on Monte Carlo approximations at each iteration of the exploration algorithm.
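For illustration, here is a minimal R sketch of the cross-entropy method for maximising a bimodal function over the real line, with a Normal sampling family (my own toy code, not Owen's, and the elite fraction rho is an arbitrary choice):

#toy cross-entropy maximisation with a Normal sampling family:
#fit the family to the elite (top rho) simulations and iterate
f=function(x) exp(-(x-3)^2)+.5*exp(-(x+3)^2) #bimodal, global mode at x=3
mu=0;sig=10;rho=.1;N=1e3
for (t in 1:50){
  x=rnorm(N,mu,sig)
  elite=x[f(x)>=quantile(f(x),1-rho)] #top rho fraction of the sample
  mu=mean(elite);sig=sd(elite)}       #Kullback-optimal Normal update
mu #ends up near the global mode at 3

As the Normal family concentrates, the elite region shrinks towards the global mode, which illustrates both the appeal of the method and the need for a family able to put weight on the top region.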

During a later talk, I tried to recycle one of my earlier R codes on simulated annealing for sudokus, but could not find a useful family of proposal distributions to reach the (unique) solution. Using a mere product of distributions on each of the free positions in the sudoku grid only led me to a penalty of 13 errors…

1    2    8    5    9    7    4    9    3
7    3    5    1    2    4    6    2    8
4    6    9    6    3    8    5    7    1
2    7    5    3    1    6    9    4    8
8    1    4    7    8    9    7    6    2
6    9    3    8    4    2    1    3    5
3    8    6    4    7    5    2    1    9
1    4    2    9    6    3    8    5    7
9    5    7    2    1    8    3    4    6

It is hard to consider a distribution on the space of permutations, 𝔖₈₁.
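For the record, here is a minimal sketch of a penalty function in the spirit of the above (a reconstruction, not my original code, and the helper name penalz is mine):

#penalty of a 9x9 grid s: number of missing values over all rows,
#columns, and 3x3 boxes (0 if and only if s is a valid sudoku)
penalz=function(s){
  pen=0
  for (i in 1:9)
    pen=pen+(9-length(unique(s[i,])))+(9-length(unique(s[,i])))
  for (i in c(1,4,7)) for (j in c(1,4,7))
    pen=pen+(9-length(unique(c(s[i:(i+2),j:(j+2)]))))
  return(pen)}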

Le Monde puzzle [#1037]

Posted in Books, Kids, R on January 24, 2018 by xi'an

A purely geometric Le Monde mathematical puzzle this (or two independent ones, rather):

Find whether or not there are inscribed and circumscribed circles to a convex polygon with 2018 sides of lengths 1,2,…,2018.

In the first (or rather second) case, the circumscribed circle of radius R going through all vertices of the polygon (assuming such a circle exists) is such that a side of length L and its corresponding central angle θ satisfy

L² = 2R²(1-cos(θ))

leading, after summing the 2018 central angles θ=arccos(1-L²/(2R²)) and equating the total with 2π, to the following R code

R=3.2e5  #initial radius
step=1e2 #initial step size
anglz=sum(acos(1-(1:2018)^2/(2*R^2))) #sum of the 2018 central angles
while (abs(anglz-2*pi)>1e-4){
  #move R up when the angles overshoot 2*pi, down otherwise
  R=R-step+2*step*(anglz>2*pi)*(R>step)
  anglz=sum(acos(1-(1:2018)^2/(2*R^2)))
  step=step/1.01}

and the result is

> R=324221
> sum(acos(1-(1:2018)^2/(2*R^2)))-2*pi
[1] 9.754153e-05

(which is very close to the solution of Le Monde when replacing sin(α) by α!). The authors of the quoted paper do not seem to consider existence an issue.

In the second case, there is a theorem that states that if the equations

x₁+x₂=L₁, x₂+x₃=L₂, …, x₂₀₁₈+x₁=L₂₀₁₈

have a solution over the positive reals, then there exists a circle such that the polygon is tangential to this circle. Quite interestingly, if the number n of sides is even, there is an infinitude of tangential polygons, if any. Now, and rather obviously, the matrix A associated with the above system of linear equations is singular, with a kernel generated by the vector (1,-1,…,1,-1). Hence the collection of the sides must satisfy

L₁-L₂+…+L₂₀₁₇-L₂₀₁₈ = 0

which puts a constraint on the sequence of sides, namely that they divide into two groups with equal sums, 2018×2019/4 each, which is not an integer. Hence, a conclusion of impossibility! [Thanks to my office neighbours François and Julien for discussing the puzzle with me.]
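For what it's worth, a two-line numerical confirmation of this parity argument (my own check, not part of the puzzle solution):

S=sum(1:2018) #total perimeter: 2018*2019/2 = 2037171, an odd number
S/2           #1018585.5, not an integer, hence no equal split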

Le Monde puzzle [#1036]

Posted in Books, Kids, R on January 4, 2018 by xi'an

An arithmetic Le Monde mathematical puzzle to conclude 2017:

Find (a₁,…,a₁₃), a permutation of (1,…,13), such that

a₁/(a₂+a₃) = (a₂+a₃)/(a₃+a₄+a₅) = b₁ < 1
a₆/(a₆+a₇) = (a₆+a₇)/(a₇+a₈+a₉) = (a₇+a₈+a₉)/(a₅+a₉+a₁₀) = b₂ < 1
(a₁₁+a₁₂)/(a₁₂+a₁₃) = (a₁₂+a₁₃)/(a₁₃+a₁₀) = b₃ < 1

The question can be solved by brute force simulation, checking all possible permutations of (1,…,13). But 13! is above six billion, a wee bit too many cases. Despite the problem being made of only four constraints, and hence the target function taking only five possible values, a simulated annealing algorithm returned a solution within a few calls:

(a₁,…,a₁₃)=(6,1,11,3,10,8,4,9,5,12,7,2,13)
(b₁,b₂,b₃)=(1/2,2/3,3/5)

using the following code:

checka=function(a){ #target to maximise: number of satisfied constraints
 return(1*(a[1]/sum(a[2:3])==sum(a[2:3])/sum(a[3:5]))+
  1*(sum(a[6:7])/sum(a[7:9])==a[6]/sum(a[6:7]))+
  1*(sum(a[7:9])/(a[5]+sum(a[9:10]))==a[6]/sum(a[6:7]))+
  1*(sum(a[11:12])/sum(a[12:13])==sum(a[12:13])/
    (a[10]+a[13])))}
parm=sample(1:13) #random initial permutation
cheka=checka(parm)
beta=1 #inverse temperature
for (t in 1:1e6){
  qarm=parm
  idx=sample(1:13,sample(2:12,1)) #random subset of positions
  qarm[idx]=sample(qarm[idx])     #shuffled within the subset
  chekb=checka(qarm)
  if (log(runif(1))<beta*(chekb-cheka)){
     cheka=chekb;parm=qarm}
  beta=beta*(1+log(1.00001)) #slow geometric increase, about e^10 over the run
  if (cheka==4) break}

Le Monde [last] puzzle [#1026]

Posted in Books, Kids, R on November 2, 2017 by xi'an

The last and final Le Monde puzzle is a bit of a disappointment, to wit:

A 4×4 table is filled with positive and different integers. A 3×3 table is then deduced by adding four adjacent [i.e. sharing a common corner] entries of the original table. Similarly with a 2×2 table, summing up to a unique integer. What is the minimal value of this integer? And by how much does it increase if all 29 integers in the tables are different?

For the first question, the resulting integer writes down as the sum of the corner values, plus 3 times the sum of the side values, plus 9 times the sum of the 4 inner values [of the 4×4 table], since each successive 2×2 summation convolves the weights once more. Hence, minimising the overall sum means taking the inner values as 1,2,3,4, the side values as 5,…,12, and the corner values as 13,…,16, resulting in a total sum of 352, as checked in APL computer code by Jean-Louis.
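As a quick sanity check of this weight decomposition, here is an R sketch of mine (not Jean-Louis' APL code):

w=c(1,3,3,1) #weights induced by the three successive 2x2 summations
W=outer(w,w) #weight of each 4x4 entry in the final integer
flor=matrix(0,4,4)
flor[W==9]=1:4   #the four inner entries carry weight 9
flor[W==3]=5:12  #the eight side entries carry weight 3
flor[W==1]=13:16 #the four corner entries carry weight 1
sum(W*flor) #returns 352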

This configuration does not produce 29 distinct values, but moving one corner value higher does: I experimented with different upper bounds on the numbers, and 17 always provided the smallest overall sum, 365.

solz=Inf #smallest final integer found so far
firz=matrix(0,3,3)  #second level
thirz=matrix(0,2,2) #third level
for (t in 1:1e8){
  flor=matrix(sample(1:17,16),4,4) #first level: 16 distinct values in 1:17
  for (i in 1:3) for (j in 1:3)
    firz[i,j]=sum(flor[i:(i+1),j:(j+1)])
  for (i in 1:2) for (j in 1:2)
    thirz[i,j]=sum(firz[i:(i+1),j:(j+1)])
  #keep the final sum only when all 29 entries are distinct
  if (length(unique(c(flor,firz,thirz)))==29)
    solz=min(solz,sum(thirz))}

and a further simulated annealing attempt did not get me anywhere close to this solution.

computational methods for numerical analysis with R [book review]

Posted in Books, Kids, pictures, R, Statistics, University life on October 31, 2017 by xi'an

This is a book by James P. Howard, II, that I received from CRC Press for review in CHANCE. (As usual, the customary warning applies: most of this blog post will appear later in my book review column in CHANCE.) It consists of a traditional introduction to numerical analysis with backup from R codes and packages. The early chapters set the scene, from basics on R to notions of numerical error, before moving to linear algebra, interpolation, optimisation, integration, differentiation, and ODEs. The book comes with a package, cmna, that reproduces the algorithms and testing. While I do not find much originality in the book, given its adherence to simple resolutions of the above topics, I could nonetheless use it for an elementary course in our first year classes. With maybe the exception of the linear algebra chapter, which I did not find very helpful.

“…you can have a solution fast, cheap, or correct, provided you only pick two.” (p.27)

The (minor) issue I have with the book, and that a potential mathematically keen student could face as well, is that there is little in the way of justifying a particular approach to a given numerical problem (as opposed to others) or of characterising the limitations and failures of the presented methods (although this happens from time to time, as e.g. for gradient descent, p.191). [Steeped in my Gallic "mal-être", I am prone to over-criticise methods during classes, to the (increased) despair of my students!, but I also feel that avoiding over-rosy presentations is a good way to avoid later disappointments or even disasters.] In the case of this book, finding [more] ways of detecting would-be disasters would have been nice.

An uninteresting and highly idiosyncratic side comment is that the author prefers the French style of long division to the American one, reminding me of my first exposure to the latter, a few months ago! Another comment, from a statistician's viewpoint, is that mentioning time-series inter- or extrapolation without a statistical model sounds close to anathema! And makes extrapolation a weapon without a cause.

“…we know, a priori, exactly how long the [simulated annealing] process will take since it is a function of the temperature and the cooling rate.” (p.199)

Unsurprisingly, the section on Monte Carlo integration is disappointing for a statistician/probabilistic numericist like me, as it fails to give a complete enough picture of the methodology. All simulations seem to proceed there from a large enough hypercube. And recommending the "fantastic" (p.171) R function integrate as a default is scary, given the ability of the selected integration bounds to mislead its users. Similarly, I feel that the simulated annealing section does not provide enough of a cautionary tale about the highly sensitive impact of cooling rates and absolute temperatures. It is only through the raw output of the algorithm applied to the travelling salesman problem that the novice reader can perceive the impact of some of these factors. (The acceptance bound on the jump (6.9) is incidentally wrongly called a probability on p.199, since it can take values larger than one.)

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

splitting a field by annealing

Posted in Kids, pictures, R, Statistics on October 18, 2017 by xi'an

A recent riddle [from The Riddler] that I pondered about during a [long!] drive to Luxembourg last weekend was about splitting a square field into three lots of identical surface for a minimal length of separating wire… While this led me to conclude that the best solution was a T-like separation, I ran a simulated annealing R code on my train trip to Autrans, via Valence, seemingly in agreement with this conclusion. I discretised the square into n² units and explored configurations by switching two units with different colours, according to a simulated annealing pattern (although unable to impose connectivity on the three regions!):

n=60;n2=n^2 #grid side and total units [values not given in the post]
T=1e5;rep=1 #number of moves and replicate index [assumed]
partz=matrix(1,n,n) #initial T-shaped three-colour partition
partz[,1:(n/3)]=2;partz[((n/2)+1):n,((n/3)+1):n]=3
#counting adjacent units of same colour
nood=hood=matrix(4,n,n)
for (v in 1:n2) hood[v]=bourz(v,partz)
minz=el=sum(4-hood) #total length of separating boundary
cool=partz #best configuration so far
for (t in 1:T){
  colz=sample(1:3,2) #picks two colours
  a=sample((1:n2)[(partz==colz[1])&(hood<4)],1) #boundary unit, 1st colour
  b=sample((1:n2)[(partz==colz[2])&(hood<4)],1) #boundary unit, 2nd colour
  partt=partz;partt[b]=colz[1];partt[a]=colz[2] #switch the two units
  #collection of squares impacted by switch
  nood=hood
  voiz=unique(c(a,a-1,a+1,a+n,a-n,b-1,b,b+1,b+n,b-n))
  voiz=voiz[(voiz>0)&(voiz<=n2)]
  for (v in voiz) nood[v]=bourz(v,partt)
  if (nood[a]*nood[b]>0){
    difz=sum(nood)-sum(hood)
    #annealed acceptance, with the inverse temperature increasing in t
    if (log(runif(1))<difz^3/(n^3)*(1+log(10*rep*t)^3)){
      el=el-difz;partz=partt;hood=nood
      if (el<minz){minz=el;cool=partz}
  }}}

(where bourz computes the number of neighbouring units sharing a unit's colour), which produces completely random patterns at high temperatures (low t) and returns to the T configuration (more or less, if not always). Once the (a?) solution was posted on The Riddler, it appeared that a triangular (Y) version proved better than the T one [if not started from the corners], with a gain of 3%, and that a curved separation was even better, with an extra gain of less than 1% [a solution I find quite surprising, as straight lines should improve upon curved ones…]
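For completeness, here is a hedged guess at what the bourz helper could look like (a reconstruction consistent with the code above, not the original function):

#number of the four horizontal/vertical neighbours of unit v (linear,
#column-major index in an n x n grid) sharing its colour; out-of-grid
#neighbours count as matches so that the field border costs no wire
bourz=function(v,partz){
  n=nrow(partz)
  i=(v-1)%%n+1;j=(v-1)%/%n+1
  same=0
  for (d in list(c(-1,0),c(1,0),c(0,-1),c(0,1))){
    ii=i+d[1];jj=j+d[2]
    if (ii<1||ii>n||jj<1||jj>n) same=same+1 else
      same=same+(partz[ii,jj]==partz[i,j])}
  return(same)}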