Archive for simulated annealing

Le Monde puzzle [#868]

Posted in Books, Kids, Statistics with tags , , , on June 1, 2014 by xi'an

Another permutation-based Le Monde mathematical puzzle:

Given the integers 1,…n, a “perfect” combination is a pair (i,j) of integers such that no other pair enjoys the same sum. For n=33, what is the maximum of perfect combinations one can build? And for n=214? 

A rather straightforward problem, or so it seemed: take the pairs (2m,2m+1), their sums all differ, and we get the maximal possible number of sums, ⌊n/2⌋… However, I did not read the question properly (!) and the constraint is on the sum (i+j), namely

How many mutually exclusive pairs (i,j) can be found with different sums all bounded by n=33? n=2014?

In which case, the previous and obvious proposal works no longer… The dumb brute-force search

for (t in 1:T){
  if (sal>sol){

leads to a solution of

> sol
[1] 12
> laperm
 [1]  6  9  1 24 13 20  4  7 21 14 17  3 16 11 19 25 23 18 12 26 15  2  5 10 22
[26]  8
> unique(apply(matrix(laperm,ncol=2),1,sum))
[1] 17 28 26 47 31 32 30 22 23 19 27 25 24

which is close of the solution sol=13 proposed in Le Monde… It is obviously hopeless for a sum bounded by 2014. A light attempt at simulated annealing did not help either.

Le Monde puzzle [#853]

Posted in Books, Kids, Statistics with tags , , , on February 13, 2014 by xi'an

Yet another one of those Le Monde mathematical puzzles which wording is confusing (or at least annoying) to me:

A political party has 11 commissions, to which belong some of the 13 members of the central committee. A token is given to each member for each commission whom he or she belongs. Two different members cannot share more than one common commission. How many tokens at most? Same question if the president belongs to five commissions.

I just dislike the “story” around the combinatoric problem. Given 13 sets and 11 letters, how many letters can one allocate to each set so that each pair of sets share at most one letter? While waiting for my prosthesis this afternoon, I thought of a purely random search, using the following R code:

for (t in 1:10^3){
  #fill at random: -1 unchecked, 1 allocated, 0 prohibited
  while (sum(token<0)>0){
    while (res){
      if (sum(token<0)==1){
      for (k in (1:m)[token[,j]==1])
      if (sum(token<0)==0) res=FALSE
    for (k in (1:m)[-i])
      if (max(token[i,-j]+token[k,-j])==2) token[k,j]=0
  if (sum(token[token==1])>best) bestoken=token

which led to the value of 43 after that many iterations. Longer runs on faster machines at Paris-Dauphine did not change the output but, as usual with brute force simulation, the true solution may be such an extreme that it is extremely unlikely to happen… I then tried a standard simulated annealing exploration, but could not find an higher value. Except once, leading to the value 44. Here is the corresponding allocation of letters (committees) to sets (individuals) for that solution.lemonde853In this Feb. 5, 2014, issue of Le Monde Science&Médecine leaflet, a review of (my Warwick colleague) Ian Stewart’s 17 equations that changed the World, which must have been recently translated in French (with no criticism of the less compelling chapter on Black-Scholes, and a confusion of the “bell curve” with statistics). Yet another tribune of Marco Zito about the generalisation of Richard Feynman’s diagrams to a solid called the Amplituhedron. (Not making much sense as exposed!)

sudoku break

Posted in pictures, R, Statistics with tags , , , , on December 13, 2013 by xi'an

sudo291113While in Warwick last week, one evening after having exhausted my laptop battery, I tried the following Sudoku (from Libération):

>   printSudoku(readSudoku(""))
  | 4   6 |   2   | 3   9 |
  |   3   |       |   2   |
  | 7   2 |       | 5   6 |
  |       | 9 4 5 |       |
  | 5     | 7 6 2 |     1 |
  |       | 3 1 8 |       |
  | 6   9 |       | 1   3 |
  |   7   |       |   9   |
  | 3   1 |   9   | 4   7 |

and could not even start. As it happened, this was a setting with no deterministic move, i.e. all free/empty entries had multiple possible values. So after trying for a while and following trees to no obvious contradiction (!) I decided to give up and on the next day (with power) to call my “old” sudoku solver (built while at SAMSI), using simulated annealing and got the result after a few thousand iterations. The detail of the exploration is represented above, the two colours being code for two different moves on the Sudoku table. Leading to the solution

  | 4 8 6 | 5 2 1 | 3 7 9 |
  | 1 3 5 | 6 7 9 | 8 2 4 |
  | 7 9 2 | 8 3 4 | 5 1 6 |
  | 2 1 3 | 9 4 5 | 7 6 8 |
  | 5 4 8 | 7 6 2 | 9 3 1 |
  | 9 6 7 | 3 1 8 | 2 4 5 |
  | 6 2 9 | 4 8 7 | 1 5 3 |
  | 8 7 4 | 1 5 3 | 6 9 2 |
  | 3 5 1 | 2 9 6 | 4 8 7 |

I then tried a variant with more proposals (hence more colours) at each iteration, which ended up being stuck at a penalty of 4 (instead of 0) in the final thousand iterations. Although this is a one occurrence experiment, I find it interesting that having move proposals may get the algorithm stuck faster in a local minimum. Nothing very deep there, of course..!


running MCMC for too long, and even longer…

Posted in Books, pictures, Running, Statistics, University life with tags , , , , , , , , , , , on October 23, 2013 by xi'an

Clifton observatory, Clifton, Sept. 24, 2012Following my earlier post about the young astronomer who feared he was running his MCMC for too long, here is an update from his visit to my office this morning.  This visit proved quite an instructive visit for both of us. (Disclaimer: the picture of an observatory seen from across Brunel’s suspension bridge in Bristol is as earlier completely unrelated with the young astronomer!)

First, the reason why he thought MCMC was running too long was that the acceptance rate was plummeting down to zero, whatever the random walk scale. The reason for this behaviour is that he was actually running a standard simulated annealing algorithm, hence observing the stabilisation of the Markov chain in one of the (global) modes of the target function. In that sense, he was right that the MCMC was run for “too long”, as there was nothing to expect once the mode had been reached and the temperature turned down to zero. So the algorithm was working correctly.

Second, the astronomy problem he considers had a rather complex likelihood, for which he substituted a distance between the (discretised) observed data and (discretised) simulated data, simulated conditional on the current parameter value. Now…does this ring a bell? If not, here is a three letter clue: ABC… Indeed, the trick he had found to get around this likelihood calculation issue was to re-invent a version of ABC-MCMC! Except that the distance was re-introduced into a regular MCMC scheme as a substitute to the log-likelihood. And compared with the distance at the previous MCMC iteration. This is quite clever, even though this substitution suffers from a normalisation issue (that I already mentioned in the post about Holmes’ and Walker’s idea to turn loss functions into pseudo likelihoods. Regular ABC does not encounter this difficult, obviously. I am still bemused by this reinvention of ABC from scratch!  

So we are now at a stage where my young friend will experiment with (hopefully) correct ABC steps, trying to derive the tolerance value from warmup simulations and use some of the accelerating tricks suggested by Umberto Picchini and Julie Forman to avoid simulating the characteristics of millions of stars for nothing. And we agreed to meet soon for an update. Indeed, a fairly profitable morning for both of us!

ABC for design

Posted in Statistics with tags , , , , , , , on August 30, 2013 by xi'an

I wrote a comment on this arXived paper on simulation based design that starts from Müller (1999) and gets an ABC perspective a while ago on my iPad when travelling to Montpellier and then forgot to download it…

Hainy, [Wener] Müller, and Wagner recently arXived a paper called “Likelihood-free Simulation-based Optimal Design“, paper which relies on ABC to construct optimal designs . Remember that [Peter] Müller (1999) uses a natural simulated annealing that is quite similar to our MAP [SAME] algorithm with Arnaud Doucet and Simon Godsill, relying on multiple versions of the data set to get to the maximum. The paper also builds upon our 2006 JASA paper with my then PhD student Billy Amzal, Eric Parent, and Frederic Bois, paper that took advantage of the then emerging particle methods to improve upon a static horizon target. While our method is sequential in that it pursues a moving target, it does not rely on the generic methodology developed by del Moral et al. (2006), where a backward kernel brings more stability to the moves. The paper also implements a version of our population Monte Carlo ABC algorithm (Beaumont et al., 2009), as a first step before an MCMC simulation. Overall, the paper sounds more like a review than like a strongly directive entry into ABC based design in that it remains quite generic. Not that I have specific suggestions, mind!, but I fear a realistic implementation (as opposed to the linear model used in the paper) would require a certain amount of calibration. There are missing references of recent papers using ABC for design, including some by Michael Stumpf I think.

I did not know about the Kuck et al. reference… Which is reproducing our 2006 approach within the del Moral framework. It uses a continuous temperature scale that I find artificial and not that useful, again a maybe superficial comment as I didn’t get very much into the paper … Just that integer powers lead to multiples of the sample and have a nice algorithmic counterpart.

optimisation via slice sampling

Posted in Statistics with tags , , , , on December 20, 2012 by xi'an

simulated annealing path on a multimodal surface (c) Robert and Casella, 2007This morning, over breakfast; I read the paper recently arXived by John Birge et Nick Polson. I was intrigued by the combination of optimisation and of slice sampling, but got a wee disappointed by the paper, in that it proposes a simple form of simulated annealing, minimising f(x) by targeting a small collection of energy functions exp{-τf(x)}. Indeed, the slice sampler is used to explore each of those targets, i.e. for a fixed temperature τ. For the four functions considered in the paper, a slice sampler can indeed be implemented, but this feature could be seen as a marginalia, given that a random walk Metropolis-Hastings algorithm could be used as a proposal mechanism in other cases. The other intriguing fact is that annealing is not used in the traditional way of increasing the coefficient τ along iterations (as in our SAME algorithm), for which convergence issues are much more intricate, but instead stays at the level where a whole (Markov) sample is used for each temperature, the outcomes being then compared in terms of localisation (and maybe for starting at the next temperature value). I did not see any discussion about the selection of the successive temperatures, which usually is a delicate issue in realistic settings, nor of the stopping rule(s) used to decide the maximum has been reached.

threshold schedules for ABC-SMC

Posted in Statistics, University life with tags , , , , , , , , , on October 17, 2012 by xi'an

Daniel Silk, Sarah Filippi and Michael Stumpf just posted on arXiv a new paper on ABC-SMC. They attack therein a typical problem with SMC (and PMC, and even MCMC!) methods, namely the ability to miss (global) modes of the target due to a poor initial exploration. So, if a proposal is built on previous simulations and if  those simulations have missed an important mode, it is quite likely that this proposal will concentrate on other parts of the space and miss the important mode even more. This is also why simulated annealing and other stochastic optimisation algorithms are so fickle: a wrong parameterisation (like the temperature schedule) induces the sequence to converge to a local optimum rather than to the global optimum. Since sequential ABC is a form of simulated annealing, the decreasing tolerance (or threshold) playing the part of the temperature, it is no surprise that it suffers from this generic disease…

The method proposed in the paper aims at avoiding this difficulty by looking at sudden drops in the acceptance rate curve (as a function of the tolerance ε),

\aleph_t(\epsilon)=\int p_t(x)\mathbb{I}(\Delta(x,x^\star)\le \epsilon)\,\text{d}x,

suggesting for threshold the value of ε that maximises the second derivative of this curve. Now, before getting to (at?) the issue of turning this principle into a practical algorithm, let me linger at the motivation for it:

“To see this, one can imagine an ε-ball expanding about the true data; at first the ball only encompasses a small number of particles that were drawn from very close to the global maximum, corresponding to the low gradient at the foot of the shape. Once ε is large enough we are able to accept the relatively large number of particles sitting in the local maximum, which causes the increase in gradient.” (p.5)

Thus, the argument for looking at values of ε preceding fast increases in the acceptance rate ℵ is that we are thus avoiding flatter and lower regions of the posterior support corresponding to a local maximum. It clearly encompasses the case studied by the authors of a global highly concentrated global mode, surrounded by flatter lower modes, but it seems to me that this is not necessarily the only possible reason for changes in the shape of the acceptance probability ℵ. First, we are looking at an ABC acceptance rate, not at a genuine likelihood. Second, this acceptance rate function depends on the current (and necessarily rough) approximate to the genuine predictive, pt. Furthermore, when taking into account this rudimentary replacement of the true likelihood function, it is rather difficult to envision how it impacts the correspondence between a proximity in the data and a proximity in the parameter space. (The toy example is both illuminating and misleading, because it considers a setting where the data is a deterministic transform of the parameter.) I thus think the analysis should refer more strongly to the non-parametric literature and in particular to the k-nearest-neighbour approach recently reformulated by Biau et al.: there is no reason to push the tolerance ε all the way down to zero as this limit does not correspond to the optimal value of the tolerance. If one does not use a non-parametric determination of the “right” tolerance, the lingering question is when and why stopping the sequence of ABC simulations.

The acceptance rate function ℵ is approximated using an unscented transform. I understand neither the implementation of, nor the motivations for, this choice. Given that the function ℵ is a one-dimensional transform of the tolerance and that it actually corresponds to the cdf of the distance between the true data and the (current) pseudo-data, why couldn’t we use smooth versions of a cdf estimate based on the simulated distances, given that these distances are already computed..?  I must be missing something.


Get every new post delivered to your Inbox.

Join 634 other followers