Archive for simulated annealing

Le Monde [last] puzzle [#1026]

Posted in Books, Kids, R with tags , , , , , on November 2, 2017 by xi'an

The last and final Le Monde puzzle is a bit of a disappointment, to wit:

A 4×4 table is filled with positive and different integers. A 3×3 table is then deduced by adding four adjacent [i.e. sharing a common corner] entries of the original table. Similarly with a 2×2 table, summing up to a unique integer. What is the minimal value of this integer? And by how much does it increase if all 29 integers in the tables are different?

For the first question, the resulting integer writes down as the sum of the corner values, plus 3 times the sum of the side values, plus 9 times the sum of the 4 inner values [of the 4×4 table]. Hence, minimising the overall sum means taking the inner values as 1,2,3,4, the side values as 5,…,12, and the corner values as 13,…,16. Resulting in a total sum of 352. As checked in this computer code in APL by Jean-Louis:

This configuration does not produce 29 distinct values, but moving one value higher in one corner does: I experimented with different upper bounds on the numbers and 17 always provided with the smallest overall sum, 365.

firz=matrix(0,3,3)#second level
thirz=matrix(0,2,2)#third level
for (t in 1:1e8){
flor=matrix(sample(1:17,16),4,4)
for (i in 1:3) for (j in 1:3)
firz[i,j]=sum(flor[i:(i+1),j:(j+1)])
for (i in 1:2) for (j in 1:2)
thirz[i,j]=sum(firz[i:(i+1),j:(j+1)])
#last
if (length(unique(c(flor,firz,thirz)))==29)
solz=min(solz,sum(thirz))}

and a further simulated annealing attempt did not get me anywhere close to this solution.

computational methods for numerical analysis with R [book review]

Posted in Books, Kids, pictures, R, Statistics, University life with tags , , , , , , , , , , , , , , , on October 31, 2017 by xi'an

compulysis+R_coverThis is a book by James P. Howard, II, I received from CRC Press for review in CHANCE. (As usual, the customary warning applies: most of this blog post will appear later in my book review column in CHANCE.) It consists in a traditional introduction to numerical analysis with backup from R codes and packages. The early chapters are setting the scenery, from basics on R to notions of numerical errors, before moving to linear algebra, interpolation, optimisation, integration, differentiation, and ODEs. The book comes with a package cmna that reproduces algorithms and testing. While I do not find much originality in the book, given its adherence to simple resolutions of the above topics, I could nonetheless use it for an elementary course in our first year classes. With maybe the exception of the linear algebra chapter that I did not find very helpful.

“…you can have a solution fast, cheap, or correct, provided you only pick two.” (p.27)

The (minor) issue I have with the book and that a potential mathematically keen student could face as well is that there is little in the way of justifying a particular approach to a given numerical problem (as opposed to others) and in characterising the limitations and failures of the presented methods (although this happens from time to time as e.g. for gradient descent, p.191). [Seeping in my Gallic “mal-être”, I am prone to over-criticise methods during classing, to the (increased) despair of my students!, but I also feel that avoiding over-rosy presentations is a good way to avoid later disappointments or even disasters.] In the case of this book, finding [more] ways of detecting would-be disasters would have been nice.

An uninteresting and highly idiosyncratic side comment is that the author preferred the French style for long division to the American one, reminding me of my first exposure to the latter, a few months ago! Another comment from a statistician is that mentioning time series inter- or extra-polation without a statistical model sounds close to anathema! And makes extrapolation a weapon without a cause.

“…we know, a priori, exactly how long the [simulated annealing] process will take since it is a function of the temperature and the cooling rate.” (p.199)

Unsurprisingly, the section on Monte Carlo integration is disappointing for a statistician/probabilistic numericist like me,  as it fails to give a complete enough picture of the methodology. All simulations seem to proceed there from a large enough hypercube. And recommending the “fantastic” (p.171) R function integrate as a default is scary, given the ability of the selected integration bounds to misled its users. Similarly, I feel that the simulated annealing section is not providing enough of a cautionary tale about the highly sensitive impact of cooling rates and absolute temperatures. It is only through the raw output of the algorithm applied to the travelling salesman problem that the novice reader can perceive the impact of some of these factors. (The acceptance bound on the jump (6.9) is incidentally wrongly called a probability on p.199, since it can take values larger than one.)

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

splitting a field by annealing

Posted in Kids, pictures, R, Statistics with tags , , , , , , , , on October 18, 2017 by xi'an

A recent riddle [from The Riddle] that I pondered about during a [long!] drive to Luxembourg last weekend was about splitting a square field into three lots of identical surface for a minimal length of separating wire… While this led me to conclude that the best solution was a T like separation, I ran a simulated annealing R code on my train trip to AutransValence, seemingly in agreement with this conclusion.I discretised the square into n² units and explored configurations by switching two units with different colours, according to a simulated annealing pattern (although unable to impose connectivity on the three regions!):

partz=matrix(1,n,n)
partz[,1:(n/3)]=2;partz[((n/2)+1):n,((n/3)+1):n]=3
#counting adjacent units of same colour 
nood=hood=matrix(4,n,n)
for (v in 1:n2) hood[v]=bourz(v,partz)
minz=el=sum(4-hood)
for (t in 1:T){
  colz=sample(1:3,2) #picks colours
  a=sample((1:n2)[(partz==colz[1])&(hood<4)],1)
  b=sample((1:n2)[(partz==colz[2])&(hood<4)],1) 
  partt=partz;partt[b]=colz[1];partt[a]=colz[2] 
#collection of squares impacted by switch 
  nood=hood 
  voiz=unique(c(a,a-1,a+1,a+n,a-n,b-1,b,b+1,b+n,b-n)) 
  voiz=voiz[(voiz>0)&(voiz<n2)] 
  for (v in voiz) nood[v]=bourz(v,partt) 
  if (nood[a]*nood[b]>0){
    difz=sum(nood)-sum(hood)
    if (log(runif(1))<difz^3/(n^3)*(1+log(10*rep*t)^3)){
      el=el-difz;partz=partt;hood=nood     
      if (el<minz){ minz=el;cool=partz}
  }}}

(where bourz computes the number of neighbours), which produces completely random patterns at high temperatures (low t) and which returns to the T configuration (more or less):if not always, as shown below:Once the (a?) solution was posted on The Riddler, it appeared that one triangular (Y) version proved better than the T one [if not started from corners], with a gain of 3% and that a curved separation was even better with an extra gain less than 1% [solution that I find quite surprising as straight lines should improve upon curved ones…]

non-Bayesian decision riddle

Posted in Statistics with tags , , , on June 22, 2017 by xi'an

As a continuation of the Bayesian resolution of last week riddle, I looked at a numeric resolution of the four secretaries problem, while in the train back from Rouen (and trying to block the chatter of my neighbours, a nuisance I find myself more and more sensitive to!). The target function is defined as

gainz=function(b,c,T=1e4,type="raw"){
  x=matrix(runif(4*T),ncol=4)
  maz=t(apply(x,1,cummax))
  zam=t(apply(x[,4:1],1,cummax))
  if (type=="raw"){return(mean(
   ((x[,2]>b*x[,1])*x[,2]+
    (x[,2]<b*x[,1])*((x[,3]>c*maz[,2])*x[,3]+
        (x[,3]<c*maz[,2])*x[,4]))/maz[,4]))} 
  if (type=="global"){return(mean( 
    ((x[,2]>b*x[,1])*(x[,2]==maz[,4])+
     (x[,2]<b*x[,1])*((x[,3]>c*maz[,2])*(x[,3]==maz[,4])+
         (x[,3]<c*maz[,2])*(x[,4]==maz[,4])))))} 
  if (type=="remain"){return(mean( 
    ((x[,2]>b*x[,1])*(x[,2]==zam[,3])+
     (x[,2]<b*x[,1])*((x[,3]>c*maz[,2])*(x[,3]==zam[,2])+
          (x[,3]<c*maz[,2])*(x[,4]==zam[,2])))))}}

where the data is generated from a U(0,1) distribution as the loss functions are made scaled free by deciding to always sacrifice the first draw, x¹. This function is to be optimised in (b,c) and hence I used a plain vanilla simulated annealing R code:

avemale=function(T=3e4,type){
  b=c=.5
  maxtar=targe=gainz(b,c,T=1e4,type)
  temp=0.1
  for (t in 1:T){
    bp=b+runif(1,-temp,temp)
    cp=c+runif(1,-temp,temp)
    parge=(bp>0)*(cp>0)*gainz(bp,cp,T=1e4,type)
    if (parge>maxtar){
      b=bs=bp;c=cs=cp;maxtar=targe=parge}else{
    if (runif(1)<exp((parge-targe)/temp)){
      b=bp;c=cp;targe=parge}}
    temp=.9999*temp}
  return(list(bs=bs,cs=cs,max=maxtar))}

with outcomes

  • b=1, c=.5, and optimum 0.8 for the raw type
  • b=c=1 and optimum 0.45 for the global type
  • b undefined, c=2/3 and optimum 0.75 for the remain type

gerrymandering detection by MCMC

Posted in Books, Statistics with tags , , , , , , , on June 16, 2017 by xi'an

In the latest issue of Nature I read (June 8), there is a rather long feature article on mathematical (and statistical) ways of measuring gerrymandering, that is the manipulation of the delimitations of a voting district toward improving the chances of a certain party. (The name comes from Elbridge Gerry (1812) and the salamander shape of the district he created.) The difficulty covered by the article is about detecting gerrymandering, which leads to the challenging and almost philosophical question of defining a “fair” partition of a region into voting districts, when those are not geographically induced. Since each partition does not break the principles of “one person, one vote” and of majority rule. Having a candidate or party win at the global level and loose at every local level seems to go against this majority rule, but with electoral systems like in the US, this frequently happens (with dire consequences in the latest elections). Just another illustration of Simpson’s paradox, essentially. And a damning drawback of multi-tiered electoral systems.

“In order to change the district boundaries, we use a Markov Chain Monte Carlo algorithm to produce about 24,000 random but reasonable redistrictings.”

In the arXiv paper that led to this Nature article (along with other studies), Bagiat et al. essentially construct a tail probability to assess how extreme the current district partition is against a theoretical distribution of such partitions. Finding that the actual redistrictings of 2012 and 2016 in North Carolina are “extremely atypical”.  (The generation of random partitions obeyed four rules, namely equal population, geographic compacity and connexity, proximity to county boundaries, and a majority of Afro-American voters in at least two districts, the latest being a requirement in North Carolina. A score function was built by linear combination of four corresponding scores, mostly χ² like, and turned into a density, simulated annealing style. The determination of the final temperature β=1 (p.18) [or equivalently of the weights (p.20)] remains unclear to me. As does the use of more than 10⁵ simulated annealing iterations to produce a single partition (p.18)…

From a broader perspective, agreeing on a method to produce random district allocations could be the way to go towards solving the judicial dilemma in setting new voting maps as what is currently under discussion in the US.

SMC on a sequence of increasing dimension targets

Posted in Statistics with tags , , , , , , , , , on February 15, 2017 by xi'an

mixdirRichard Everitt and co-authors have arXived a preliminary version of a paper entitled Sequential Bayesian inference for mixture models and the coalescent using sequential Monte Carlo samplers with transformations. The central notion is an SMC version of the Carlin & Chib (1995) completion in the comparison of models in different dimensions. Namely to create auxiliary variables for each model in such a way that the dimension of the completed models are all the same. (Reversible jump MCMC à la Peter Green (1995) can also be interpreted this way, even though only relevant bits of the completion are used in the transitions.) I find the paper and the topic most interesting if only because it relates to earlier papers of us on population Monte Carlo. It also brought to my awareness the paper by Karagiannis and Andrieu (2013) on annealed reversible jump MCMC that I had missed at the time it appeared. The current paper exploits this annealed expansion in the devising of the moves. (Sequential Monte Carlo on a sequence of models with increasing dimension has been studied in the past.)

The way the SMC is described in the paper, namely, reweight-subsample-move, does not strike me as the most efficient as I would try to instead move-reweight-subsample, using a relevant move that incorporate the new model and hence enhance the chances of not rejecting.

One central application of the paper is mixture models with an unknown number of components. The SMC approach applied to this problem means creating a new component at each iteration t and moving the existing particles after adding the parameters of the new component. Since using the prior for this new part is unlikely to be at all efficient, a split move as in Richardson and Green (1997) can be considered, which brings back the dreaded Jacobian of RJMCMC into the picture! Here comes an interesting caveat of the method, namely that the split move forces a choice of the split component of the mixture. However, this does not appear as a strong difficulty, solved in the paper by auxiliary [index] variables, but possibly better solved by a mixture representation of the proposal, as in our PMC [population Monte Carlo] papers. Which also develop a family of SMC algorithms, incidentally. We found there that using a mixture representation of the proposal achieves a provable variance reduction.

“This puts a requirement on TSMC that the single transition it makes must be successful.”

As pointed by the authors, the transformation SMC they develop faces the drawback that a given model is only explored once in the algorithm, when moving to the next model. On principle, there would be nothing wrong in including regret steps, retracing earlier models in the light of the current one, since each step is an importance sampling step valid on its own right. But SMC also offers a natural albeit potentially high-varianced approximation to the marginal likelihood, which is quite appealing when comparing with an MCMC outcome. However, it would have been nice to see a comparison with alternative estimates of the marginal in the case of mixtures of distributions. I also wonder at the comparative performances of a dual approach that would be sequential in the number of observations as well, as in Chopin (2004) or our first population Monte Carlo paper (Cappé et al., 2005), since subsamples lead to tempered versions of the target and hence facilitate moves between models, being associated with flatter likelihoods.

a knapsack riddle?

Posted in Books, pictures, R, Statistics, Travel with tags , , , , , , on February 13, 2017 by xi'an

gear

The [then current now past] riddle of the week is a sort of multiarmed bandits optimisation. Of sorts. Or rather a generalised knapsack problem. The question is about optimising the allocation of 100 undistinguishable units to 10 distinct boxes against a similarly endowed adversary, when the loss function is

L(x,y)=(x_1>y_1)-(x_1<y_1)+...+10((x_{10}>y_{10})-(x_{10}<y_{10}))

and the distribution q of the adversary is unknown. As usual (!), the phrasing of the riddle is somewhat ambiguous but I am under the impression that the game is played sequentially, hence that one can learn about the distribution of the adversary, at least when assuming this adversary keeps the same distribution q at all times. Continue reading