SAME but different

Posted in Statistics, University life with tags , , , , , , , , , on October 27, 2014 by xi'an

After several clones of our SAME algorithm appeared in the literature, it is rather fun to see another paper acknowledging the connection. SAME but different was arXived today by Zhao, Jiang and Canny. The point of this short paper is to show that the parallel implementation of SAME leads to efficient performances compared with existing standards. Since the duplicated latent variables are independent [given θ] they can be simulated in parallel. They further assume independence between the components of those latent variables. And finite support. As in document analysis. So they can sample the replicated latent variables all at once. Parallelism is thus used solely for the components of the latent variable(s). SAME is normally associated with an annealing schedule but the authors could not detect an improvement over a fixed and large number of replications. They reported gains comparable to state-of-the-art variational Bayes on two large datasets. Quite fun to see SAME getting a new life thanks to computer scientists!

Le Monde puzzle [#868]

Posted in Books, Kids, Statistics with tags , , , on June 1, 2014 by xi'an

Another permutation-based Le Monde mathematical puzzle:

Given the integers 1,…n, a “perfect” combination is a pair (i,j) of integers such that no other pair enjoys the same sum. For n=33, what is the maximum of perfect combinations one can build? And for n=214?

A rather straightforward problem, or so it seemed: take the pairs (2m,2m+1), their sums all differ, and we get the maximal possible number of sums, ⌊n/2⌋… However, I did not read the question properly (!) and the constraint is on the sum (i+j), namely

How many mutually exclusive pairs (i,j) can be found with different sums all bounded by n=33? n=2014?

In which case, the previous and obvious proposal works no longer… The dumb brute-force search

```T=10^6
sol=0
for (t in 1:T){
perm=sample(1:sample(seq(20,32,2),1))
sal=sum(unique(apply(matrix(perm,ncol=2),1,sum))<33)
if (sal>sol){
sol=sal;laperm=perm}
}
```

```> sol
[1] 12
> laperm
[1]  6  9  1 24 13 20  4  7 21 14 17  3 16 11 19 25 23 18 12 26 15  2  5 10 22
[26]  8
> unique(apply(matrix(laperm,ncol=2),1,sum))
[1] 17 28 26 47 31 32 30 22 23 19 27 25 24
```

which is close of the solution sol=13 proposed in Le Monde… It is obviously hopeless for a sum bounded by 2014. A light attempt at simulated annealing did not help either.

Le Monde puzzle [#853]

Posted in Books, Kids, Statistics with tags , , , on February 13, 2014 by xi'an

Yet another one of those Le Monde mathematical puzzles which wording is confusing (or at least annoying) to me:

A political party has 11 commissions, to which belong some of the 13 members of the central committee. A token is given to each member for each commission whom he or she belongs. Two different members cannot share more than one common commission. How many tokens at most? Same question if the president belongs to five commissions.

I just dislike the “story” around the combinatoric problem. Given 13 sets and 11 letters, how many letters can one allocate to each set so that each pair of sets share at most one letter? While waiting for my prosthesis this afternoon, I thought of a purely random search, using the following R code:

```n=11
m=13
best=0
for (t in 1:10^3){
token=matrix(-1,ncol=n,nrow=m);diag(token)=1
#fill at random: -1 unchecked, 1 allocated, 0 prohibited
while (sum(token<0)>0){
res=TRUE
while (res){
if (sum(token<0)==1){
dex=(1:(n*m))[token[(1:(n*m))]<0]}else{
dex=sample((1:(n*m))[token[(1:(n*m))]<0],1)}
i=dex%%m+m*(dex%%m==0)
j=trunc(dex/(1.01*m))+1
res=FALSE
for (k in (1:m)[token[,j]==1])
res=res||(max(token[i,-j]+token[k,-j])==2)
token[dex]=0
if (sum(token<0)==0) res=FALSE
}
token[dex]=1*(sum(token<0)>0)
for (k in (1:m)[-i])
if (max(token[i,-j]+token[k,-j])==2) token[k,j]=0
image(1:13,1:11,token)
}
if (sum(token[token==1])>best) bestoken=token
best=max(best,sum(token))
}
```

which led to the value of 43 after that many iterations. Longer runs on faster machines at Paris-Dauphine did not change the output but, as usual with brute force simulation, the true solution may be such an extreme that it is extremely unlikely to happen… I then tried a standard simulated annealing exploration, but could not find an higher value. Except once, leading to the value 44. Here is the corresponding allocation of letters (committees) to sets (individuals) for that solution.In this Feb. 5, 2014, issue of Le Monde Science&Médecine leaflet, a review of (my Warwick colleague) Ian Stewart’s 17 equations that changed the World, which must have been recently translated in French (with no criticism of the less compelling chapter on Black-Scholes, and a confusion of the “bell curve” with statistics). Yet another tribune of Marco Zito about the generalisation of Richard Feynman’s diagrams to a solid called the Amplituhedron. (Not making much sense as exposed!)

sudoku break

Posted in pictures, R, Statistics with tags , , , , on December 13, 2013 by xi'an

While in Warwick last week, one evening after having exhausted my laptop battery, I tried the following Sudoku (from Libération):

```>   printSudoku(readSudoku("libe.dk"))
+-------+-------+-------+
| 4   6 |   2   | 3   9 |
|   3   |       |   2   |
| 7   2 |       | 5   6 |
+-------+-------+-------+
|       | 9 4 5 |       |
| 5     | 7 6 2 |     1 |
|       | 3 1 8 |       |
+-------+-------+-------+
| 6   9 |       | 1   3 |
|   7   |       |   9   |
| 3   1 |   9   | 4   7 |
+-------+-------+-------+
```

and could not even start. As it happened, this was a setting with no deterministic move, i.e. all free/empty entries had multiple possible values. So after trying for a while and following trees to no obvious contradiction (!) I decided to give up and on the next day (with power) to call my “old” sudoku solver (built while at SAMSI), using simulated annealing and got the result after a few thousand iterations. The detail of the exploration is represented above, the two colours being code for two different moves on the Sudoku table. Leading to the solution

```  +-------+-------+-------+
| 4 8 6 | 5 2 1 | 3 7 9 |
| 1 3 5 | 6 7 9 | 8 2 4 |
| 7 9 2 | 8 3 4 | 5 1 6 |
+-------+-------+-------+
| 2 1 3 | 9 4 5 | 7 6 8 |
| 5 4 8 | 7 6 2 | 9 3 1 |
| 9 6 7 | 3 1 8 | 2 4 5 |
+-------+-------+-------+
| 6 2 9 | 4 8 7 | 1 5 3 |
| 8 7 4 | 1 5 3 | 6 9 2 |
| 3 5 1 | 2 9 6 | 4 8 7 |
+-------+-------+-------+
```

I then tried a variant with more proposals (hence more colours) at each iteration, which ended up being stuck at a penalty of 4 (instead of 0) in the final thousand iterations. Although this is a one occurrence experiment, I find it interesting that having move proposals may get the algorithm stuck faster in a local minimum. Nothing very deep there, of course..!

running MCMC for too long, and even longer…

Posted in Books, pictures, Running, Statistics, University life with tags , , , , , , , , , , , on October 23, 2013 by xi'an

Following my earlier post about the young astronomer who feared he was running his MCMC for too long, here is an update from his visit to my office this morning.  This visit proved quite an instructive visit for both of us. (Disclaimer: the picture of an observatory seen from across Brunel’s suspension bridge in Bristol is as earlier completely unrelated with the young astronomer!)

First, the reason why he thought MCMC was running too long was that the acceptance rate was plummeting down to zero, whatever the random walk scale. The reason for this behaviour is that he was actually running a standard simulated annealing algorithm, hence observing the stabilisation of the Markov chain in one of the (global) modes of the target function. In that sense, he was right that the MCMC was run for “too long”, as there was nothing to expect once the mode had been reached and the temperature turned down to zero. So the algorithm was working correctly.

Second, the astronomy problem he considers had a rather complex likelihood, for which he substituted a distance between the (discretised) observed data and (discretised) simulated data, simulated conditional on the current parameter value. Now…does this ring a bell? If not, here is a three letter clue: ABC… Indeed, the trick he had found to get around this likelihood calculation issue was to re-invent a version of ABC-MCMC! Except that the distance was re-introduced into a regular MCMC scheme as a substitute to the log-likelihood. And compared with the distance at the previous MCMC iteration. This is quite clever, even though this substitution suffers from a normalisation issue (that I already mentioned in the post about Holmes’ and Walker’s idea to turn loss functions into pseudo likelihoods. Regular ABC does not encounter this difficult, obviously. I am still bemused by this reinvention of ABC from scratch!

So we are now at a stage where my young friend will experiment with (hopefully) correct ABC steps, trying to derive the tolerance value from warmup simulations and use some of the accelerating tricks suggested by Umberto Picchini and Julie Forman to avoid simulating the characteristics of millions of stars for nothing. And we agreed to meet soon for an update. Indeed, a fairly profitable morning for both of us!

ABC for design

Posted in Statistics with tags , , , , , , , on August 30, 2013 by xi'an

I wrote a comment on this arXived paper on simulation based design that starts from Müller (1999) and gets an ABC perspective a while ago on my iPad when travelling to Montpellier and then forgot to download it…

Hainy, [Wener] Müller, and Wagner recently arXived a paper called “Likelihood-free Simulation-based Optimal Design“, paper which relies on ABC to construct optimal designs . Remember that [Peter] Müller (1999) uses a natural simulated annealing that is quite similar to our MAP [SAME] algorithm with Arnaud Doucet and Simon Godsill, relying on multiple versions of the data set to get to the maximum. The paper also builds upon our 2006 JASA paper with my then PhD student Billy Amzal, Eric Parent, and Frederic Bois, paper that took advantage of the then emerging particle methods to improve upon a static horizon target. While our method is sequential in that it pursues a moving target, it does not rely on the generic methodology developed by del Moral et al. (2006), where a backward kernel brings more stability to the moves. The paper also implements a version of our population Monte Carlo ABC algorithm (Beaumont et al., 2009), as a first step before an MCMC simulation. Overall, the paper sounds more like a review than like a strongly directive entry into ABC based design in that it remains quite generic. Not that I have specific suggestions, mind!, but I fear a realistic implementation (as opposed to the linear model used in the paper) would require a certain amount of calibration. There are missing references of recent papers using ABC for design, including some by Michael Stumpf I think.

I did not know about the Kuck et al. reference… Which is reproducing our 2006 approach within the del Moral framework. It uses a continuous temperature scale that I find artificial and not that useful, again a maybe superficial comment as I didn’t get very much into the paper … Just that integer powers lead to multiples of the sample and have a nice algorithmic counterpart.

optimisation via slice sampling

Posted in Statistics with tags , , , , on December 20, 2012 by xi'an

This morning, over breakfast; I read the paper recently arXived by John Birge et Nick Polson. I was intrigued by the combination of optimisation and of slice sampling, but got a wee disappointed by the paper, in that it proposes a simple form of simulated annealing, minimising f(x) by targeting a small collection of energy functions exp{-τf(x)}. Indeed, the slice sampler is used to explore each of those targets, i.e. for a fixed temperature τ. For the four functions considered in the paper, a slice sampler can indeed be implemented, but this feature could be seen as a marginalia, given that a random walk Metropolis-Hastings algorithm could be used as a proposal mechanism in other cases. The other intriguing fact is that annealing is not used in the traditional way of increasing the coefficient τ along iterations (as in our SAME algorithm), for which convergence issues are much more intricate, but instead stays at the level where a whole (Markov) sample is used for each temperature, the outcomes being then compared in terms of localisation (and maybe for starting at the next temperature value). I did not see any discussion about the selection of the successive temperatures, which usually is a delicate issue in realistic settings, nor of the stopping rule(s) used to decide the maximum has been reached.