Archive for Enigma code machine

a Simpson paradox of sorts

Posted in Books, Kids, pictures, R on May 6, 2016 by xi'an

The riddle from The Riddler this week is about finding an undirected graph with N nodes and no isolated node such that the number of nodes with more connections than the average of their neighbours is maximal. A representation of such a graph is through its adjacency matrix X of zeros and ones, on which one can spot the nodes satisfying the above condition as the positive entries of the vector $(X\mathbf{1})^2 - X^2\mathbf{1}$, if $\mathbf{1}$ denotes the vector of ones and the first square is taken entrywise. (Node i has $(X\mathbf{1})_i$ connections and its neighbours have $(X^2\mathbf{1})_i$ connections in total.) I thus wrote some R code aiming at optimising this target

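targe <- function(F){
  deg <- rowSums(F)  # node degrees; size taken from F rather than a global N
  # count nodes with more connections than the average of their neighbours
  sum(F%*%deg/deg^2<1)}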

by mere simulated annealing:

rate <- function(N){
  # generate matrix F (N >= 3 assumed)
  # 1. no isolated node: link each unconnected node to a later one
  F=matrix(0,N,N)
  F[sample(2:N,1),1]=1
  F[1,]=F[,1]
  for (i in 2:(N-1)){
    if (sum(F[,i])==0)
      # index into the range: sample((i+1):N,1) would draw from 1:N when i=N-1
      F[((i+1):N)[sample.int(N-i,1)],i]=1
    F[i,]=F[,i]}
  if (sum(F[,N])==0)
    F[sample(1:(N-1),1),N]=1
  F[N,]=F[,N]
  # 2. more connections
  F[lower.tri(F)]=F[lower.tri(F)]+
    sample(0:1,N*(N-1)/2,replace=TRUE,prob=c(N,1))
  F[F>1]=1
  F[upper.tri(F)]=t(F)[upper.tri(t(F))]
  # simulated annealing
  T=1e4
  temp=N
  targo=targe(F)
  for (t in 1:T){
    # 1. local proposal: flip one edge, redrawing if a node becomes isolated
    nod=sample(1:N,2)
    prop=F
    prop[nod[1],nod[2]]=prop[nod[2],nod[1]]=
      1-prop[nod[1],nod[2]]
    while (min(prop%*%rep(1,N))==0){
      nod=sample(1:N,2)
      prop=F
      prop[nod[1],nod[2]]=prop[nod[2],nod[1]]=
        1-prop[nod[1],nod[2]]}
    target=targe(prop)
    if (log(runif(1))*temp<target-targo){
      F=prop;targo=target}
    # 2. global proposal: add random edges
    prop=F
    prop[lower.tri(prop)]=F[lower.tri(prop)]+
      sample(c(0,1),N*(N-1)/2,replace=TRUE,prob=c(N,1))
    prop[prop>1]=1
    prop[upper.tri(prop)]=t(prop)[upper.tri(t(prop))]
    target=targe(prop)
    if (log(runif(1))*temp<target-targo){
      F=prop;targo=target}
    temp=temp*.999
  }
  return(F)}

This code returns quite consistently (modulo the simulated annealing uncertainty, which grows with N) the answer N-2 as the number of entries above average! Which is rather surprising in a Simpson-like manner since all entries but two are above average. (Incidentally, I found out that Edward Simpson recently wrote a paper in Significance about the Simpson-Yule paradox and his being a member of the Bletchley Park Enigma team. I must have missed the connection with the Simpson paradox when reading the paper in the first place…)
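Returning to the N-2 count, here is a minimal sanity check, offered as a sketch rather than as the output of the annealing run. It assumes one particular candidate graph, the complete graph with a single edge removed, and uses a hypothetical helper `count_above` that applies the same criterion as above (squared degree larger than the summed degrees of the neighbours). A star graph is included for contrast.

```r
# Hypothetical helper (not part of the original code): counts the nodes
# whose degree exceeds the average degree of their neighbours, i.e.
# deg_i^2 > sum of the degrees of the neighbours of i.
count_above <- function(X){
  deg <- rowSums(X)
  sum(deg^2 > as.vector(X %*% deg))
}

N <- 6

# Complete graph on N nodes with the single edge {1,2} removed
X <- matrix(1, N, N) - diag(N)
X[1, 2] <- X[2, 1] <- 0
n_above <- count_above(X)

# Star graph with node 1 as the hub, for contrast
S <- matrix(0, N, N)
S[1, -1] <- S[-1, 1] <- 1
n_star <- count_above(S)

print(c(complete_minus_edge = n_above, star = n_star))
```

In the first graph the two endpoints of the removed edge are the only nodes that fall below the average of their neighbours, so the count is N-2. In the star only the hub qualifies.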

Turing’s Bayesian contributions

Posted in Books, Kids, pictures, Running, Statistics, University life on March 17, 2015 by xi'an

Following the release in French theatres of The Imitation Game, the recent movie about Alan Turing played by Benedict “Sherlock” Cumberbatch, one of my colleagues in Dauphine asked me about the Bayesian contributions of Turing. I first tried to check in Sharon McGrayne’s book, but realised it had vanished from my bookshelves, presumably lent to someone a while ago. (Please return it at your earliest convenience!) So I told him about the Bayesian principle of updating priors with data and prior probabilities with likelihood evidence in code-detecting algorithms and ultimately machines at Bletchley Park… I could not get much further than that and hence went checking on the Internet for more fodder.

“Turing was one of the independent inventors of sequential analysis for which he naturally made use of the logarithm of the Bayes factor.” (p.393)

I came upon a few interesting entries but the most amazing one was a 1979 note by I.J. Good (assistant of Turing during the War) published in Biometrika retracing the contributions of Alan Mathison Turing during the War. From those few pages, it emerges that Turing’s statistical ideas revolved around the Bayes factor that Turing used “without the qualification ‘Bayes’.” (p.393) He also introduced the notion of ban as a unit for the weight of evidence, in connection with the town of Banbury (UK) where specially formatted sheets of paper were printed “for carrying out an important classified process called Banburismus” (p.394). Which shows that even in 1979, Good did not dare to get into the details of Turing’s work during the War… And explains why he was testing a simple statistical hypothesis against another simple statistical hypothesis. Good also credits Turing for the expected weight of evidence, which is another name for the Kullback-Leibler divergence and for Shannon’s information (Turing would visit Shannon in the U.S. after the War). In the final sections of the note, Turing is also associated with Gini’s index, the estimation of the number of species (processed by Good from Turing’s suggestion in a 1953 Biometrika paper, that is, prior to Turing’s suicide. In fact, Good states in this paper that “a very large part of the credit for the present paper should be given to [Turing]”, p.237), and empirical Bayes.
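To make the notions of ban and of expected weight of evidence concrete, here is a small sketch in R. It is not taken from Good's note: the Bernoulli model, the probabilities `p1` and `p0`, the data `x` and the helper name `woe_bans` are all illustrative assumptions. The weight of evidence is the base-10 logarithm of the Bayes factor for a simple hypothesis against a simple hypothesis, it adds up sequentially over independent observations, and its expectation under the first hypothesis is the Kullback-Leibler divergence expressed in bans.

```r
# Weight of evidence (in bans) of one Bernoulli observation x for the
# simple hypothesis H1: p = p1 against the simple hypothesis H0: p = p0.
# Hypothetical helper; the name and the model are assumptions.
woe_bans <- function(x, p1, p0){
  log10(dbinom(x, 1, p1) / dbinom(x, 1, p0))
}

p1 <- 0.6                        # illustrative value under H1
p0 <- 0.5                        # illustrative value under H0
x  <- c(1, 1, 0, 1, 1, 0, 1, 1)  # made-up observations

# Sequential accumulation: weights of evidence add up across observations
cum_bans <- cumsum(woe_bans(x, p1, p0))

# Bayes factor recovered from the total weight of evidence
bayes_factor <- 10^cum_bans[length(x)]

# Expected weight of evidence under H1, i.e. KL(H1 || H0) in base-10 logs
kl_bans <- p1 * log10(p1 / p0) + (1 - p1) * log10((1 - p1) / (1 - p0))

print(round(10 * cum_bans, 2))   # running totals in decibans
print(bayes_factor)
print(kl_bans)
```

Working on the log scale turns the product of likelihood ratios into a running sum, which is what makes a sequential stopping rule easy to apply by hand.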

Apology to Alan Turing

Posted in Running, Statistics on October 17, 2009 by xi'an

In the Eurostar back from London, I read a paper in Le Monde about Gordon Brown’s posthumous apology to Alan Turing for the way he was prosecuted by English courts for homosexuality in 1952 and eventually driven to suicide in 1954… I just find it almost impossible to believe that, till 1967 in England (1980 in Scotland and 1981 in France!), homosexuality was a crime, open to prosecution and, in the case of Turing, to chemical castration! This, in addition, ruined his running abilities, which were close to the Olympic level of those days (his best time in the marathon was 2:46:03). In conjunction with his breaking of the Enigma code machine during the war at Bletchley Park, Turing developed notions of Bayesian information theory like Banburismus and bans. To think of how much he could have contributed to computational and foundational Bayesian statistics, as well as artificial intelligence and mathematical biology, had he not been persecuted by a prejudiced society… The Prime Minister’s apology came as the result of a petition campaign started by John Graham-Cumming.