## Archive for Bletchley Park

## Suffrage Science awards in maths and computing

Posted in pictures, Statistics, University life with tags Ada Lovelace, Bletchley Park, Cambridge University, Euler's formula, Great-Britain, Imperial College London, jewellery, MRC Unit, Suffrage Science awards, Suffragettes, University of Warwick, WPSU on October 21, 2016 by xi'an

**O**n October 11, at Bletchley Park, the Suffrage Science awards in mathematics and computer sciences were presented for the first time, to 12 senior female researchers. Among them were three statisticians: Professor Christl Donnelly from Imperial College London, my colleague at Warwick, Jane Hutton, and my friend and co-author, Sylvia Richardson, from the MRC, Cambridge University. This initiative was started by the Medical Research Council in 2011 with Suffrage Science awards for life sciences, followed in 2013 by one for engineering and physics, and this year by one for maths and computing. The name of the award aims to connect with the Suffragette movement of the late 19th and early 20th centuries, which was particularly active in Britain. One peculiar aspect of this award is that the recipients are given pieces of jewellery, created for each field, which they will themselves pass on two years later to a new recipient of their choice, and so on in an infinite regress! (Which suggests a related puzzle, namely to figure out how many years it should take until all female scientists have received the award. But since the number increases as the square of the number of years, this is not going to happen unless the field proves particularly hostile to women scientists!) This jewellery award also relates to the history of the Suffragette movement since the WSPU commissioned their own jewellery awards. A clever additional touch was that the awards were delivered on Ada Lovelace Day, October 11.

## a Simpson paradox of sorts

Posted in Books, Kids, pictures, R with tags Bletchley Park, Edward Simpson, Enigma code machine, graph, mathematical puzzle, Significance, Simpson's paradox, simulated annealing, The Riddler, Yule on May 6, 2016 by xi'an

**T**he riddle from The Riddler this week is about finding an undirected graph with N nodes and no isolated node such that the number of nodes with more connections than the average of their neighbours is maximal. Such a graph can be represented by a symmetric matrix X of zeros and ones, on which one can spot the nodes satisfying the above condition as the positive entries of the vector (X**1**)^2-(X^2**1**), if **1** denotes the vector of ones and the first square is taken entrywise. (Indeed, X**1** is the vector of the numbers of connections, while X^2**1** sums the numbers of connections over the neighbours of each node.) I thus wrote an R code aiming at optimising this target

    targe <- function(F){ N=nrow(F); sum(F%*%F%*%rep(1,N)/(F%*%rep(1,N))^2<1)}

by mere simulated annealing:

    rate <- function(N){
      # generate matrix F
      # 1. no single
      F=matrix(0,N,N)
      F[sample(2:N,1),1]=1
      F[1,]=F[,1]
      for (i in 2:(N-1)){
        if (sum(F[,i])==0)
          # index into (i+1):N, since sample(N,1) would draw from 1:N when i=N-1
          F[((i+1):N)[sample.int(N-i,1)],i]=1
        F[i,]=F[,i]}
      if (sum(F[,N])==0)
        F[sample(1:(N-1),1),N]=1
      F[N,]=F[,N]
      # 2. more connections
      F[lower.tri(F)]=F[lower.tri(F)]+
        sample(0:1,N*(N-1)/2,rep=TRUE,prob=c(N,1))
      F[F>1]=1
      F[upper.tri(F)]=t(F)[upper.tri(t(F))]
      # simulated annealing
      T=1e4
      temp=N
      targo=targe(F)
      for (t in 1:T){
        # 1. local proposal: flip one edge, redrawing if a node gets isolated
        nod=sample(1:N,2)
        prop=F
        prop[nod[1],nod[2]]=prop[nod[2],nod[1]]=
          1-prop[nod[1],nod[2]]
        while (min(prop%*%rep(1,N))==0){
          nod=sample(1:N,2)
          prop=F
          prop[nod[1],nod[2]]=prop[nod[2],nod[1]]=
            1-prop[nod[1],nod[2]]}
        target=targe(prop)
        if (log(runif(1))*temp<target-targo){
          F=prop;targo=target}
        # 2. global proposal: add random edges, then symmetrise
        prop=F
        prop[lower.tri(prop)]=F[lower.tri(prop)]+
          sample(c(0,1),N*(N-1)/2,rep=TRUE,prob=c(N,1))
        prop[prop>1]=1
        prop[upper.tri(prop)]=t(prop)[upper.tri(t(prop))]
        target=targe(prop)
        if (log(runif(1))*temp<target-targo){
          F=prop;targo=target}
        temp=temp*.999
      }
      return(F)}

This code returns quite consistently (modulo the simulated annealing uncertainty, which grows with N) the answer N-2 as the number of entries above average! Which is rather surprising in a Simpson-like manner since all entries but two are above average. (Incidentally, I found out that Edward Simpson recently wrote a paper in Significance about the Simpson-Yule paradox and his being a member of the Bletchley Park Enigma team. I must have missed the connection with the Simpson paradox when reading the paper in the first place…)
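As a sanity check on the N-2 value, here is a minimal sketch that counts the above-average nodes on one explicit graph, the complete graph with a single edge removed. The names `count_above` and `almost_complete` are mine, introduced for illustration only, and this graph is merely one configuration reaching N-2, not necessarily the one returned by the annealing code.

```r
# number of nodes whose degree exceeds the average degree of their neighbours,
# i.e. number of positive entries of (X1)^2 - X^2 1 (entrywise square)
count_above <- function(X) {
  one <- rep(1, nrow(X))
  deg <- as.vector(X %*% one)        # degrees, X1
  nb  <- as.vector(X %*% X %*% one)  # summed degrees of the neighbours, X^2 1
  sum(deg^2 > nb)
}

# hypothetical example: complete graph on N nodes minus the edge {1,2}
almost_complete <- function(N) {
  X <- matrix(1, N, N)
  diag(X) <- 0
  X[1, 2] <- X[2, 1] <- 0
  X
}

count_above(almost_complete(6))
```

In this graph the N-2 nodes connected to everyone have N-1 connections while two of their neighbours only have N-2, so each of them sits above its neighbourhood average, and only nodes 1 and 2 fall below.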

## Turing’s Bayesian contributions

Posted in Books, Kids, pictures, Running, Statistics, University life with tags Alan Turing, Banbury, Biometrika, Bletchley Park, Cryptonomicon, England, Enigma code machine, I.J. Good, Kullback-Leibler divergence, missing species problem, Shannon's information, statistical evidence, WW II on March 17, 2015 by xi'an

**F**ollowing The Imitation Game, this recent movie about Alan Turing played by Benedict “Sherlock” Cumberbatch, being shown in French theatres, one of my colleagues in Dauphine asked me about the Bayesian contributions of Turing. I first tried to check in Sharon McGrayne's book, but realised it had vanished from my bookshelves, presumably lent to someone a while ago. *(Please return it at your earliest convenience!)* So I told him about the Bayesian principle of updating priors with data and prior probabilities with likelihood evidence in code-detecting algorithms and ultimately machines at Bletchley Park… I could not get much further than that and hence went checking on the Internet for more fodder.

“Turing was one of the independent inventors of sequential analysis for which he naturally made use of the logarithm of the Bayes factor.” (p.393)

I came upon a few interesting entries but the most amazing one was a 1979 note by I.J. Good (assistant of Turing during the War) published in *Biometrika* retracing the contributions of Alan Mathison Turing during the War. From those few pages, it emerges that Turing’s statistical ideas revolved around the Bayes factor, which Turing used “without the qualification ‘Bayes’.” (p.393) He also introduced the notion of ban as a unit for the weight of evidence, in connection with the town of Banbury (UK) where specially formatted sheets of paper were printed “for carrying out an important classified process called Banburismus” (p.394). Which shows that even in 1979, Good did not dare to get into the details of Turing’s work during the War… And explains why he was testing a simple statistical hypothesis against another simple statistical hypothesis. Good also credits Turing for the expected weight of evidence, which is another name for the Kullback-Leibler divergence and for Shannon’s information, Shannon being someone Turing would visit in the U.S. after the War. In the final sections of the note, Turing is also associated with Gini’s index, the estimation of the number of species (processed by Good from Turing’s suggestion in a 1953 *Biometrika* paper, that is, prior to Turing’s suicide; in fact, Good states in this paper that “a very large part of the credit for the present paper should be given to [Turing]”, p.237), and empirical Bayes.
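To make the weight of evidence more concrete, here is a small R sketch for the simple-versus-simple case, with the logarithm of the Bayes factor taken in base 10 so that the result is expressed in bans (and in decibans once multiplied by ten). The Bernoulli setting, the probabilities `p0` and `p1`, the data `x` and both function names are assumptions of mine for illustration, not anything taken from Good's note.

```r
# weight of evidence (in bans) brought by each 0/1 observation in x
# in favour of H1: P(x=1)=p1 against H0: P(x=1)=p0, both hypotheses being simple
weight_bans <- function(x, p1, p0) {
  log10(dbinom(x, 1, p1) / dbinom(x, 1, p0))
}

# expected weight of evidence under H1, i.e. the Kullback-Leibler
# divergence between the two Bernoulli distributions, in base 10
expected_weight <- function(p1, p0) {
  p1 * log10(p1 / p0) + (1 - p1) * log10((1 - p1) / (1 - p0))
}

p0 <- 0.05; p1 <- 0.5      # hypothetical values
x <- c(1, 1, 0, 1)         # hypothetical observations
w <- weight_bans(x, p1, p0)
cumsum(w)                  # sequential accumulation of the evidence
10 * sum(w)                # total weight of evidence, in decibans
expected_weight(p1, p0)    # average gain per observation when H1 holds
```

Since weights add up over independent observations, the running total `cumsum(w)` is the quantity one would monitor in a sequential analysis, stopping once it crosses a chosen threshold.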

## JSM 2010 [day 2]

Posted in Books, pictures, R, Running, Statistics, University life with tags 2001: A Space Odyssey, Alan Turing, Bletchley Park, I.J. Good, JSM 2010, Stanley Park mermaid, statprob, Vancouver on August 3, 2010 by xi'an

**A**fter a very good early run in Stanley Park, I went to a morning session on new statistical challenges in genetics, but unfortunately could not stay focussed enough (due to a very short night, still not being tuned to Pacific time!) so I ended up chatting with Sid Chib at the Springer booth about the future of R and the drawback of it running too slowly… The second session of the morning I attended was the I.J. Good memorial session (although there were many alternative choices I could have made at the same time!) where Steve Fienberg, Jim Berger, Adrian Raftery and David Banks gave different perspectives on the life and influence of this leading figure. After his work in Bletchley Park alongside Alan Turing during the war, already using Bayes factors introduced a few years earlier by Harold Jeffreys, I.J. Good contributed very much to the Bayesian revival of the 1950s. (A fact not mentioned this morning is that he was a consultant for *2001: A Space Odyssey*!) The afternoon session on Bayesian processing of massive data systems was somewhat compulsory since I was talking in this session! While the talks were interestingly diverse, there were however again very few people in the room, making me feel the attendance was much lower than last year. As the day ended earlier to make room for the presidential address, this eventually came as a less exciting day (but left me time for an early evening swim plus two mixers!)…