**F**or ‘Og’s readers interested in lecturer or professor positions in French universities next academic year, including a lecturer position at Paris-Dauphine in applied and computational statistics!, you need to apply for a qualification label by a national committee which strict deadline is next Tuesday, October 25, at 4pm (Paris/CET time). (The whole procedure is exclusively in French!)

## positions in French universities [deadline]

Posted in Kids, Statistics, University life with tags academic position, assistant professor position, deadline, France, French universities, lecturer, qualification, Université Paris Dauphine on October 21, 2016 by xi'an## Suffrage Science awards in maths and computing

Posted in pictures, Statistics, University life with tags Ada Lovelace, Bletchley Park, Cambridge University, Euler's formula, Great-Britain, Imperial College London, jewellery, MRC Unit, Suffrage Science awards, Suffragettes, University of Warwick, WPSU on October 21, 2016 by xi'an**O**n October 11, at Bletchley Park, the Suffrage Science awards in mathematics and computer sciences were awarded for the first time to 12 senior female researchers. Among whom three statisticians, Professor Christl Donnelly from Imperial College London, my colleague at Warwick, Jane Hutton, and my friend and co-author, Sylvia Richardson, from MRC, Cambridge University. This initiative was started by the Medical Research Council in 2011 by Suffrage Science awards for life sciences, followed in 2013 by one for engineering and physics, and this year for maths and computing. The name of the award aims to connect with the Suffragette movement of the late 19th and early 20th Centuries, which were particularly active in Britain. One peculiar aspect of this award is that the recipients are given pieces of jewellery, created for each field, pieces that they will themselves give two years later to a new recipient of their choice, and so on in an infinite regress! (Which suggests a related puzzle, namely to figure out how many years it should take until all female scientists have received the award. But since the number increases as the square of the number of years, this is not going to happen unless the field proves particularly hostile to women scientists!) This jewellery award also relates to the history of the Suffragette movement since the WPSU commissioned their own jewellery awards. A clever additional touch was that the awards were delivered on Ada Lovelace Day, October 11.

## Testarossa

Posted in Kids, Travel, Wines with tags Italian wines, Montepulciano d'Abruzzo, red wine, Testarossa on October 20, 2016 by xi'an## David Cox gets the first International Prize in Statistics

Posted in pictures, Statistics, University life with tags Cox Model, David Cox, IMS, International Prize in Statistics, Nobel Prize, Nuffield College, proportional hazards model, survival analysis, University of Oxford on October 20, 2016 by xi'an**J**ust received an email from the IMS that Sir David Cox (Nuffield College, Oxford) has been awarded the International Prize in Statistics. As discussed earlier on the ‘Og, this prize is intended to act as the equivalent of a Nobel prize for statistics. While I still have reservations about the concept. I have none whatsoever about the nomination as David would have been my suggestion from the start. Congratulations to him for the Prize and more significantly for his massive contributions to statistics, with foundational, methodological and societal impacts! [As Peter Diggle, President of the Royal Statistical Society just pointed out, it is quite fitting that it happens on European Statistics day!]

## a grim knight [cont’d]

Posted in Books, Kids, pictures, R, Statistics with tags chess, Donald Knuth, Ingmar Bergman, knight, Martin Gardner, self-avoiding random walk, The Riddler, The Seventh Seal on October 20, 2016 by xi'an**A**s discussed in the previous entry, there are two interpretations to this question from The Riddler:

“…how long is the longest path a knight can travel on a standard 8-by-8 board without letting the path intersect itself?”

as to what constitutes a path. As a (terrible) chess player, I would opt for the version on the previous post, the knight moving two steps in one direction and one in the other (or vice-versa), thus occupying three squares on the board. But one can consider instead the graph of the moves of that knight, as in the above picture and as in The Riddler. And since I could not let the problem go I also wrote an R code (as clumsy as the previous one!) to explore at random (with some minimal degree of annealing) the maximum length of a self-avoiding knight canter. The first maximal length I found this way is 32, although I also came by hand to a spiralling solution with a length of 33.

Running the R code longer over the weekend however led to a path of length 34, while the exact solution to the riddle is 35, as provided by the Riddler (and earlier in many forms, including Martin Gardner’s and Donald Knuth’s).

*[An unexpected side-effect of this riddle was ending up watching half of Bergman’s Seventh Seal in Swedish…]*

## an attempt at EP-ABC from scratch, nothing more… [except for a few bugs]

Posted in R, Statistics, University life with tags ABC, computer experiment, expectation-propagation, Gaussian approximation, implementation, Monte Carlo Statistical Methods, R on October 19, 2016 by xi'an**F**ollowing a request from one of the reviewers of our chapter Likelihood-free model choice, I tried to run EP-ABC on a toy problem and to compare it with the outcome of a random forest ABC. Literally starting from scratch, namely from the description found in Simon and Nicolas’ JASA paper. To run my test, I chose as my modelled data an exponential Exp(λ) sample of size 20, with a Gaussian N(0,1) prior on the log parameter (here replicated 100 times):

n=20 trueda=matrix(rexp(100*n),ncol=n) #prior values er0=0 Q0=1

Then I applied the formulas found in the paper for approximating the evidence, first defining the log normalising constant for an unnormalised Gaussian density as on the top of p.318

Psi=function(er,Q){ -.5*(log(Q/2/pi)-Q*er*er)}

[which exhibited a typo in the paper, as Simon Barthelmé figured out after emails exchanges that the right formulation was the inverse]

Psi=function(er,Q){ -.5*(log(Q/2/pi)-er*er/Q)}

and iterating the site updates as described in Sections 2.2, 3.1 and Algorithms 1 and 3:

ep=function(dat,M=1e4,eps,Nep=10){ n=length(dat) #global and ith natural parameters for Gaussian approximations #initialisation: Qi=rep(0,n) Q=Q0+sum(Qi) eri=rep(0,n) er=er0+sum(eri) #constants Ci in (2.6) normz=rep(1,n) for (t in 1:Nep){ for (i in sample(1:n)){ #site i update Qm=Q-Qi[i] #m for minus i erm=er-eri[i] #ABC sample thetas=rnorm(M,me=erm/Qm,sd=1/sqrt(Qm)) dati=rexp(M,rate=exp(thetas)) #update by formula (3.3) Mh=sum((abs(dati-dat[i])<eps)) mu=mean(thetas[abs(dati-dat[i])<eps]) sig=var(thetas[abs(dati-dat[i])<eps]) if (Mh>1e3){ #enough proposals accepted Qi[i]=1/sig-Qm eri[i]=mu/sig-erm Q=Qm+Qi[i] er=erm+eri[i] #update of Ci as in formula (2.7) #with correction bottom of p.319 normz[i]=log(Mh/M/2/eps)- Psi(er,Q)+Psi(erm,Qm) }}} #approximation of the log evidence as on p.318 return(sum(normz)+Psi(er,Q)-Psi(er0,Q0)) }

except that I made an R beginner’s mistake (!) when calling the normal simulation to be

thetas=rnorm(M,me=erm/Qm)/sqrt(Qm)

to be compared with the genuine evidence under a conjugate Exp(1) prior *[values of the evidence should differ but should not differ that much]*

ev=function(dat){ n=length(dat) lgamma(n+1)-(n+1)*log(1+sum(dat)) }

After correcting for those bugs and too small sample sizes, thanks to Simon kind-hearted help!, running this code results in minor discrepancies between both quantities:

evid=ovid=rep(0,100) M=1e4 #number of ABC samples propep=.1 #tolerance factor for (t in 1:100){ #tolerance scaled by data epsi=propep*sd(trueda[t,]) evid[t]=ep(trueda[t,],M,epsi) ovid[t]=ev(trueda[t,]) } plot(evid,ovid,cex=.6,pch=19)

as shown by the comparison below between the evidence and the EP-ABC approximations to the evidence, called *epabence* (the dashed line is the diagonal):

Obviously, this short (if lengthy at my scale) experiment is not intended to conclude that EP-ABC work or does not work. (Simon also provided additional comparison with the genuine evidence, that is under the right prior, and with the zero Monte-Carlo version of the EP-ABC evidence, achieving a high concentration over the diagonal.) I just conclude that the method does require a certain amount of calibration to become stable. Calibrations steps that are clearly spelled out in the paper (adaptive choice of M, stabilisation by quasi-random samples, stabilisation by stochastic optimisation steps and importance sampling), but also imply that EP-ABC is also “far from routine use because it make take days to tune on real problems” (pp.315-316). Thinking about the reasons for such discrepancy over the past days, one possible explanation I see for the impact of the calibration parameters is that the validation of EP if any occurs at a numerical rather than Monte Carlo level, hence that the Monte Carlo error must be really small to avoid interfering with the numerical aspects.

## scalable Langevin exact algorithm

Posted in Books, Statistics, Travel, University life with tags Brownian motion, control variate, importance sampling, JSM 2015, Langevin diffusion, normalising constant, Poisson process, quasi-stationary distribution, scalable MCMC, Seattle, sequential Monte Carlo, University of Warwick on October 18, 2016 by xi'an

“By employing a modification to existing naïve subsampling techniques we can obtain an algorithm which is still exact but has sub-linear iterative cost as a function of data size.”

**A** few weeks ago Murray Pollock, Paul Fearnhead, Adam Johansen and Gareth Roberts (all from Warwick except for Paul) arXived a paper The Scalable Langevin Exact Algorithm: Bayesian Inference for Big Data. (This was also the topic of Murray’s talk last year at JSM in Seattle.) One major advance found in the paper is the derivation of an “exact” algorithm that is sub-linear in the data size. As discussed in the introduction, the current approaches to large data problems either suffer from being approximate (like divide-and-conquer methods) or do not achieve significant reduction in the computing time, being of order O(n). The authors mention Teh and Welling (2011) sand their tochastic gradient approximation to the Langevin diffusion, when the gradient is based on a subsample. Without the Metropolis correction that would ensure an exact target but at a cost of order O(n). (Which makes the technique rather difficult to recommend.)

A novel [for me] notion at the core of this paper is the concept of *quasi-stationary distribution*, which is the limiting distribution of a Markov chain X[t] conditional on a Markov stopping time [being larger than t]. The approach is based on diffusions with appropriate stationary distributions like the Langevin diffusion. (Actually, as in most papers I have read and remember, the current paper only considers the Langevin diffusion.) In order to avoid the issues with unadjusted and Metropolis-adjusted Langevin schemes, a killed Brownian motion is created, which means a Brownian motion conditional of being alive till time T when the instantaneous killing rate is a given function of the chain, Φ(X[t]), related with the stationary measure of the Langevin diffusion ν. Under appropriate conditions, the density of this killed Brownian motion converges [in T] to √ν. Which immediately hints at creating a new Langevin diffusion targeting ν² instead of ν. And killing it with the proper rate, which can be done by thinning a Poisson process. Simulating the original path can be done by path-space rejection sampling, following the technique set by Gareth Roberts and co-authors more than ten years ago. Based on finite dimensional realisations of the path on [0,T]. And including the killing part can be done by importance sampling and checking that the simulated killing time is larger than the current (exponentially simulated) time.

One practical difficulty in the implementation of this neat principle is the derivation of the normalising constant, which evaluation degrades with the time horizon T. The solution adopted in the paper is through a sequential Monte Carlo method, using another discretisation of the time interval [0,T] (relying on the original one would get too costly?). As for subsampling, since the survival probability for the Brownian motion is based on an unbiased estimator, subsampling does not hurt if conducted in a random manner. Although this increases the variance on principle, the use of a control variate computed just once helps in reducing the complexity to O(1).

This is a tough paper and I have not gone through the effort of trying to implement it, but this is an original and innovative construct I would like to monitor in further details on a toy example, maybe next week while in Warwick. Or at least to discuss it with the authors.