## Archive for the Books Category

## Berezina [book review]

Posted in Books, Travel with tags Berezina, Grande Armée, Minard, Russia, side-car, Tufte on May 14, 2016 by xi'an

**O**n a whim, I bought this “travel book” in a nice bookstore at the end of Rue Mouffetard, Paris, while looking for a weekend travel guide! I was fairly intrigued by this road-trip on antique Soviet side-cars, following the route of Napoleon’s army in its retreat from a burning Moscow. In fact, I have always been fascinated by the way Napoleon got mired in the Campaign of Russia, and not only because of Charles Joseph Minard‘s amazing graphical summary of the Campaign! Despite advice from scientists and diplomats, Napoleon did not want to pay any heed to the climate constraints. Nor did he account for the sacrificial tendencies of the Russian troops and irregulars. A predictable disaster of sorts… The book thus commemorates this retreat from Moscow by driving three side-cars to Paris in the heart of winter, so as to coincide with the arrival in Paris of the few survivors of the Grande Armée. The book however gets boring very quickly, as the painfully slow and uncertain drive cannot compare with the horror of the hundreds of thousands of deaths in Napoleon’s army fleeing back to Paris. Driving in sub-zero temperatures, sharing roads with indifferent Ukrainian truck-drivers, and repairing the side-cars every time they fall apart does not hold one’s interest for long, especially when it only takes the side-cars 13 days to reach Paris and there is no tension in the road trip, as they could really stop at any time. On top of that, the discourse about the charisma of Napoleon, and the complaint that our scope of life has shrunk from heroic times to pampered materialism, is pretty annoying, and so is the permanent glorification of the Russian soul. This type of chat does not really rise above the level of a blog series, trust an expert!

## a Bernoulli factory of sorts?

Posted in Books, Kids, Statistics with tags Bernoulli distribution, Bernoulli factory, cross validated, Monte Carlo, simulation, Stack Exchange on May 10, 2016 by xi'an

**A** nice question was posted on X validated, asking how to simulate a Bernoulli B(q) variate when only a Bernoulli B(p) generator is available, with the additional question of handling the special case q=a/b, a rational probability. This is not exactly a Bernoulli factory problem in that q is not expressed as a function f(p), but it is still a neat challenge. My solution would have been similar to the one posted by William Huber, namely to simulate a sequence of B(p) or B(1-p) variates so as to zoom in on q until the simulation of the underlying uniform U allows us to determine the position of U with respect to q. For instance, if p>q and X~B(p) is equal to zero, the underlying uniform is larger than p, hence larger than q, leading to returning zero for the B(q) generation. Otherwise, a second B(p) or B(1-p) generation breaks the interval (0,p) into two parts, one of which allows the generation to stop, and so on. The solution posted by William Huber contains an R code that could easily be improved by choosing, for each interval, between p and (1-p) so as to maximise the probability of stopping. I still wonder about the ultimate optimal solution that would minimise the (average or median) number of calls to the Bernoulli(p) generator.
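The zooming scheme above can be sketched in a few lines of R, under the convention that each fresh B(p) draw splits the current interval (a,b) for an implicit uniform at the point a+p(b-a); the function name and interface are illustrative, not William Huber's actual code:

```r
# Illustrative sketch of the zooming scheme: each fresh B(p) draw splits the
# current interval (a,b) for an implicit uniform U at a + p*(b-a); the loop
# stops once the interval lies entirely below or above q, at which point the
# position of U relative to q is settled and the B(q) variate is returned.
bernq <- function(q, p){
  a <- 0; b <- 1
  while (a < q && q < b){
    cut <- a + p * (b - a)
    if (rbinom(1, 1, p) == 1) b <- cut else a <- cut
  }
  as.integer(b <= q)  # interval entirely below q means U < q
}
```

A quick Monte Carlo check, e.g. `mean(replicate(1e4, bernq(1/3, .8)))`, should return a frequency close to 1/3.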

## auxiliary variable methods as ABC

Posted in Books, pictures, Statistics, University life with tags ABC, auxiliary variable, doubly intractable problems, Markov random field, Newcastle-upon-Tyne, unbiased estimation on May 9, 2016 by xi'an

**D**ennis Prangle and Richard Everitt arXived a note today in which they point out the identity between the auxiliary variable approach of Møller et al. (2006) [or rather its multiple or annealed version à la Murray] and [exact] ABC (as in our 2009 paper) in the case of Markov random fields. The connection between the two appears when using an importance sampling step in the ABC algorithm and running a Markov chain forward and backward for the same number of steps as there are levels in the annealing scheme of MAV. Maybe more a curiosity than an indicator of a larger phenomenon, since it is so rare that ABC can be used in its exact form.
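For readers unfamiliar with the auxiliary variable trick, here is a toy sketch of its single-auxiliary (exchange) version on a model where perfect simulation is trivial, f(x|θ)∝exp(θΣxᵢ) with x in {0,1}ⁿ, i.e. independent Bernoulli(plogis(θ)) components; all choices (prior, proposal scale, sample sizes) are illustrative, not taken from the note:

```r
# Toy sketch of the single-auxiliary exchange algorithm, a variant of Moller
# et al.'s auxiliary variable method: simulating an auxiliary dataset at the
# proposed parameter value makes the intractable normalising constants cancel
# in the Metropolis-Hastings ratio.
set.seed(42)
n <- 50
y <- rbinom(n, 1, plogis(1))        # synthetic data, true theta = 1
Sy <- sum(y)
niter <- 1e4
theta <- numeric(niter)             # chain started at theta = 0
for (t in 2:niter){
  prop <- theta[t-1] + rnorm(1, sd = .5)  # random walk proposal
  x <- rbinom(n, 1, plogis(prop))         # auxiliary data: perfect draw at prop
  logratio <- dnorm(prop, 0, 5, log = TRUE) -
    dnorm(theta[t-1], 0, 5, log = TRUE) +
    (prop - theta[t-1]) * (Sy - sum(x))   # normalising constants cancel out
  theta[t] <- if (log(runif(1)) < logratio) prop else theta[t-1]
}
```

After burn-in, the mean of the chain should sit close to the value used to simulate the data, without a single evaluation of the normalising constant.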

## sent to Coventry!

Posted in Books, pictures, Travel with tags Coventry, England, English, English Civil War, sent to Coventry, University of Warwick on May 7, 2016 by xi'an

**T**he other day, my wife came across the expression sent to Coventry and asked me about the origin of this expression, which Wiktionary explains as

### Verb

**send to Coventry** (*third-person singular simple present* **sends to Coventry**, *present participle* **sending to Coventry**, *simple past and past participle* **sent to Coventry**)

- (transitive, idiomatic) To ostracise, or systematically ignore someone.
*The group decided to* **send** *the unpopular members* **to Coventry**.

I had never heard this expression before, certainly not while in Coventry, so I checked on Wikipedia to see whether or not it was related to the rather unappealing postwar reconstruction of the downtown. As it appears, the most likely connection is much more ancient, as it relates to royalist troops being sent to Coventry, a parliamentarian town, during the English Civil War.

## a Simpson paradox of sorts

Posted in Books, Kids, pictures, R with tags Bletchley Park, Edward Simpson, Enigma code machine, graph, mathematical puzzle, Significance, Simpson's paradox, simulated annealing, The Riddler, Yule on May 6, 2016 by xi'an

**T**he riddle from The Riddler this week is about finding an undirected graph with N nodes and no isolated node such that the number of nodes with more connections than the average of their neighbours is maximal. A graph can be represented by a symmetric matrix X of zeros and ones, on which the nodes satisfying the above condition can be spotted as the positive entries of the vector (X**1**)^2-(X^2**1**), where **1** denotes the vector of ones and the first square is taken entrywise. I thus wrote an R code aiming at optimising this target

```r
targe <- function(F){
  sum(F %*% F %*% rep(1, N) / (F %*% rep(1, N))^2 < 1)}
```

by mere simulated annealing:

```r
rate <- function(N){
  # generate matrix F
  # 1. no isolated node
  F=matrix(0,N,N)
  F[sample(2:N,1),1]=1
  F[1,]=F[,1]
  for (i in 2:(N-1)){
    if (sum(F[,i])==0) F[sample((i+1):N,1),i]=1
    F[i,]=F[,i]}
  if (sum(F[,N])==0) F[sample(1:(N-1),1),N]=1
  F[N,]=F[,N]
  # 2. add further connections
  F[lower.tri(F)]=F[lower.tri(F)]+
    sample(0:1,N*(N-1)/2,rep=TRUE,prob=c(N,1))
  F[F>1]=1
  F[upper.tri(F)]=t(F)[upper.tri(t(F))]
  # simulated annealing
  T=1e4
  temp=N
  targo=targe(F)
  for (t in 1:T){
    # 1. local proposal: flip a single edge
    nod=sample(1:N,2)
    prop=F
    prop[nod[1],nod[2]]=prop[nod[2],nod[1]]=1-prop[nod[1],nod[2]]
    while (min(prop%*%rep(1,N))==0){
      nod=sample(1:N,2)
      prop=F
      prop[nod[1],nod[2]]=prop[nod[2],nod[1]]=1-prop[nod[1],nod[2]]}
    target=targe(prop)
    if (log(runif(1))*temp<target-targo){
      F=prop;targo=target}
    # 2. global proposal: add a batch of random edges
    prop=F
    prop[lower.tri(prop)]=F[lower.tri(prop)]+
      sample(c(0,1),N*(N-1)/2,rep=TRUE,prob=c(N,1))
    prop[prop>1]=1
    prop[upper.tri(prop)]=t(prop)[upper.tri(t(prop))]
    target=targe(prop)
    if (log(runif(1))*temp<target-targo){
      F=prop;targo=target}
    temp=temp*.999
  }
  return(F)}
```

This code returns quite consistently (modulo the simulated annealing uncertainty, which grows with N) the answer N-2 as the maximal number of entries above average! Which is rather surprising in a Simpson-like manner, since all entries but two are then above average. (Incidentally, I found out that Edward Simpson recently wrote a paper in Significance about the Simpson-Yule paradox and his membership of the Bletchley Park Enigma team. I must have missed the connection with the Simpson paradox when reading the paper in the first place…)
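One graph attaining this N-2 count is the complete graph minus a single edge: every node outside the removed edge has degree N-1 and counts the two degree-(N-2) endpoints among its neighbours, so it beats its neighbours' average, while the two endpoints only have degree-(N-1) neighbours and fail. This can be checked directly on the criterion used above:

```r
# Check: the complete graph on N nodes minus one edge has N-2 nodes whose
# degree exceeds the average degree of their neighbours, the two endpoints
# of the removed edge being the only exceptions.
N <- 10
X <- matrix(1, N, N) - diag(N)   # adjacency matrix of the complete graph K_N
X[N-1, N] <- X[N, N-1] <- 0      # remove the edge between nodes N-1 and N
deg <- X %*% rep(1, N)           # degrees, i.e. X1
nbr <- X %*% X %*% rep(1, N)     # sums of neighbours' degrees, i.e. X^2 1
sum(deg^2 > nbr)                 # number of "winning" nodes: N-2
```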

## ABC for repulsive point processes

Posted in Books, pictures, Statistics, University life with tags ABC, ABC model choice, Bayesian model comparison, determinantal point process, Gibbs point process, point processes, prior predictive, summary statistics on May 5, 2016 by xi'an

**S**hinichiro Shirota and Alan Gelfand arXived a paper on the use of ABC for analysing some repulsive point processes: on the one hand, Gibbs point processes, for which ABC requires a perfect sampler to operate, unless one is willing to stop an MCMC chain before it converges; on the other hand, the determinantal point processes studied by Lavancier et al. (2015) [a paper I wanted to review and could not find the time to!]. Determinantal point processes have joint intensity functions given by determinants of a covariance kernel, hence are repulsive. Simulating a determinantal process is itself not straightforward and involves approximations. But the likelihood is also unavailable, and Lavancier et al. (2015) resort to approximate versions based on fast Fourier transforms, which means MCMC remains challenging even with these approximate steps.

“The main computational cost of our algorithm is simulation of x for each iteration of the ABC-MCMC.”

The authors propose here to use ABC instead, with an extra approximate step for simulating the determinantal process itself. Interestingly, the Gibbs point process allows for a sufficient statistic, the number of R-close points [although I fail to see how the radius R is determined by the model], while the determinantal process does not. The summary statistics end up being a collection of frequencies of points within spheres of various radii. However, these statistics are then processed through Fearnhead and Prangle's proposal, namely to use as the natural summary an approximation of E[θ|y], obtained by regression on the original summaries. Another layer of complexity stems from using an ABC-MCMC approach, and from including a Lasso step in the regression so as to exclude the less relevant radii. The paper also considers Bayesian model validation for such point processes, implementing prior predictive tests with a ranked probability score, rather than a Bayes factor.
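The Fearnhead and Prangle step can be sketched in a toy setting: simulate parameter-summary pairs from the prior predictive, regress the parameter on the raw summaries, and use the fitted regression function as the new low-dimensional summary. Everything below (prior, summaries, plain least squares in lieu of the paper's Lasso) is purely illustrative:

```r
# Toy sketch of Fearnhead & Prangle's semi-automatic summaries: regress the
# parameter on raw summaries from pilot prior-predictive simulations and use
# the fitted linear predictor, an estimate of E[theta|s], as the summary.
# (The paper adds a Lasso penalty to drop uninformative radii; ordinary
# least squares is used here for simplicity.)
set.seed(1)
M <- 1e3
theta <- runif(M, 0, 10)                 # prior draws (toy prior)
s <- cbind(s1 = theta + rnorm(M),        # toy raw summaries: two informative,
           s2 = theta^2 + rnorm(M),      # one pure noise
           s3 = rnorm(M))
fit <- lm(theta ~ s)                     # pilot regression
eta <- function(snew)                    # snew: matrix, one row per dataset
  c(cbind(1, snew) %*% coef(fit))        # new one-dimensional summary
```

In the subsequent ABC run, distances between observed and simulated datasets are then computed on eta(s) rather than on the raw vector s.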

As point processes have always been somewhat mysterious to me, I do not have any intuition about the strength of the distributional assumptions there, or about the relevance of picking a determinantal process against, say, a Strauss process. The model comparisons operated in the paper do not strongly support one repulsive model over the others, with the authors concluding that many points are needed before the models can be discriminated. I also wonder about the possibility of including other summaries than Ripley's K-functions, which somewhat imply a discretisation of the space by concentric rings. Maybe using MLEs or Bayes estimators derived under other point process models as summary statistics would help. (Or maybe not.)