## marauders of the lost sciences

Posted in Books, Statistics, University life with tags , , , , , , on October 26, 2014 by xi'an

The editors of a new blog entitled Marauders of the Lost Sciences (Learn from the giants) sent me an email to signal the start of this blog with a short excerpt from a giant in maths or stats posted every day:

There is  a new blog I wanted to tell you
about which  excerpts one  interesting or
classic  paper  or  book  a day  from the
mathematical  sciences.  We plan on daily
posting across the  range of mathematical
fields and at any level, but about 20-30%
of the posts in queue are from statistics.

The goal is to entice people to read the great
works of old.

The first post today was from an old paper by
Fisher applying Group Theory to the design of
experiments.


Interesting concept, which will hopefully generate comments to put the quoted passage into context. Somewhat connected to my Reading Statistical Classics posts. Which incidentally if sadly will not take place this year since only two students registered. (I am unsure about the references behind the title of that blog, besides Spielberg’s Raiders of the Lost Ark and Norman’s Marauders of Gor… I just hope Statistics does not qualify as a lost science!)

## Rivers of London [book review]

Posted in Books, Kids, Travel with tags , , , , , , , , , , , on October 25, 2014 by xi'an

Yet another book I grabbed on impulse while in Birmingham last month. And which had been waiting for me on a shelf of my office in Warwick. Another buy I do not regret! Rivers of London is delightful, as much for taking place in all corners of London as for the story itself. Not mentioning the highly enjoyable writing style!

“I though you were a sceptic, said Lesley. I though you were scientific”

The first volume in this detective+magic series, Rivers of London, sets the universe of this mix of traditional Metropolitan Police work and of urban magic, the title being about the deities of the rivers of London, including a Mother and a Father Thames… I usually dislike any story mixing modern life and fantasy but this is a definitive exception! What I enjoy in this book setting is primarily the language used in the book that is so uniquely English (to the point of having the U.S. edition edited!, if the author’s blog is to be believed). And the fact that it is so much about London, its history and inhabitants. But mostly about London, as an entity on its own. Even though my experience of London is limited to a few boroughs, there are many passages where I can relate to the location and this obviously makes the story much more appealing. The style is witty, ironic and full of understatements, a true pleasure.

“The tube is a good place for this sort of conceptual breakthrough because, unless you’ve got something to read, there’s bugger all else to do.”

The story itself is rather fun, with at least three levels of plots and two types of magic. It centres around two freshly hired London constables, one of them discovering magical abilities and been drafted to the supernatural section of the Metropolitan Police. And making all the monologues in the book. The supernatural section is made of a single Inspector, plus a few side characters, but with enough fancy details to give it life. In particular, Isaac Newton is credited with having started the section, called The Folly. Which is also the name of Ben Aaronovitch’s webpage.

“There was a poster (…) that said: Keep Calm and Carry On’, which I thought was good advice.”

This quote is unvoluntarily funny in that it takes place in a cellar holding material from World War II. Except that the now invasive red and white poster was never distributed during the war… On the opposite it was pulped to save paper and the fact that a few copies survived is a sort of (minor) miracle. Hence a double anachronism in that it did not belong to a WWII room and that Peter Grant should have seen its modern avatars all over London.

“Have you ever been to London? Don’t worry, it’s basically  just like the country. Only with more people.”

The last part of the book is darker and feels less well-written, maybe simply because of the darker side and of the accumulation of events, while the central character gets rather too central and too much of an unexpected hero that saves the day. There is in particular a part where he seems to forget about his friend Lesley who is in deep trouble at the time and this does not seem to make much sense. But, except for this lapse (maybe due to my quick reading of the book over the week in Warwick), the flow and pace are great, with this constant undertone of satire and wit from the central character. I am definitely looking forward reading tomes 2 and 3 in the series (having already read tome 4 in Austria!, which was a mistake as there were spoilers about earlier volumes).

## Feller’s shoes and Rasmus’ socks [well, Karl's actually...]

Posted in Books, Kids, R, Statistics, University life with tags , , , , on October 24, 2014 by xi'an

Yesterday, Rasmus Bååth [of puppies' fame!] posted a very nice blog using ABC to derive the posterior distribution of the total number of socks in the laundry when only pulling out orphan socks and no pair at all in the first eleven draws. Maybe not the most pressing issue for Bayesian inference in the era of Big data but still a challenge of sorts!

Rasmus set a prior on the total number m of socks, a negative Binomial Neg(15,1/3) distribution, and another prior of the proportion of socks that come by pairs, a Beta B(15,2) distribution, then simulated pseudo-data by picking eleven socks at random, and at last applied ABC (in Rubin’s 1984 sense) by waiting for the observed event, i.e. only orphans and no pair [of socks]. Brilliant!

The overall simplicity of the problem set me wondering about an alternative solution using the likelihood. Cannot be that hard, can it?! After a few computations rejected by opposing them to experimental frequencies, I put the problem on hold until I was back home and with access to my Feller volume 1, one of the few [math] books I keep at home… As I was convinced one of the exercises in Chapter II would cover this case. After checking, I found a partial solution, namely Exercice 26:

A closet contains n pairs of shoes. If 2r shoes are chosen at random (with 2r<n), what is the probability that there will be (a) no complete pair, (b) exactly one complete pair, (c) exactly two complete pairs among them?

This is not exactly a solution, but rather a problem, however it leads to the value

$p_j=\binom{n}{j}2^{2r-2j}\binom{n-j}{2r-2j}\Big/\binom{2n}{2r}$

as the probability of obtaining j pairs among those 2r shoes. Which also works for an odd number t of shoes:

$p_j=2^{t-2j}\binom{n}{j}\binom{n-j}{t-2j}\Big/\binom{2n}{t}$

as I checked against my large simulations. So I solved Exercise 26 in Feller volume 1 (!), but not Rasmus’ problem, since there are those orphan socks on top of the pairs. If one draws 11 socks out of m socks made of f orphans and g pairs, with f+2g=m, the number k of socks from the orphan group is an hypergeometric H(11,m,f) rv and the probability to observe 11 orphan socks total (either from the orphan or from the paired groups) is thus the marginal over all possible values of k:

$\sum_{k=0}^{11} \dfrac{\binom{f}{k}\binom{2g}{11-k}}{\binom{m}{11}}\times\dfrac{2^{11-k}\binom{g}{11-k}}{\binom{2g}{11-k}}$

so it could be argued that we are facing a closed-form likelihood problem. Even though it presumably took me longer to achieve this formula than for Rasmus to run his exact ABC code!

## BibTool on the air

Posted in Books, Linux, Travel, University life with tags , , , , , , , , , , on October 23, 2014 by xi'an

Yesterday night, just before leaving for Coventry, I realised I had about 30 versions of my “mother of all .bib” bib file, spread over directories and with broken links with the original mother file… (I mean, I always create bib files in new directories by a hard link,

    ln ~/mother.bib

but they eventually and inexplicably end up with a life of their own!) So I decided a Spring clean-up was in order and installed BibTool on my Linux machine to gather all those versions into a new encompassing all-inclusive bib reference. I did not take advantage of the many possibilities of the program, written by Gerd Neugebauer, but it certainly solved my problem: once I realised I had to set the variates

check.double = on
check.double.delete = on
pass.comments = off

all I had to do was to call

bibtool -s -i ../*/*.bib -o mother.bib
bibtool -d -i mother.bib -o mother.bib
bibtool -s -i mother.bib -o mother.bib


to merge all bib file and then to get rid of the duplicated entries in mother.bib (the -d option commented out the duplicates and the second call with -s removed them). And to remove the duplicated definitions in the preamble of the file. This took me very little time in the RER train from Paris-Dauphine (where I taught this morning, having a hard time to make the students envision the empirical cdf as an average of Dirac masses!) to Roissy airport, in contrast with my pedestrian replacement of all stray siblings of the mother bib into new proper hard links, one by one. I am sure there is a bash command that could have done it in one line, but I spent instead my flight to Birmingham switching all existing bib files, one by one…

## delayed acceptance [alternative]

Posted in Books, Kids, Statistics, University life with tags , , , , , , on October 22, 2014 by xi'an

In a comment on our Accelerating Metropolis-Hastings algorithms: Delayed acceptance with prefetching paper, Philip commented that he had experimented with an alternative splitting technique retaining the right stationary measure: the idea behind his alternative acceleration is again (a) to divide the target into bits and (b) run the acceptance step by parts, towards a major reduction in computing time. The difference with our approach is to represent the  overall acceptance probability

$\min_{k=0,..,d}\left\{\prod_{j=1}^k \rho_j(\eta,\theta),1\right\}$

and, even more surprisingly than in our case, this representation remains associated with the right (posterior) target!!! Provided the ordering of the terms is random with a symmetric distribution on the permutation. This property can be directly checked via the detailed balance condition.

In a toy example, I compared the acceptance rates (acrat) for our delayed solution (letabin.R), for this alternative (letamin.R), and for a non-delayed reference (letabaz.R), when considering more and more fractured decompositions of a Bernoulli likelihood.

> system.time(source("letabin.R"))
user system elapsed
225.918 0.444 227.200
> acrat
[1] 0.3195 0.2424 0.2154 0.1917 0.1305 0.0958
> system.time(source("letamin.R"))
user system elapsed
340.677 0.512 345.389
> acrat
[1] 0.4045 0.4138 0.4194 0.4003 0.3998 0.4145
> system.time(source("letabaz.R"))
user system elapsed
49.271 0.080 49.862
> acrat
[1] 0.6078 0.6068 0.6103 0.6086 0.6040 0.6158
`

A very interesting outcome since the acceptance rate does not change with the number of terms in the decomposition for the alternative delayed acceptance method… Even though it logically takes longer than our solution. However, the drawback is that detailed balance implies picking the order at random, hence loosing on the gain in computing the cheap terms first. If reversibility could be bypassed, then this alternative would definitely get very appealing!

## control functionals for Monte Carlo integration

Posted in Books, Statistics, University life with tags , , , , , on October 21, 2014 by xi'an

This new arXival by Chris Oates, Mark Girolami, and Nicolas Chopin (warning: they all are colleagues & friends of mine!, at least until they read those comments…) is a variation on control variates, but with a surprising twist namely that the inclusion of a control variate functional may produce a sub-root-n (i.e., faster than √n) convergence rate in the resulting estimator. Surprising as I did not know one could get to sub-root-n rates..! Now I had forgotten that Anne Philippe and I used the score in an earlier paper of ours, as a control variate for Riemann sum approximations, with faster convergence rates, but this is indeed a new twist, in particular because it produces an unbiased estimator.

The control variate writes

$\psi_\phi (x) = \nabla_x \cdot \phi(x) + \phi(x)\cdot \nabla \pi(x)$

where π is the target density and φ is a free function to be optimised. (Under the constraint that πφ is integrable. Then the expectation of ψφ is indeed zero.) The “explanation” for the sub-root-n behaviour is that ψφ is chosen as an L2 regression. When looking at the sub-root-n convergence proof, the explanation is more of a Rao-Blackwellisation type, assuming a first level convergent (or presistent) approximation to the integrand [of the above form ψφ can be found. The optimal φ is the solution of a differential equation that needs estimating and the paper concentrates on approximating strategies. This connects with Antonietta Mira’s zero variance control variates, but in a non-parametric manner, adopting a Gaussian process as the prior on the unknown φ. And this is where the huge innovation in the paper resides, I think, i.e. in assuming a Gaussian process prior on the control functional and in managing to preserve unbiasedness. As in many of its implementations, modelling by Gaussian processes offers nice features, like ψφ being itself a Gaussian process. Except that it cannot be shown to lead to presistency on a theoretical basis. Even though it appears to hold in the examples of the paper. Apart from this theoretical difficulty, the potential hardship with the method seems to be in the implementation, as there are several parameters and functionals to be calibrated, hence calling for cross-validation which may often be time-consuming. The gains are humongous, so the method should be adopted whenever the added cost in implementing it is reasonable, cost which evaluation is not clearly provided by the paper. In the toy Gaussian example where everything can be computed, I am surprised at the relatively poor performance of a Riemann sum approximation to the integral, wondering at the level of quadrature involved therein. The paper also interestingly connects with O’Hagan’s (1991) Bayes-Hermite [polynomials] quadrature and quasi-Monte Carlo [obviously!].

## Shravan Vasishth at Bayes in Paris this week

Posted in Books, Statistics, University life with tags , , , , , , , , on October 20, 2014 by xi'an

Taking advantage of his visit to Paris this month, Shravan Vasishth, from University of Postdam, Germany, will give a talk at 10.30am, next Friday, October 24, at ENSAE on:

Using Bayesian Linear Mixed Models in Psycholinguistics: Some open issues

With the arrival of the probabilistic programming language Stan (and JAGS), it has become relatively easy to fit fairly complex Bayesian linear mixed models. Until now, the main tool that was available in R was lme4. I will talk about how we have fit these models in recently published work (Husain et al 2014, Hofmeister and Vasishth 2014). We are trying to develop a standard approach for fitting these models so that graduate students with minimal training in statistics can fit such models using Stan.

I will discuss some open issues that arose in the course of fitting linear mixed models. In particular, one issue is: should one assume a full variance-covariance matrix for random effects even when there is not enough data to estimate all parameters? In lme4, one often gets convergence failure or degenerate variance-covariance matrices in such cases and so one has to back off to a simpler model. But in Stan it is possible to assume vague priors on each parameter, and fit a full variance-covariance matrix for random effects. The advantage of doing this is that we faithfully express in the model how the data were generated—if there is not enough data to estimate the parameters, the posterior distribution will be dominated by the prior, and if there is enough data, we should get reasonable estimates for each parameter. Currently we fit full variance-covariance matrices, but we have been criticized for doing this. The criticism is that one should not try to fit such models when there is not enough data to estimate parameters. This position is very reasonable when using lme4; but in the Bayesian setting it does not seem to matter.