This week, thanks to a lack of clear instructions (from me) to my students in the Reading Classics student seminar, four students showed up with a presentation! Since I had planned for two teaching blocks, three of them managed to fit within the three hours, while the last one kindly agreed to wait until next week to present a paper by David Cox…

The first paper discussed therein was A new look at the statistical model identification, written in 1974 by Hirotugu Akaike and presenting the AIC criterion. My student Rozan asked to give the presentation in French as he struggled with English, but it was still a challenge for him and he ended up being too close to the paper to provide a proper perspective on why AIC is written the way it is, why it is (potentially) relevant for model selection, and why it is not such a definitive answer to the model selection problem. This is not the simplest paper in the list, to be sure, but some intuition could have been built from the linear model, rather than producing the case of an ARMA(p,q) model without much explanation. (I actually wonder why the penalty for this model is (p+q)/T, rather than (p+q+1)/T for the additional variance parameter.) Or simulations could have been run on the performance of AIC versus other xIC’s…
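As a minimal sketch of what such a linear-model illustration could look like (the simulated data, the sample size, and the names `fits` and `aic_by_hand` are assumptions of this example, not taken from Akaike's paper or from the presentation), one can fit nested regressions in R and check that AIC() is minus twice the log-likelihood plus twice the number of free parameters, with lm() counting the variance as one of them:

```r
set.seed(42)                       # arbitrary seed, for reproducibility only
n <- 200
x <- matrix(rnorm(n * 4), n, 4)    # four candidate regressors
y <- 1 + 2 * x[, 1] - x[, 2] + rnorm(n)   # only the first two matter

# nested linear models using the first k regressors, k = 1,...,4
fits <- lapply(1:4, function(k) lm(y ~ x[, 1:k, drop = FALSE]))

# AIC by hand: -2 log-likelihood + 2 * (number of free parameters),
# where the count covers the k slopes, the intercept, and the variance
aic_by_hand <- sapply(fits, function(f) {
  ll <- logLik(f)
  -2 * as.numeric(ll) + 2 * attr(ll, "df")
})

comparison <- cbind(k = 1:4,
                    AIC = sapply(fits, AIC),
                    by_hand = aic_by_hand,
                    BIC = sapply(fits, BIC))
print(comparison)
```

Repeating the experiment over many simulated datasets, and tabulating which k each criterion selects, would be one way of comparing AIC with other xIC's.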

The second paper was another classic, the original GLM paper by John Nelder and his coauthor Robert Wedderburn, published in 1972 in Series A. A slightly easier paper, in that the notion of a generalised linear model is presented therein, with mathematical properties linking the (conditional) mean of the observation with the parameters and several examples that could be discussed. Plus having the book as a backup. My student Ysé did a reasonable job in presenting the concepts, but she would have benefited from the extra week to properly include the computations she ran in R around the glm() function… (The definition of the deviance was somewhat deficient, although this led to a small discussion during the class as to how the analysis of deviance was extending the then flourishing analysis of variance.) In the generic definition of the generalised linear models, I was also reminded of the generality of the nuisance parameter modelling, which made the part of interest appear as an exponential shift on the original (nuisance) density.
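To make the deviance and its analysis more concrete, here is a small R sketch around glm(), assuming a simulated Poisson regression (the data, the coefficients, and the names `fit0`, `fit1` and `dev_by_hand` are made up for the illustration and are not the computations run by the student):

```r
set.seed(1)                        # arbitrary seed
n <- 100
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + 1.2 * x))   # log link, as in glm's default

fit0 <- glm(y ~ 1, family = poisson)   # null model
fit1 <- glm(y ~ x, family = poisson)   # model with the covariate

# Poisson deviance: 2 * sum( y log(y/mu) - (y - mu) ), with 0 log 0 = 0
mu <- fitted(fit1)
dev_by_hand <- 2 * sum(ifelse(y > 0, y * log(y / mu), 0) - (y - mu))
print(c(glm = deviance(fit1), by_hand = dev_by_hand))

# analysis of deviance: the drop in deviance between nested models
# plays the role of the sum of squares in an analysis of variance
print(anova(fit0, fit1, test = "Chisq"))
```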

The third paper, presented by Bong, was yet another classic, namely the FDR paper, Controlling the false discovery rate, of Benjamini and Hochberg in Series B (which was recently promoted to the should-have-been-a-Read-Paper category by the RSS Research Committee and discussed at the Annual RSS Conference in Edinburgh four years ago, as well as published in Series B). This 2010 discussion would actually have been a good start to discuss the paper in class, but Bong was not aware of it and mentioned earlier papers extending the 1995 classic. She gave a decent presentation of the problem and of the solution of Benjamini and Hochberg, but I wonder how much of the novelty of the concept the class grasped. (I presume everyone was getting tired by then as I was the only one asking questions.) The slides somewhat made it look too much like a simulation experiment… (Unsurprisingly, the presentation did not include any Bayesian perspective on the approach, even though such perspectives are quite natural and emerged very quickly once the paper was published. I remember for instance the Valencia 7 meeting in Tenerife where Larry Wasserman discussed the Bayesian-frequentist agreement in multiple testing.)
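For readers who have not met the procedure, the Benjamini and Hochberg step-up rule fits in a few lines of R. The sketch below assumes a toy two-group normal setting (the numbers `m`, `m1`, `alpha`, and the shift of 3 are arbitrary choices for the illustration) and compares a hand-made version with the built-in p.adjust(p, method = "BH"):

```r
set.seed(7)                        # arbitrary seed
m <- 1000; m1 <- 100; alpha <- 0.05
z <- c(rnorm(m1, mean = 3), rnorm(m - m1))   # first m1 hypotheses are false nulls
p <- 2 * pnorm(-abs(z))                      # two-sided p-values

# step-up rule: find the largest k with p_(k) <= k * alpha / m,
# then reject the k hypotheses with the smallest p-values
ord <- order(p)
below <- which(p[ord] <= (1:m) * alpha / m)
k <- if (length(below) > 0) max(below) else 0
reject_by_hand <- rep(FALSE, m)
if (k > 0) reject_by_hand[ord[1:k]] <- TRUE

# same decisions via BH-adjusted p-values
reject_padjust <- p.adjust(p, method = "BH") <= alpha

# realised false discovery proportion in this single experiment
fdp <- sum(reject_by_hand[(m1 + 1):m]) / max(1, sum(reject_by_hand))
print(c(rejections = sum(reject_by_hand), fdp = fdp))
```

The FDR being an expectation, the proportion `fdp` of a single run may well exceed alpha; only its average over repeated experiments is controlled.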

Following a now well-established pattern, let me (re)warn (the few) unwary ‘Og readers that the links to Amazon.com and to Amazon.fr found on this blog are actually liable to earn me a monetary gain [from 4% to 8% on the sales] if a purchase is made by the reader in the 24 hours following the entry on Amazon through this link, thanks to the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to amazon.com/fr. Unlike the pattern of last year, and of the year before last, the most purchased item through the links happens to be related to a blog post, since it is Andrew’s book, with 318 copies of its third edition sold through the ‘Og last month! Here are some of the most exotic purchases:

As usual, the books I actually reviewed over the past months, positively or negatively, were among the top purchases… Like two dozen copies of The BUGS book. And a dozen of R for dummies. And even a few of The Cartoon Introduction to Statistics. (Despite a most critical review.) Thanks to all of you using those links (for further feeding my book addiction, with books that now eventually end up in the math common room in Dauphine or Warwick, once I have read them)!

A Le Monde mathematical puzzle of moderate difficulty:

How many binary quintuplets (a,b,c,d,e) can be found such that any pair of quintuplets differs by at least two digits?

I solved it by the following R code that iteratively eliminates quintuplets that are not different enough from the first ones, for a random order of the 2⁵ quintuplets because the order matters in the resulting number (the intToBits trick was provided by an answer on StackExchange/stackoverflow):

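```
sol=0
for (t in 1:10^5){ #random permutations
  # reconstructed line: the 32 quintuplets as columns, in random order
  votes=sapply(0:31,function(x){
    as.integer(intToBits(x))})[1:5,sample(1:32)]
  V=32;inin=rep(TRUE,V);J=1
  while (J<V){
    for (i in (J+1):V)
      # reconstructed test: drop quintuplets within one digit of the J-th
      if (sum(abs(votes[,J]-votes[,i]))<2)
        inin[i]=FALSE
    # reconstructed line: keep the survivors only
    votes=votes[,inin,drop=FALSE];V=ncol(votes);inin=rep(TRUE,V)
    J=J+1}
  if (sol<V){
    # reconstructed line: store the best set found so far
    sol=V;levote=votes}
}
```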

which returns solutions like

```
> sol
[1] 16
> levote
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]  0    0    0    0    1    1    1    1    0     1     0
[2,]  0    1    0    1    0    1    0    1    0     1     1
[3,]  0    1    1    0    1    0    1    1    1     0     0
[4,]  0    1    1    1    0    0    0    0    0     1     0
[5,]  0    0    0    0    0    0    1    0    0     0     0
[,12] [,13] [,14] [,15] [,16]
[1,]    0    1     1     0     1
[2,]    0    1     1     0     1
[3,]    1    0     0     1     1
[4,]    0    0     1     1     0
[5,]    1    0     1     0     1
```
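The value 16 can also be checked without any random search. Quintuplets sharing the same first four digits differ by a single digit, so at most one can be kept from each of the 16 such pairs, and the 16 quintuplets with an even number of ones reach this bound. A short R sketch of the check (the names `allq`, `even` and `hamming` are mine, introduced for the illustration):

```r
# all 32 binary quintuplets, one per column
allq <- sapply(0:31, function(x) as.integer(intToBits(x))[1:5])

# keep the quintuplets with an even number of ones
even <- allq[, colSums(allq) %% 2 == 0, drop = FALSE]

# matrix of pairwise Hamming distances between the columns of m
hamming <- function(m) {
  n <- ncol(m)
  d <- matrix(0L, n, n)
  for (i in 1:n) for (j in 1:n) d[i, j] <- sum(m[, i] != m[, j])
  d
}

d <- hamming(even)
min_dist <- min(d[upper.tri(d)])   # smallest distance between two distinct columns
print(c(size = ncol(even), min_dist = min_dist))
```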

In the same Science leaflet, Marco Zito had yet another tribune worth bloggin’ about (or against), under the title “Voyage au bout du bruit” (with no apologies to Céline!), where he blathers about (background) noise ["bruit"] versus signal without ever mentioning statistics. I will not repeat the earlier feat of translating the tribune, but he also includes an interesting piece of trivia: in the old TV sets of my childhood, the “snow” seen in the absence of a transmission signal is due in part to the CMB!

An easily phrased (and solved?) Le Monde mathematical puzzle that does not [really] require an R code:

The five triplets A,B,C,D,E are such that

$A_1=B_2+C_2+D_2+E_2\,,\ B_1=A_2+C_2+D_2+E_2\,,...$

and

$A_2=B_3+C_3+D_3+E_3\,,\ B_2=A_3+C_3+D_3+E_3\,,...$

Given that

$A_1=193\,,\ B_1=175\,, C_1=185\,, D_1=187\,, E_1<175$

find the five triplets.

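Adding up both sets of equations shows everything solely depends upon E1… So running an R code that checks for all possible values of E1 is a brute-force solution. However, one must first find what to check. Since each second component appears in four of the first equations, and each third component in four of the second ones, the sums S1, S2, S3 of the first, second and third components are of the form (16s,4s,s). Writing S0 for A1+B1+C1+D1, the possible choices for E1 are thus necessarily restricted to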

```
> S0=193+187+185+175
> ceiling(S0/16)
[1] 47
> floor((S0+175)/16)
[1] 57
> (47:57)*16-S0 #E1=S1-S0
[1]  12  28  44  60  76  92 108 124 140 156 172
```

The first two values correspond to a second sum S2 equal to 188 and 192, respectively, which is incompatible with A1 being 193. Furthermore, the corresponding values for E2 and E3 are then given by

```
> S2=(49:57)*4
> E1=(49:57)*16-S0
> E2=S2-E1
> S3=S2/4
> S3-E2
[1] -103  -90  -77  -64  -51  -38  -25  -12    1
```

which excludes all values but E1=172. No brute-force in the end…
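With E1=172, the five triplets follow from the relations X1=S2-X2 and X2=S3-X3, where S2 and S3 are the sums of the second and third components. A minimal R sketch of this last step (the object names `first`, `second`, `third` and `triplets` are my own):

```r
first <- c(A = 193, B = 175, C = 185, D = 187, E = 172)

S1 <- sum(first)   # sum of the first components
S2 <- S1 / 4       # each second component appears in four equations
S3 <- S2 / 4       # same argument for the third components

second <- S2 - first    # X1 = S2 - X2
third <- S3 - second    # X2 = S3 - X3

triplets <- rbind(first, second, third)
print(triplets)
```

All components come out as non-negative integers, with E3=1 as found above.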