## Archive for Académie des Sciences

## Francis Bach à l’Académie des Sciences

Posted in Statistics with tags Académie des Sciences, ENS, France, Francis Bach, INRIA, PSL on April 8, 2020 by xi'an

**C**ongrats to Francis Bach, freshly nominated to the French Academy of Sciences, joining Stéphane Mallat²⁰¹⁴ and Éric Moulines²⁰¹⁷ as data science academicians!

## efficiency and the Fréchet-Darmois-Cramèr-Rao bound

Posted in Books, Kids, Statistics with tags Académie des Sciences, best unbiased estimator, Canada, Canadian Journal of Statistics, Cramer-Rao lower bound, cross validated, efficiency, Fréchet-Darmois-Cramèr-Rao bound, George Darmois, James-Stein estimator, mathematical statistics, Maurice Fréchet on February 4, 2019 by xi'an

**F**ollowing some entries on X validated, and after grading a mathematical statistics exam involving Cramèr-Rao, or Fréchet-Darmois-Cramèr-Rao to include both French contributors pictured above, I wonder as usual at the relevance of a concept of *efficiency* outside [and even inside] the restricted case of unbiased estimators. The general (frequentist) version is that the variance of an estimator δ of [any transform of] θ with bias b(θ) is bounded from below by

I(θ)⁻¹ (1+b'(θ))²

while a Bayesian version is the van Trees inequality on the integrated squared error loss

(E(I(θ))+I(π))⁻¹

where I(θ) and I(π) are the Fisher information of the model and the Fisher information of the prior, respectively. But this opens a whole can of worms, in my opinion, since

- establishing that a given estimator is efficient requires computing both the bias and the variance of that estimator, not an easy task when considering a Bayes estimator or even the James-Stein estimator. I actually do not know if any of the estimators dominating the standard Normal mean estimator has been shown to be efficient (although there exist results for closed form expressions of the James-Stein estimator quadratic risk, including one of mine the Canadian Journal of Statistics published verbatim in 1988). Or is there a result that a Bayes estimator associated with the quadratic loss is by default efficient in either the first or second sense?
- while the initial Fréchet-Darmois-Cramèr-Rao bound is restricted to unbiased estimators (i.e., b(θ)≡0) and unable to produce efficient estimators in all settings but for the natural parameter in the setting of exponential families, moving to the general case means there exists one efficiency notion for every bias function b(θ), which makes the notion quite weak, while not necessarily producing efficient estimators anyway, the major impediment to taking this notion seriously;
- moving from the variance to the squared error loss is not more “natural” than using any [other] convex combination of variance and squared bias, creating a whole new class of optimalities (a grocery of cans of worms!);
- I never got into the van Trees inequality so cannot say much, except that the comparison between various priors is delicate since the integrated risks are against different parameter measures.
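
As a small illustration of the second point above, here is a sketch under assumptions that are not in the post: an i.i.d. N(θ,1) sample of size n and the shrinkage estimator δ = c·x̄, with `n`, `theta`, and `c0` chosen arbitrarily. Its bias is b(θ) = (c−1)θ and I(θ) = n, so the general bound reads c²/n, and the simulation suggests that this biased estimator attains the bound attached to its own bias function.

```r
# Monte Carlo check of the biased Cramér-Rao bound for delta = c0 * mean(x),
# with x_1, ..., x_n iid N(theta, 1): b(theta) = (c0 - 1) * theta, I(theta) = n.
# All numerical values below are arbitrary choices made for illustration.
set.seed(1)
n <- 10
theta <- 2
c0 <- 0.8

# simulate the estimator many times and compute its empirical variance
delta <- replicate(1e5, c0 * mean(rnorm(n, mean = theta)))
emp_var <- var(delta)

# lower bound (1 + b'(theta))^2 / I(theta), which equals c0^2 / n here
bound <- (1 + (c0 - 1))^2 / n

print(c(empirical = emp_var, bound = bound))
```

Every value of c0 defines its own bias function and hence its own bound, which is one way to see why the general notion of efficiency is weak.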

## machine learning à l’Académie, au Collège, et dans Le Monde

Posted in Books, Statistics, University life with tags Académie des Sciences, École Normale Supérieure, Collège de France, data science, Guillaume Budé, neural network, Paris, Stéphane Mallat, wavelets on January 5, 2018 by xi'an

**A** back-cover story in Le Monde “Sciences & Médecine” of Stéphane Mallat, professor at École Normale and recently elected at the (French) Academy of Sciences and at the Collège de France, on a newly created Chair of Data Sciences. With works on wavelets, image compression, and neural networks, Stéphane Mallat will give his first lesson on Data Sciences at Collège de France, downtown Paris, on January 11. Entrance is free and open to everyone. (Collège de France is a unique institution, created by Guillaume Budé and supported by François Ier in 1530 to teach topics not taught (then) at the Sorbonne, as indicated by its motto *Docet Omnia,* including mathematics! Professors are nominated by the current faculty and the closest to statistics, prior to Stéphane Mallat, was Edmond Malinvaud.)

## Le Monde puzzle [#845]

Posted in Books, Kids, Statistics with tags Académie des Sciences, Escher, France Culture, James Joyce, Katrin, Le Monde, Lewis Carroll, mathematical puzzle, neurosciences, neutrino, quark on December 21, 2013 by xi'an

**Y**et another one of those Le Monde mathematical puzzles whose wording is confusing to me:

Take the set of integers between 1 and 1000. Endow all of them randomly with red or blue tags. Group them by subsets of three or more (grapes), and also group them by pairs so that a switch can change the colour of both integers. Is it always possible to activate the switches so that one ends up with all grapes being multicoloured? Unicoloured?

**I** find it (again!) ultimately puzzling since there are configurations where it cannot work. In the first case, take a grape made of four integers of the same colour, reunited two by two by a switch: activating the switch simply inverts the colours but the grape remains uni-coloured. Conversely, take two integers with opposite colours within the same grape. No matter how long one operates the switch, they will remain of opposite colours, won’t they?!

**T**his issue of the Le Monde Science&Médecine leaflet actually had several interesting entries: one on *“the thirst of the sociologist for statistical irregularities”* (meaning that regression should account for confounding factors like social class versus school performances); the above picture about weighing the mass of a neutrino (mostly because it strongly reminds me of Escher, as I cannot understand the 3D structure of the picture); another tribune by Marco Zito informing me that “quark” is a word invented by James Joyce (and not by Carroll as I believed); and an interview of Stanislas Dehaene, a neuroscientist professor at Collège de France and a (fairly young) member of the Académie des Sciences, where he mentions statistical learning patterns that reminded me of the Bayesian constructs Pierre Bessière discussed on France Culture.

## Le Monde puzzle [#838]

Posted in Books, Kids, R with tags Académie des Sciences, contingency table, Corcoran memorial medal, EADS, Le Monde, mathematical puzzle, Olivier Cappé, R, Robin Ryder, University of Oxford, voting paradox on November 2, 2013 by xi'an

**A**nother one of those Le Monde mathematical puzzles whose wording is confusing to me:

The 40 members of the Academy vote for two prizes. [Like the one recently attributed to my friend and coauthor Olivier Cappé!] Once the votes are counted for both prizes, it appears that the total votes for each of the candidates take all values between 0 and 12. Is it possible that two academicians never pick the same pair of candidates?

**I** find it puzzling… First because the total number of votes is then equal to 78, rather than 80=2 x 40. What happened to the vote of the “last” academician? Did she or he abstain? Or did two academicians abstain on candidates for only one prize each? Second, because of the uncertainty in the original wording: can we assume with certainty that each integer between 0 and 12 is only taken once? If so, it would mean that the total number of candidates to the prizes is equal to 13. Third, the question seems unrelated to the “data”: since sums only are known, switching the votes of academicians Dupond and Dupont for candidates Durand and Martin in prize A (or in prize B) does not change the number of votes for Durand and Martin.

**I**f we assume that each integer between 0 and 12 *only appears once* in the collection of the sums of the votes and that one academician abstained on both prizes, the number of candidates for one of the prizes can vary between 4 and 9, with compatible solutions provided by this R line of code:

    N=5
    ok=TRUE
    while (ok){
      prop=sample(0:12,N)
      los=(1:13)[-(prop+1)]-1
      ok=((sum(prop)!=39)||(sum(los)!=39))}

which returns solutions like

    > N=5
    > prop
    [1]  9 11  7 12
    > los
    [1]  0  1  2  3  4  5  6  8 10

but does not help in answering the question!
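
The range of 4 to 9 candidates quoted above can be double-checked by full enumeration rather than random sampling. This is only a sketch under the same assumption that each total between 0 and 12 appears exactly once and that each prize therefore collects 39 votes; the names `totals`, `has_split`, and `feasible_sizes` are mine.

```r
# For each group size N, check by full enumeration whether some N-subset of
# the totals 0..12 sums to 39 (the complement then also sums to 78 - 39 = 39).
totals <- 0:12
has_split <- sapply(1:12, function(N) any(colSums(combn(totals, N)) == 39))

# group sizes for which a compatible split of the candidates exists
feasible_sizes <- which(has_split)
print(feasible_sizes)
```

The enumeration returns the sizes 4 to 9, in agreement with the range stated above, since three totals sum to at most 33 and the complement of a group of ten would then have only three members.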

**N**ow, with Robin’s help (whose Corcoran memorial prize I should have mentioned in due time!), I reformulate the question as

The 40 members of the Academy vote for two prizes. Once the votes are counted for both prizes, it appears that all values between 0 and 12 are found among the total votes for each of the candidates. Is it possible that two academicians never pick the same pair of candidates?

which has a nicer solution: since all academicians have voted, there are two extra votes (80-78), meaning that either the total 2 appears twice or the total 1 appears thrice. So there are either 14 or 15 candidates *ex toto*, with at least 4 for a given prize. I then checked whether or not the above event could occur, using the following (pedestrian) R code:

    for (t in 1:10^3){
      # pick number of replicae
      R=sample(1:2,1)
      cand=13+R
      # pick number of scientific candidates
      N=sample(4:(cand-4),1)
      # pick votes
      if (R==2){
        votes=c(1,1,0:12)
      }else{
        votes=c(2,0:12)}
      # correct number of votes (40 per prize)
      ok=TRUE
      while (ok){
        drop=sample(1:cand,N)
        los=sort(votes[-drop])
        prop=sort(votes[drop])
        ok=((sum(prop)!=40)||(sum(los)!=40))
      }
      # individual votes for scientific candidates
      pool=NULL
      for (j in 1:N) pool=c(pool,rep(j,prop[j]))
      # individual votes for literary candidates
      cool=NULL
      for (j in 1:(cand-N)) cool=c(cool,rep(100+j,los[j]))
      cool=sample(cool) # random permutation
      # compare votes: stoq becomes 1 once two academicians share a pair
      stoq=0
      for (a in 1:39){
        same=((a+1):40)[pool[(a+1):40]==pool[a]]
        if (length(same)>0){
          stoq=max(cool[same]==cool[a])
          if (stoq==1) break
        }
      }
      # stop as soon as a configuration without a repeated pair is found
      if (stoq==0) break
    }

which does not return a positive answer to the above question. (And does not require simulations from contingency tables with fixed margins!)
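
A complementary check, which is not part of the simulation above, is to enumerate every split of the candidates between the two prizes and test a simple necessary condition: a candidate with v votes needs v different partners among the candidates of the other prize. This is a sketch under the reformulated puzzle, with the hypothetical helper name `feasible_split`. Since the condition is only necessary, a TRUE would not prove feasibility, while a FALSE rules the configuration out.

```r
# Exhaustive check of a necessary condition for all 40 pairs to be distinct.
# feasible_split() is a hypothetical helper written for this illustration.
feasible_split <- function(votes) {
  n <- length(votes)
  for (N in 4:(n - 4)) {             # at least 4 candidates per prize
    groups <- combn(n, N)            # all ways to pick the first prize's candidates
    for (k in seq_len(ncol(groups))) {
      prop <- votes[groups[, k]]     # totals for the first prize
      los <- votes[-groups[, k]]     # totals for the second prize
      # each prize must collect 40 votes, and the largest total on one side
      # cannot exceed the number of candidates on the other side
      if (sum(prop) == 40 &&
          max(prop) <= length(los) && max(los) <= length(prop))
        return(TRUE)
    }
  }
  FALSE
}

# the two vote configurations considered above: 14 or 15 candidates
res <- c(feasible_split(c(2, 0:12)), feasible_split(c(1, 1, 0:12)))
print(res)
```

Both calls return FALSE: the prize that includes the candidate with 12 votes needs at least four candidates to reach 40, leaving at most 11 candidates for the other prize, fewer than the 12 distinct partners required.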