## Archive for obituary

## Jean-Paul Benzécri (1932-2019)

Posted in Books, pictures, Statistics, University life with tags analyse des correspondances, analyse des données, French statistics, ISUP, Jean-Paul Benzécri, Jussieu, obituary, Paris 6, SAS, Université de Paris, Université Pierre et Marie Curie on December 3, 2019 by xi'an

**I** learned last weekend that Jean-Paul Benzécri had died earlier in the week. He was a leading and charismatic figure of the French renewal in data analysis (or *analyse des données*) that used mostly algebraic tools to analyse large datasets, while staying as far as possible from the strong abstraction of French statistics at that time. While I did not know him on a personal basis, I remember from my lecturer years there that he used to come to the Institut de Statistique de l’Université de Paris (ISUP), Université Pierre et Marie Curie, once a week and meet with a large group of younger statisticians, students and junior faculty, and then talk to them for long hours while walking back and forth along the corridor in Jussieu. Showing extreme dedication from the group as this windowless corridor was particularly ghastly! (I also remember less fondly hours spent over piles and piles of SAS printout trying to make sense of multiple graphs of projections produced by these algebraic methods and feeling there were too many degrees of freedom for them to feel rigorous enough.)

## Gene Wolfe (1931-2019)

Posted in Statistics with tags Book of the New Sun, book review, Gene Wolfe, junk food, obituary, Pringle's, science fiction, The Guardian on May 19, 2019 by xi'an

**J**ust found out that the writer Gene Wolfe, author of the unique New Sun series (and many other masterpieces), had passed away two weeks ago. (The Guardian has a detailed obituary covering his life and oeuvres. Where I learned that he developed the Pringle’s machine for Procter and Gamble, something he can be pardoned for, given his other achievements!) The style of the New Sun series is indeed unique, complex, carefully designed, crafted in a very refined and beautiful language (missing the translation of the more appropriate langue), and requires commitment from the reader as the story never completely unfolds and sets all details straight, with characters rarely if ever to be taken at face value, making me feel the urge to re-read the book as I was finishing its last page. Which I never did, actually, and should consider, indeed!

## the beauty of maths in computer science [book review]

Posted in Books, Statistics, University life with tags AIQ, AlphaGo, birthday problem, book review, communist party, computer science, cryptography, Czechoslovakia, error correcting codes, Fred Jelinek, Google, hidden Markov models, James Hellis, John von Neumann, Markov chains, Mersenne twister, obituary, PageRank, Viterbi's algorithm, vulgarisation, word segmentation on January 17, 2019 by xi'an

**CRC** Press sent me this book for review in CHANCE: Written by Jun Wu, “staff research scientist in Google who invented Google’s Chinese, Japanese, and Korean Web search algorithms”, and translated from the Chinese, 数学之美, originating from Google blog entries. (Meaning most references are pre-2010.) A large part of the book is about word processing and web navigation, which is the author’s research specialty. And not so much about mathematics. (When rereading the first chapters to start this review I then realised why the part about language processing in AIQ sounded familiar: I had read it in the Beauty of Mathematics in Computer Science.)

In the first chapter, about the history of languages, I found out, among other things, that ancient Jewish copyists of the Bible had an error correcting algorithm consisting in giving each character a numerical equivalent, summing up each row, then all rows, and checking that the sum at the end of the page was the original one. The second chapter explains why the early attempts at language computer processing, based on grammar rules, were unsuccessful and how a statistical approach had broken the blockade. Explained via Markov chains in the following chapter. Along with the Good-Turing [Bayesian] estimate of the transition probabilities. Next comes a short and low-tech chapter on word segmentation. And then an introduction to hidden Markov models. Mentioning the Baum-Welch algorithm as a special case of EM, which makes a return by Chapter 26. Plus a chapter on entropies and Kullback-Leibler divergence.
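As a minimal sketch of the row-and-page checksum described above (not the copyists' actual scheme): the function names are hypothetical, and Python's `ord` stands in for the numerical equivalent of each character, which is an assumption of mine rather than something taken from the book.

```python
def row_sums(page):
    """Sum the numerical values of the characters on each row.

    `ord` is only a stand-in for the letter values used by the copyists.
    """
    return [sum(ord(ch) for ch in row) for row in page]


def page_checksum(page):
    """Return the list of row sums and their grand total for a page."""
    rows = row_sums(page)
    return rows, sum(rows)


def suspicious_rows(original, copy):
    """Return the indices of rows whose sums differ between the two pages.

    Assumes both pages have the same number of rows. An empty list means
    the copy passes the check, not that it is necessarily error-free.
    """
    orig_rows, orig_total = page_checksum(original)
    copy_rows, copy_total = page_checksum(copy)
    if orig_total == copy_total and orig_rows == copy_rows:
        return []
    return [i for i, (a, b) in enumerate(zip(orig_rows, copy_rows)) if a != b]


original = ["in the beginning", "god created"]
print(suspicious_rows(original, ["in the beginning", "gad created"]))  # substitution
print(suspicious_rows(original, ["in the beginning", "god craeted"]))  # transposition
```

Note that a plain sum flags a substituted character but not two swapped ones, since the row total is unchanged, so this is closer to error detection (with the faulty row located) than to error correction in the modern sense.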

A first interlude is provided by a chapter dedicated to the late Frederick Jelinek, the author’s mentor (including what I find a rather unfortunate equivalence drawn between the Nazi and Communist eras in Czechoslovakia, p.64). A chapter that sounds a wee bit too much like an extended obituary.

The next section of chapters is about search engines, with a few pages on Boolean logic, dynamic programming, graph theory, Google’s PageRank and TF-IDF (term frequency/inverse document frequency). Unsurprisingly, given that the entries were originally written for Google’s blog, Google’s tools and concepts keep popping up throughout the entire book.

Another interlude is about Amit Singhal, the designer of Google’s internal search ranking system, Ascorer. With another unfortunate comparison, this time with the AK-47 Kalashnikov rifle, as “elegantly simple”, “effective, reliable, uncomplicated, and easy to implement or operate” (p.105). Even though I do get the (reason for the) analogy, using an equivalent tool whose purpose is not to kill other people would have been just decent…

Then chapters on measuring proximity between news articles by (vectors in a 64,000-dimensional vocabulary space and) their angle, and singular value decomposition, and turning URLs as long integers into 16-byte random numbers by the Mersenne Twister (why random, except for encryption?), missing both the square in von Neumann’s first PRNG (p.124) and the opportunity to link the probability of overlap with the birthday problem (p.129). Followed by another chapter on cryptography, always a favourite in maths vulgarisation books (but with no mention made of the originators of public key cryptography, like James Ellis or the RSA trio, or of the impact of quantum computers on the reliability of these methods). And by an a-mathematic chapter on spam detection.
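To make the birthday problem connection concrete, here is a rough sketch of the overlap probability when n URLs are mapped to 16-byte (128-bit) identifiers, assuming the identifiers behave as independent uniform draws. The function names and the trillion-URL figure are mine, for illustration only, and not from the book.

```python
import math


def exact_collision_probability(n, d):
    """Probability that at least two of n uniform draws from d values coincide."""
    p_no_collision = 1.0
    for k in range(n):
        p_no_collision *= (d - k) / d
    return 1.0 - p_no_collision


def approx_collision_probability(n, d):
    """Birthday approximation 1 - exp(-n(n-1)/(2d)), accurate when n << d.

    expm1 keeps precision when the probability is tiny.
    """
    return -math.expm1(-n * (n - 1) / (2 * d))


# Classical birthday problem: 23 people, 365 days.
print(exact_collision_probability(23, 365))
print(approx_collision_probability(23, 365))

# Hypothetical web scale: a trillion URLs hashed to 128-bit identifiers.
print(approx_collision_probability(10**12, 2**128))
```

Under the uniformity assumption, the overlap probability for 128-bit identifiers remains negligible even for a trillion URLs, since it scales like n²/2d.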

Another sequence of chapters covers maximum entropy models (in a rather incomprehensible way, I think, see p.159), continued with an interesting argument on how Shannon’s first theorem predicts that it should be faster to type Chinese characters than Roman characters. Followed by the Bloom filter, which operates as an approximate Poisson variate. Then Bayesian networks where the “probability of any node is computed by Bayes’ formula” [not really]. With a slightly more advanced discussion on providing the highest posterior probability network. And conditional random fields, where the conditioning is not clearly discussed (p.192). Next are chapters about Viterbi’s algorithm (and successful career) and the EM algorithm, nicknamed “God’s algorithm” in the book (Chapter 26) although I never heard of this nickname previously.

The final two chapters are on neural networks and Big Data, clearly written later than the rest of the book, with the predictable illustration of AlphaGo (but without technical details). The twenty-page chapter on Big Data does not contain a larger amount of mathematics, with no equation apart from Chebyshev’s inequality, and a frequency estimate for a conditional probability. But I learned about 23andMe running genetic tests at a loss to build a huge (if biased) genetic database. (The bias in “Big Data” issues is actually not covered by this chapter.)

*“One of my main objectives for writing the book is to introduce some mathematical knowledge related to the IT industry to people who do not work in the industry.”*

To conclude, I found the book a fairly interesting insight into the vision of his field and job experience by a senior scientist at Google, with loads of anecdotes and some historical background, but very Google-centric and with what I felt was an excessive amount of name dropping and of I did, I solved, I &tc. The title is rather misleading in my opinion as the amount of maths is very limited and rarely sufficient to connect with the subject at hand. Although this is quite a relative concept, I did not spot beauty therein but rather technical advances and tricks, allowing the author and Google to beat the competition.

## Peter Lee (1940?-2017)

Posted in Books, pictures, R, Statistics, University life, Wines with tags Bayesian statisticians, Bayesian textbook, England, obituary, Peter Lee, R, York on March 12, 2017 by xi'an

**J**ust heard the sad news that Peter Lee, British Bayesian and author of Bayesian Statistics: An Introduction, passed away last night. While I did not know him well, I remember meeting him at a few conferences in the UK and spending a hilarious evening at the pub. When the book came out, I thought it was quite a fine introduction to Bayesian Statistics, with enough mathematical details and prerequisites to make it worthwhile studying, while also including computational recommendations. Fare thee well, Peter.

## Steve Fienberg’s obituary in Nature

Posted in Statistics with tags Carnegie Mellon University, Census, CMU, data privacy, National Academy of Science, Nature, obituary, polygraph, Steve Fienberg on March 10, 2017 by xi'an

“Stephen Fienberg was the ultimate public statistician.”

**R**obin Mejia from CMU published in the 23 Feb issue of Nature an obituary of Steve Fienberg that sums up beautifully Steve’s contributions to science and academia. I like the above quote very much, as indeed Steve was definitely involved in public policies, towards making those more rational and fair. I remember the time he came to Paris-Dauphine to give a seminar and talk on his assessment in a NAS committee on the polygraph (and my surprise at it being used at all in the US and even worse in judiciary issues). Similarly, I remember his involvement in making the US Census based on surveys rather than on an illusory exhaustive coverage of the entire US population. Including a paper in Nature about the importance of surveys. And his massive contributions to preserving privacy in surveys and databases, an issue in which he was a precursor (even though my colleagues at the French Census Bureau did not catch the opportunity when he spent a sabbatical in Paris in 2004). While it is such a sad circumstance that led to statistics getting a rare entry in Nature, I am glad that Steve can also be remembered that way.

## Stephen Fienberg (1942-2016)

Posted in Statistics, University life with tags obituary, Steve Fienberg on December 14, 2016 by xi'an

**I** am very very sad to have to announce that our dear friend Steve Fienberg passed away last night, after a long and admirable battle against cancer. He was a wonderful person, a brilliant statistician, a deep thinker, and a fantastic mentor to so many of us. He has strongly impacted the field of Statistics over his prolific career and continued to do so till the last day. It is just so hard to realise he is no longer with us. But his contagious laughter will continue to resonate in our memories, while his vision of Statistics will keep driving us. Au revoir, Steve, et merci.

## Wilfred Keith Hastings [1930-2016]

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags Bell Labs, Biometrika, Canada, Julian Besag, Metropolis-Hastings algorithm, obituary, Peskun ordering, University of Canterbury, University of Victoria, Victoria, Wilfred Keith Hastings on December 9, 2016 by xi'an

**A** few days ago I found on the page Jeff Rosenthal has dedicated to Hastings that he had passed away peacefully on May 13, 2016 in Victoria, British Columbia, where he lived for 45 years as a professor at the University of Victoria. After holding positions at University of Toronto, University of Canterbury (New Zealand), and Bell Labs (New Jersey). As pointed out by Jeff, Hastings’ main paper is his 1970 Biometrika description of Markov chain Monte Carlo methods, *Monte Carlo sampling methods using Markov chains and their applications*. Which would take close to twenty years to become known to the statistics world at large, although you can trace a path through Peskun (his only PhD student), Besag and others. I am sorry it took so long for the news to come to my knowledge and also sorry it apparently went unnoticed by most of the computational statistics community.