Archive for vulgarisation

the beauty of maths in computer science [book review]

Posted in Books, Statistics, University life on January 17, 2019 by xi'an

CRC Press sent me this book for review in CHANCE. It is written by Jun Wu, “staff research scientist in Google who invented Google’s Chinese, Japanese, and Korean Web search algorithms”, and translated from the Chinese, 数学之美, which originated as Google blog entries. (Meaning most references are pre-2010.) A large part of the book is about word processing and web navigation, which is the author’s research specialty. And not so much about mathematics. (When rereading the first chapters to start this review, I realised why the part about language processing in AIQ sounded familiar: I had read it in the Beauty of Mathematics in Computer Science.)

In the first chapter, about the history of languages, I found out, among other things, that ancient Jewish copyists of the Bible had an error-correcting algorithm consisting of giving each character a numerical equivalent, summing up each row, then all rows, and checking that the sum at the end of the page was the original one. The second chapter explains why the early attempts at computer language processing, based on grammar rules, were unsuccessful and how a statistical approach broke the blockade. Explained via Markov chains in the following chapter. Along with the Good-Turing [Bayesian] estimate of the transition probabilities. Next comes a short and low-tech chapter on word segmentation. And then an introduction to hidden Markov models. Mentioning the Baum-Welch algorithm as a special case of EM, which makes a return in Chapter 26. Plus a chapter on entropies and Kullback-Leibler divergence.
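As an illustration of the copyists' row-sum check described above, here is a minimal sketch. It assumes Unicode code points (via `ord`) as a stand-in for the numerical equivalents of the characters, which is not the scheme the copyists used, and the function names `page_checksums` and `verify_copy` are hypothetical.

```python
def page_checksums(rows, value_of=ord):
    """Return (row_sums, page_total) for a page given as a list of strings.

    value_of maps a character to its numerical equivalent; ord is only
    a placeholder assumption for this sketch.
    """
    row_sums = [sum(value_of(ch) for ch in row) for row in rows]
    return row_sums, sum(row_sums)


def verify_copy(original_rows, copied_rows):
    """Compare page totals first, then locate the rows whose sums differ.

    Returns an empty list when the page totals agree. Compensating errors
    that leave the total unchanged therefore go undetected.
    """
    orig_sums, orig_total = page_checksums(original_rows)
    copy_sums, copy_total = page_checksums(copied_rows)
    if orig_total == copy_total:
        return []
    return [i for i, (a, b) in enumerate(zip(orig_sums, copy_sums)) if a != b]


if __name__ == "__main__":
    original = ["abc", "def"]
    copy = ["abc", "dff"]  # one miscopied character in the second row
    print(verify_copy(original, copy))
```

Strictly speaking, such a check only detects that an error occurred and narrows it down to a row; the correction itself still requires going back to the original.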

A first interlude is provided by a chapter dedicated to the late Frederick Jelinek, the author’s mentor (including what I find a rather unfortunate equivalence drawn between the Nazi and Communist eras in Czechoslovakia, p.64). A chapter that sounds a wee bit too much like an extended obituary.

The next section of chapters is about search engines, with a few pages on Boolean logic, dynamic programming, graph theory, Google’s PageRank and TF-IDF (term frequency/inverse document frequency). Unsurprisingly, given that the entries were originally written for Google’s blog, Google’s tools and concepts keep popping up throughout the entire book.

Another interlude is about Amit Singhal, the designer of Google’s internal search ranking system, Ascorer. With another unfortunate comparison, this time with the AK-47 Kalashnikov rifle as “elegantly simple”, “effective, reliable, uncomplicated, and easy to implement or operate” (p.105). Even though I do get the (reason for the) analogy, using an equivalent tool whose purpose is not to kill other people would have been only decent…

Then come chapters on measuring proximity between news articles by (vectors in a 64,000-dimensional vocabulary space and) their angle, and singular value decomposition, and turning URLs as long integers into 16-byte random numbers by the Mersenne Twister (why random, except for encryption?), missing both the square in von Neumann’s first PRNG (p.124) and the opportunity to link the probability of overlap with the birthday problem (p.129). Followed by another chapter on cryptography, always a favourite in maths vulgarisation books (but with no mention made of the originators of public key cryptography, like James Ellis or the RSA trio, or of the impact of quantum computers on the reliability of these methods). And by an a-mathematic chapter on spam detection.
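To make the link with the birthday problem concrete, here is a minimal sketch of the standard approximation for the probability that at least two of n items drawn uniformly from a space of size N coincide, namely 1 − exp(−n(n−1)/2N). The function name `collision_probability` and the figure of 10^12 URLs are assumptions chosen for illustration; they do not come from the book.

```python
import math


def collision_probability(n_items, space_size):
    """Birthday-problem approximation 1 - exp(-n(n-1) / (2N)).

    Assumes the n_items fingerprints are independent and uniform over
    space_size values, which is an idealisation of a real hash or PRNG.
    """
    exponent = -n_items * (n_items - 1) / (2 * space_size)
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # Classic check: 23 people and 365 birthdays give about one half.
    print(collision_probability(23, 365))
    # Hypothetical case: 10**12 URLs mapped to 16-byte (128-bit) fingerprints.
    print(collision_probability(10**12, 2**128))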

Another sequence of chapters covers maximum entropy models (in a rather incomprehensible way, I think, see p.159), continued with an interesting argument on how Shannon’s first theorem predicts that it should be faster to type Chinese characters than Roman characters. Followed by the Bloom filter, which operates as an approximate Poisson variate. Then Bayesian networks where the “probability of any node is computed by Bayes’ formula” [not really]. With a slightly more advanced discussion on providing the highest posterior probability network. And conditional random fields, where the conditioning is not clearly discussed (p.192). Next are chapters about Viterbi’s algorithm (and successful career) and the EM algorithm, nicknamed “God’s algorithm” in the book (Chapter 26) although I never heard of this nickname previously.
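For readers who have not met the Bloom filter mentioned above, here is a minimal sketch of the textbook construction, not the book's presentation. It assumes SHA-256 with a seed prefix as a convenient way to get several hash functions, and the sizes (1024 bits, 3 hashes) are arbitrary. The false-positive formula is the usual approximation (1 − exp(−kn/m))^k for n items, m bits and k hashes.

```python
import hashlib
import math


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # one byte per bit, for readability only

    def _positions(self, item):
        # Derive n_hashes positions by prefixing the item with a seed (an assumption
        # of this sketch; any family of independent hash functions would do).
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def false_positive_rate(n_items, n_bits, n_hashes):
    """Usual approximation (1 - exp(-k n / m)) ** k of the false-positive rate."""
    return (1.0 - math.exp(-n_hashes * n_items / n_bits)) ** n_hashes


if __name__ == "__main__":
    seen = BloomFilter()
    seen.add("spammer@example.com")
    print("spammer@example.com" in seen)  # an added item is always reported present
    print(false_positive_rate(100, 1024, 3))
```

An item that was added is always found, while an item that was never added may occasionally be reported as present, which is the price paid for the compact storage.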

The final two chapters are on neural networks and Big Data, clearly written later than the rest of the book, with the predictable illustration of AlphaGo (but without technical details). The twenty-page chapter on Big Data does not contain a larger amount of mathematics, with no equation apart from Chebyshev’s inequality, and a frequency estimate for a conditional probability. But I learned about 23andMe running genetic tests at a loss to build a huge (if biased) genetic database. (The bias in “Big Data” issues is actually not covered by this chapter.)

“One of my main objectives for writing the book is to introduce some mathematical knowledge related to the IT industry to people who do not work in the industry.”

To conclude, I found the book a fairly interesting insight into how a senior scientist at Google views his field and job experience, with loads of anecdotes and some historical background, but very Google-centric and with what I felt was an excessive amount of name dropping and of I did, I solved, I &tc. The title is rather misleading in my opinion, as the amount of maths is very limited and rarely sufficient to connect with the subject at hand. Although this is quite a relative concept, I did not spot beauty therein but rather technical advances and tricks, allowing the author and Google to beat the competition.

a book by C.Robert [not a book review]

Posted in Books, Kids, pictures, Statistics, University life on December 10, 2018 by xi'an

data is everywhere

Posted in Kids, pictures, Statistics, University life on November 25, 2018 by xi'an

a trip back in time [and in Rouen]

Posted in Kids, pictures, Running, Statistics, Travel, University life on June 24, 2017 by xi'an

On Monday, I took part in a celebration of the remarkable career of a former colleague of mine in Rouen, Gérard Grancher, who is retiring after a life-long position as CNRS engineer in the department of maths of the University of Rouen. This job title tells very little about the numerous facets of his interactions with mathematics, from his handling of all informatics aspects in the laboratory to his support of all colleagues there, including fresh PhD students like me in 1985!, to his direction of the CNRS lab in 2006 and 2007 at a time of deep division and mistrust, to his numerous collaborations on statistical projects with local actors, to his Norman federalism in bringing the maths departments of Caen and Rouen into a regional federation, to an unceasing activism to promote maths in colleges and high schools and science fairs all around Normandy, to his contributions to professional training in statistics for CNRS agents, and much, much more… Which explains why the science auditorium of the University of Rouen was packed with mathematicians and high school maths teachers and friends! (The poster of the day was made by Gérard’s accomplices in vulgarisation, Élise Janvresse and Thierry Delarue, based on a sample of points randomly drawn from Gérard’s picture, maybe using a determinantal process, and the construction of a travelling salesman path over those points.)

This was a great day with mostly vulgarisation talks (including one about Rasmus’ socks..!) and reminiscences about Gérard’s career at Rouen. As I had left the university in 2000 to move to Paris-Dauphine, this was a moving day as well, as I met with old friends I had not seen for ages, including our common PhD advisor, Jean-Pierre Raoult.

This trip back in time was also an opportunity to (re-)visit the beautifully preserved medieval centre of Rouen, with its wooden houses, Norman-style, the numerous churches, including Monet’s cathedral, the Justice Hall… Last time I strolled those streets, George Casella was visiting!

Théorème vivant

Posted in Books, University life on November 7, 2012 by xi'an

When I ordered this book, Théorème Vivant (Alive Theorem), by Cédric Villani, I had misgivings about it being yet another illustration of the, pardon my French!, universal “pipolisation” process that turns values upside down and sets mundane aspects of major contemporary figures above their true achievements like, say, winning a Fields medal! However, as soon as I started reading Théorème Vivant, I realised it was a fascinating delve into the way mathematicians operate and how they build theorems. Of course, as an “insider”, I can find many entry points to relate to, some quite mundane and unrelated like entering the common room of a conference centre in the middle of the night to “steal” some life-saving tea bags or an aversion to taxi rides, not to mention an addiction to French cheeses… And I have the advantage of being able to read the math formulas given in the book (even though this is not at all my area of expertise and I find the wording of the theorems and proofs rather unusual at times). But I think Théorème Vivant can be read by non-mathematicians as well, provided they take those formulas and paper extracts as pictures, just like the drawings of mathematicians interspersed throughout the book, and do not get annoyed at not understanding their meaning (I do not get the deepest levels either!). Nothing to be afraid of: Théorème Vivant is another impressive illustration of the ability of Cédric Villani to explain mathematics to the general public and to surf upon his popularity with the media. (The book is currently available in French only, but should soon be translated into English. Possibly polishing the least politically correct statements…)

simulation, a ubiquitous tool

Posted in pictures, R, Statistics, Travel, University life on July 10, 2012 by xi'an

After struggling for quite a while on that AMSI public lecture talk, and dreading its loss with the problematic MacBook, I managed to complete a first draft last night in Adelaide, downloading [at high financial cost!] a final set of images from the Web (plus a few personal ones, like a picture of my son’s Warhammer figurines!). Having very little inkling of the level and the expectations of the audience (if any!), I cannot say if this introduction to simulation is too basic. I will see after the first talk whether or not the aim was off-target… In any case, I am glad I was forced into writing this talk as I had always wanted to have a general-audience introduction to simulation at the ready and can now recycle it easily when and if needed. I will certainly use it in my R class this semester.