## The chocolate factory gone up in smoke

Posted in Kids, Running with tags chocolate, fire, Patrick Roger, Sceaux on September 30, 2014 by xi'an

**T**here was a major fire near my house yesterday, with many fire engines rushing by and a wet smoke smell lingering through the whole night. As I found out during my early morning run, the nearby chocolate factory had completely burned down. Actually, sixteen hours after the beginning of the fire, the building was still smouldering, with a dozen fire engines still on site and huge hoses running along the adjacent streets. A fireman told me the fire had started from an electric spark and that the entire reserves had been destroyed. This is quite sad, as it hits a local business and a great chocolate maker, Patrick Roger. I do not know whether or not the company will survive this disaster, but if you happen to come by one of the shops in Paris or Brussels, drop in and buy some chocolates! For the taste of it and as a show of support.

## The Unimaginable Mathematics of Borges’ Library of Babel [book review]

Posted in Books, Statistics, Travel, University life with tags book review, Boston, cohomology, combinatorics, infinity, information theory, Jorge Luis Borges, JSM 2014, Library of Babel, Oxford University Press, Turing's machine on September 30, 2014 by xi'an

**T**his is a book I carried away from JSM in Boston, as the Oxford University Press representative kindly provided me with a copy at the end of the meeting, after I asked for it, as I was quite excited to see a book linking Jorge Luis Borges’ great Library of Babel short story with mathematical concepts. Even though many other short stories by Borges have a mathematical flavour and are bound to fascinate mathematicians, the Library of Babel is particularly prone to mathematisation as it deals with the notions of infinity, periodicity, permutation, randomness… As it happens, William Goldbloom Bloch [a patronym that would surely have inspired Borges!], professor of mathematics at Wheaton College, Mass., published *The Unimaginable Mathematics of Borges’ Library of Babel* in 2008, so this is not a recent publication. But I had managed to miss it through the several conferences where I stopped at the OUP exhibit booth. (Interestingly, William Bloch has also published a mathematical paper on Neal Stephenson’s Cryptonomicon.)

**N**ow, what is unimaginable in the maths behind Borges’ great Library of Babel? The obvious line of entry to the mathematical aspects of the book is combinatorics: how many different books are there in total? [Ans. 10¹⁸³⁴⁰⁹⁷...] how many hexagons are needed to shelve that many books? [Ans. 10⁶⁸¹⁵³¹...] how long would it take to visit all those hexagons? how many librarians are needed for a Library containing all volumes once and only once? how many different libraries are there? [Ans. 10^{10⁶}...] Then the book embarks upon some cohomology, Cavalieri’s infinitesimals (mentioned by Borges in a footnote), Zeno’s paradox, topology (with Klein’s bottle), graph theory (and the important question as to whether or not each hexagon has one or two stairs), information theory, Turing machines. The concluding chapters are comments about other mathematical analyses of Borges’ Grand Œuvre and a discussion of how much maths Borges knew.
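The first of these counts can be checked with a few lines of arithmetic. The sketch below assumes the book format given in Borges’ story (410 pages, 40 lines per page, about 80 characters per line, 25 orthographic symbols), which is not restated in this post; the variable names are mine. It works on the log scale since the number itself has close to two million digits.

```python
import math

# Book format as described in Borges' story (an assumption here,
# since the review above does not restate it).
SYMBOLS = 25
PAGES, LINES_PER_PAGE, CHARS_PER_LINE = 410, 40, 80

chars_per_book = PAGES * LINES_PER_PAGE * CHARS_PER_LINE

# Number of distinct books = SYMBOLS ** chars_per_book; take log10 of it.
log10_books = chars_per_book * math.log10(SYMBOLS)
exponent = math.floor(log10_books)
mantissa = 10 ** (log10_books - exponent)

print(f"characters per book: {chars_per_book}")
print(f"number of books is about {mantissa:.2f} x 10^{exponent}")
```

Under these assumptions the exponent agrees with the 10¹⁸³⁴⁰⁹⁷ quoted above.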

**S**o a nice escapade through some mathematical landscapes with more or less connection with the original masterpiece. I am not convinced it brings any further dimension or insight into it, or even that one should try to dissect it that way, because it kills the poetry in the story, especially the play around the notion(s) of infinity. The fact that the short story is incomplete [and short on details] makes its beauty: if one starts wondering at the possibility of the Library or at the daily life of the librarians [like, what do they eat? why are they there? where are the readers? what happens when they die? &tc.] the intrusion of realism breaks the enchantment! Nonetheless, *The Unimaginable Mathematics of Borges’ Library of Babel* provides a pleasant entry into some mathematical concepts and as such may initiate a layperson not too shy of maths formulas into the beauty of mathematics.

## future of computational statistics

Posted in Books, pictures, R, Statistics, University life with tags ABC, Apple II, approximation, BUGS, computational statistics, expectation-propagation, JAGS, MCMC, MCMSki IV, Monte Carlo, optimisation, STAN, statistical computing, sunset, variational Bayes methods on September 29, 2014 by xi'an

I am currently preparing a survey paper on the present state of computational statistics, reflecting on the massive evolution of the field since my early Monte Carlo simulations on an Apple //e, which would take a few days to return a curve of approximate expected squared error losses… It seems to me that MCMC is attracting more attention nowadays than in the past decade, both because of methodological advances linked with better theoretical tools, as for instance in the handling of stochastic processes, and because of new forays in accelerated computing via parallel and cloud computing. The breadth and quality of talks at MCMSki IV is testimony to this. A second trend that is not unrelated to the first one is the development of new techniques and the rehabilitation of older ones to handle complex models by approximations, witness ABC, Expectation-Propagation, variational Bayes, &tc. With a corollary being a healthy questioning of the models themselves, as illustrated for instance in Chris Holmes’ talk last week. While those simplifications are inevitable when faced with hardly imaginable levels of complexity, I still remain confident about the “inevitability” of turning statistics into an “optimize+penalize” tunnel vision… A third characteristic is the emergence of new languages and meta-languages intended to handle complexity both of problems and of solutions towards a wider audience of users. STAN obviously comes to mind. And JAGS. But it may be that another scale of language is now required…

If you have any suggestions of novel directions in computational statistics or, conversely, of dead ends, I would be most interested in hearing them! So please do comment or send emails to my gmail address bayesianstatistics…

## redshirts

Posted in Books, pictures, Travel with tags Birmingham, England, Hugo Awards, John Scalzi, Patrick Rothfuss, redshirts, Star Trek on September 28, 2014 by xi'an

“For the first nine years of its existence, aside from being appointed the flagship, there was nothing particularly special about it, from a statistical point of view.”

**A** book I grabbed at the last minute in a bookstore, downtown Birmingham. Maybe I should have waited this extra minute… Or picked the other Scalzi on the shelf, *Lock In*, that just came out! (I already ordered that one for my upcoming lecture in Gainesville. Along with the *not* final volume of Patrick Rothfuss’ masterpiece, The Slow Regard of Silent Things, which will just be out by then! It is only a side story within the same universe, as pointed out by Dan…)

“What you’re trying to do is impose causality on random events, just like everyone else here has been doing.”

**W**hat amazes me most is that Scalzi’s *redshirts* got the 2013 Hugo Award. I mean, The Hugo Award?! While I definitely liked the Old Man’s War saga, this novel is more like a light writing experiment and a byproduct of writing a TV series. Enjoyable at a higher conceptual level, but not as a story. Although this is somewhat of a spoiler (!), the title refers to the characters wearing red shirts in Star Trek, who have a statistically significant tendency to die on the next mission. [Not that I knew this when I bought the book! Maybe it would have warned me against the book.] And *redshirts* is about those characters reflecting on how unlikely their fate is (or rather the fate of the characters before them) and rebelling against the series writer. Games with the paradoxes of space travel and doubles ensue. Then games within games. The book is well-written and, once again, enjoyable at some level, with alternative writing styles used in different parts (or codas) of the novel. It still remains a purely intellectual perspective, with no psychological involvement towards those characters. I just cannot relate to the story. Maybe because of the pastiche aspect or of the mostly comic turn. *redshirts* certainly feels very different from those Philip K. Dick stories (e.g., Ubik) where virtual realities abounded without a definitive conclusion on which was which.

## métro static

Posted in pictures, Running with tags Badwater Ultramarathon, Florida, Marathon FL, métro, métro static, Paris, sea, sunset on September 27, 2014 by xi'an

## all models are wrong

Posted in Statistics, University life with tags ABC, Bayes factor, Bayesian model choice, George Box, model posterior probabilities, Molecular Ecology, phylogenetic model, phylogeography on September 27, 2014 by xi'an

“Using ABC to evaluate competing models has various hazards and comes with recommended precautions (Robert et al. 2011), and unsurprisingly, many if not most researchers have a healthy scepticism as these tools continue to mature.”

**M**ichael Hickerson just published an open-access letter with the above title in Molecular Ecology. (As in several earlier papers, incl. the (in)famous ones by Templeton, Hickerson confuses running an ABC algorithm with conducting Bayesian model comparison, but this is not the main point of this post.)

“Rather than using ABC with weighted model averaging to obtain the three corresponding posterior model probabilities while allowing for the handful of model parameters (θ, τ, γ, Μ) to be estimated under each model conditioned on each model’s posterior probability, these three models are sliced up into 143 ‘submodels’ according to various parameter ranges.”

**T**he letter is in fact a supporting argument for the earlier paper of Pelletier and Carstens (2014, Molecular Ecology), which conducted the above splitting experiment. I could not read this paper so cannot judge the relevance of splitting the parameter range this way. From what I understand, it amounts to using mutually exclusive priors by using different supports.

“Specifically, they demonstrate that as greater numbers of the 143 sub-models are evaluated, the inference from their ABC model choice procedure becomes increasingly.”

**A**n interestingly cut sentence. Increasingly unreliable? mediocre? weak?

“…with greater numbers of models being compared, the most probable models are assigned diminishing levels of posterior probability. This is an expected result…”

**T**rue, if the number of models under consideration increases, under a uniform prior over model indices, the posterior probability of a given model mechanically decreases. But the pairwise Bayes factors should not be impacted by the number of models under comparison and the letter by Hickerson states that Pelletier and Carstens found the opposite:

“…pairwise Bayes factor[s] will always be more conservative except in cases when the posterior probabilities are equal for all models that are less probable than the most probable model.”

**W**hich means that the “Bayes factor” in this study is computed as the ratio of a marginal likelihood and of a compound (or super-marginal) likelihood, averaged over all models and hence incorporating the prior probabilities of the model indices as well. I had never encountered such a proposal before. Contrary to the letter’s claim:

“…using the Bayes factor, incorporating all models is perhaps more consistent with the Bayesian approach of incorporating all uncertainty associated with the ABC model choice procedure.”

**B**esides the needless inclusion of ABC in this sentence, a somewhat confusing sentence, as Bayes factors are not, *stricto sensu*, Bayesian procedures since they remove the prior probabilities from the picture.
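The earlier remark that posterior model probabilities shrink mechanically with the number of models, while pairwise Bayes factors do not move, can be checked on a toy case. This is only a sketch: the marginal likelihood values below are made up for illustration and have nothing to do with the models in the letter.

```python
def posterior_probs(marginals):
    """Posterior model probabilities under a uniform prior on model indices."""
    total = sum(marginals)
    return [m / total for m in marginals]


def bayes_factor(marginals, i, j):
    """Pairwise Bayes factor of model i against model j."""
    return marginals[i] / marginals[j]


# Hypothetical marginal likelihoods for a small collection of models...
small = [4.0, 2.0, 1.0, 1.0]
# ...and the same collection enlarged with eight more (hypothetical) models.
large = small + [1.0] * 8

print(posterior_probs(small)[0], posterior_probs(large)[0])   # 0.5 0.25
print(bayes_factor(small, 0, 1), bayes_factor(large, 0, 1))   # 2.0 2.0
```

The posterior probability of the first model is halved when the collection grows, whereas its Bayes factor against the second model stays the same, since it only involves the two marginal likelihoods.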

“Although the outcome of model comparison with ABC or other similar likelihood-based methods will always be dependent on the composition of the model set, and parameter estimates will only be as good as the models that are used, model-based inference provides a number of benefits.”

**A**ll models are wrong but the very fact that they are models allows for producing pseudo-data from those models and for checking if the pseudo-data is similar enough to the observed data. In the components that matter the most for the experimenter. Hence a loss function of sorts…
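As a rough illustration of this pseudo-data principle, here is a minimal ABC rejection sketch for choosing between two toy models. Everything in it is an assumption made for the example: the two models (normal with unit variance versus exponential), the uniform prior on their mean, the summary statistics (sample mean and variance), the made-up data, and the rule of keeping the closest simulations. It is not the procedure used in the papers discussed above, and the resulting frequencies depend on the choice of summaries, which is one of the hazards mentioned in the opening quote.

```python
import math
import random
from collections import Counter


def summary(data):
    """Summary statistics: sample mean and unbiased sample variance."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return mean, var


def simulate(model, n, rng):
    """Draw a mean from a Uniform(0.5, 5) prior, then n pseudo-observations."""
    m = rng.uniform(0.5, 5.0)
    if model == "normal":
        return [rng.gauss(m, 1.0) for _ in range(n)]
    return [rng.expovariate(1.0 / m) for _ in range(n)]  # exponential, mean m


def abc_model_choice(observed, n_sims=4000, keep=50, seed=0):
    """Frequency of each model among the `keep` simulations closest to the data."""
    rng = random.Random(seed)
    target = summary(observed)
    draws = []
    for _ in range(n_sims):
        model = rng.choice(["normal", "exponential"])  # uniform prior on models
        stats = summary(simulate(model, len(observed), rng))
        draws.append((math.dist(stats, target), model))
    draws.sort(key=lambda pair: pair[0])
    counts = Counter(model for _, model in draws[:keep])
    return {m: counts[m] / keep for m in ("normal", "exponential")}


# Made-up observations, skewed on purpose.
observed = [0.2, 0.5, 0.9, 1.4, 2.0, 2.7, 3.5, 4.6, 6.1, 9.0]
print(abc_model_choice(observed))
```

The distance between summaries plays the role of the loss function: changing the summaries changes which features of the data the comparison cares about.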