The Seven Pillars of Statistical Wisdom [book review]

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , on June 10, 2017 by xi'an

I remember quite well attending the ASA Presidential address of Stephen Stigler at JSM 2014, Boston, on the seven pillars of statistical wisdom. In connection with T.E. Lawrence’s 1926 book. Itself in connection with Proverbs IX:1. Unfortunately wrongly translated as seven pillars rather than seven sages.

As pointed out in the Acknowledgements section, the book came prior to the address by several years. I found it immensely enjoyable, first for putting the field in a (historical and) coherent perspective through those seven pillars, second for exposing new facts and curios about the history of statistics, third because of a literary style one would wish to see more often in scholarly texts and of a most pleasant design (and the list of reasons could go on for quite a while, one being the several references to Jorge Luis Borges!). But the main reason is to highlight the unified nature of Statistics and the reasons why it does not constitute a subfield of either Mathematics or Computer Science. In these days where centrifugal forces threaten to split the field into seven or more disciplines, the message is welcome and urgent.

Here are Stephen’s pillars (some comments being already there in the post I wrote after the address):

1. aggregation, which leads to gain information by throwing away information, aka the sufficiency principle. One (of several) remarkable story in this section is the attempt by Francis Galton, never lacking in imagination, to visualise the average man or woman by superimposing the pictures of several people of a given group. In 1870!
2. information accumulating at the √n rate, aka precision of statistical estimates, aka CLT confidence [quoting  de Moivre at the core of this discovery]. Another nice story is Newton’s wardenship of the English Mint, with musing about [his] potential exploiting this concentration to cheat the Mint and remain undetected!
3. likelihood as the right calibration of the amount of information brought by a dataset [including Bayes’ essay as an answer to Hume and Laplace’s tests] and by Fisher in possible the most impressive single-handed advance in our field;
4. intercomparison [i.e. scaling procedures from variability within the data, sample variation], from Student’s [a.k.a., Gosset‘s] t-test, better understood and advertised by Fisher than by the author, and eventually leading to the bootstrap;
5. regression [linked with Darwin’s evolution of species, albeit paradoxically, as Darwin claimed to have faith in nothing but the irrelevant Rule of Three, a challenging consequence of this theory being an unobserved increase in trait variability across generations] exposed by Darwin’s cousin Galton [with a detailed and exhilarating entry on the quincunx!] as conditional expectation, hence as a true Bayesian tool, the Bayesian approach being more specifically addressed in (on?) this pillar;
6. design of experiments [re-enters Fisher, with his revolutionary vision of changing all factors in Latin square designs], with an fascinating insert on the 18th Century French Loterie,  which by 1811, i.e., during the Napoleonic wars, provided 4% of the national budget!;
7. residuals which again relate to Darwin, Laplace, but also Yule’s first multiple regression (in 1899), Fisher’s introduction of parametric models, and Pearson’s χ² test. Plus Nightingale’s diagrams that never cease to impress me.

The conclusion of the book revisits the seven pillars to ascertain the nature and potential need for an eight pillar.  It is somewhat pessimistic, at least my reading of it was, as it cannot (and presumably does not want to) produce any direction about this new pillar and hence about the capacity of the field of statistics to handle in-coming challenges and competition. With some amount of exaggeration (!) I do hope the analogy of the seven pillars that raises in me the image of the beautiful ruins of a Greek temple atop a Sicilian hill, in the setting sun, with little known about its original purpose, remains a mere analogy and does not extend to predict the future of the field! By its very nature, this wonderful book is about foundations of Statistics and therefore much more set in the past and on past advances than on the present, but those foundations need to move, grow, and be nurtured if the field is not to become a field of ruins, a methodology of the past!

Cauchy Distribution: Evil or Angel?

Posted in Books, pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , on May 19, 2015 by xi'an

Natesh Pillai and Xiao-Li Meng just arXived a short paper that solves the Cauchy conjecture of Drton and Xiao [I mentioned last year at JSM], namely that, when considering two normal vectors with generic variance matrix S, a weighted average of the ratios X/Y remains Cauchy(0,1), just as in the iid S=I case. Even when the weights are random. The fascinating side of this now resolved (!) conjecture is that the correlation between the terms does not seem to matter. Pushing the correlation to one [assuming it is meaningful, which is a suspension of belief!, since there is no standard correlation for Cauchy variates] leads to a paradox: all terms are equal and yet… it works: we recover a single term, which again is Cauchy(0,1). All that remains thus to prove is that it stays Cauchy(0,1) between those two extremes, a weird kind of intermediary values theorem!

Actually, Natesh and XL further prove an inverse χ² theorem: the inverse of the normal vector, renormalised into a quadratic form is an inverse χ² no matter what its covariance matrix. The proof of this amazing theorem relies on a spherical representation of the bivariate Gaussian (also underlying the Box-Müller algorithm). The angles are then jointly distributed as

$\exp\{-\sum_{i,j}\alpha_{ij}\cos(\theta_i-\theta_j)\}$

and from there follows the argument that conditional on the differences between the θ’s, all ratios are Cauchy distributed. Hence the conclusion!

A question that stems from reading this version of the paper is whether this property extends to other formats of non-independent Cauchy variates. Somewhat connected to my recent post about generating correlated variates from arbitrary distributions: using the inverse cdf transform of a Gaussian copula shows this is possibly the case: the following code is meaningless in that the empirical correlation has no connection with a “true” correlation, but nonetheless the experiment seems of interest…

> ro=.999999;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2)
> cor(x[,1]/x[,2],y[,1]/y[,2])
[1] -0.1351967
> ro=.99999999;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2)
> cor(x[,1]/x[,2],y[,1]/y[,2])
[1] 0.8622714
> ro=1-1e-5;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2)
> z=qcauchy(pnorm(as.vector(x)));w=qcauchy(pnorm(as.vector(y)))
> cor(x=z,y=w)
[1] 0.9999732
> ks.test((z+w)/2,"pcauchy")

One-sample Kolmogorov-Smirnov test

data:  (z + w)/2
D = 0.0068, p-value = 0.3203
alternative hypothesis: two-sided
> ro=1-1e-3;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2)
> z=qcauchy(pnorm(as.vector(x)));w=qcauchy(pnorm(as.vector(y)))
> cor(x=z,y=w)
[1] 0.9920858
> ks.test((z+w)/2,"pcauchy")

One-sample Kolmogorov-Smirnov test

data:  (z + w)/2
D = 0.0036, p-value = 0.9574
alternative hypothesis: two-sided

The Unimaginable Mathematics of Borges’ Library of Babel [book review]

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , on September 30, 2014 by xi'an

This is a book I carried away from JSM in Boston as the Oxford University Press representative kindly provided my with a copy at the end of the meeting. After I asked for it, as I was quite excited to see a book linking Jorge Luis Borges’ great Library of Babel short story with mathematical concepts. Even though many other short stories by Borges have a mathematical flavour and are bound to fascinate mathematicians, the Library of Babel is particularly prone to mathemati-sation as it deals with the notions of infinite, periodicity, permutation, randomness… As it happens, William Goldbloom Bloch [a patronym that would surely have inspired Borges!], professor of mathematics at Wheaton College, Mass., published the unimaginable mathematics of Borges’ Library of Babel in 2008, so this is not a recent publication. But I had managed to miss through the several conferences where I stopped at OUP exhibit booth. (Interestingly William Bloch has also published a mathematical paper on Neil Stephenson’s Cryptonomicon.)

Now, what is unimaginable in the maths behind Borges’ great Library of Babel??? The obvious line of entry to the mathematical aspects of the book is combinatorics: how many different books are there in total? [Ans. 10¹⁸³⁴⁰⁹⁷…] how many hexagons are needed to shelf that many books? [Ans. 10⁶⁸¹⁵³¹…] how long would it take to visit all those hexagons? how many librarians are needed for a Library containing all volumes once and only once? how many different libraries are there [Ans. 1010⁶…] Then the book embarks upon some cohomology, Cavalieri’s infinitesimals (mentioned by Borges in a footnote), Zeno’s paradox, topology (with Klein’s bottle), graph theory (and the important question as to whether or not each hexagon has one or two stairs), information theory, Turing’s machine. The concluding chapters are comments about other mathematical analysis of Borges’ Grand Œuvre and a discussion on how much maths Borges knew.

So a nice escapade through some mathematical landscapes with more or less connection with the original masterpiece. I am not convinced it brings any further dimension or insight about it, or even that one should try to dissect it that way, because it kills the poetry in the story, especially the play around the notion(s) of infinite. The fact that the short story is incomplete [and short on details] makes its beauty: if one starts wondering at the possibility of the Library or at the daily life of the librarians [like, what do they eat? why are they there? where are the readers? what happens when they die? &tc.] the intrusion of realism closes the enchantment! Nonetheless, the unimaginable mathematics of Borges’ Library of Babel provides a pleasant entry into some mathematical concepts and as such may initiate a layperson not too shy of maths formulas to the beauty of mathematics.

Boston skyline (#3)

Posted in pictures, Running, Travel, University life with tags , , , , on August 18, 2014 by xi'an

STEM forums

Posted in Books, R, Statistics, University life with tags , , , , , on August 15, 2014 by xi'an

“I can calculate the movement of stars, but not the madness of men.” Isaac Newton

When visiting the exhibition hall at JSM 2014, I spoke with people from STEM forums on the Springer booth. The concept of STEM (why STEM? Nothing to do with STAN! Nor directly with Biology. It stands as the accronym for Science, Technology, Engineering, and Mathematics.) is to create a sort of peer-reviewed Cross Validated where questions would be filtered (in order to avoid the most basic questions like “How can I learn about Bayesian statistics without opening a book?” or “What is the Binomial distribution?” that often clutter the Stack Exchange boards). That’s an interesting approach which I will monitor in the future, as on the one hand, it would be nice to have a Statistics forum without “lazy undergraduate” questions as one of my interlocutors put, and on the other hand, to see how STEM forums can compete with the well-established Cross Validated and its core of dedicated moderators and editors. I left the booth with a neat tee-shirt exhibiting the above quote as well as alpha-tester on the back: STEM forums is indeed calling for entries into the Statistics section, with rewards of ebooks for the first 250 entries and a sweepstakes offering a free trip to Seattle next year!