## a quincunx on NBC

Posted in Books, Kids, pictures, Statistics on December 3, 2017 by xi'an

Through Five-Thirty-Eight, I became aware of a TV game called The Wall [so appropriate for Trumpian times!] that is essentially based on Galton’s quincunx! A huge [15m high!] version of Galton’s quincunx, with seven possible starting positions instead of one, which defeats the whole point of the apparatus, namely to demonstrate by simulation the proximity of the Binomial distribution to the limiting Normal (density) curve.

But the TV game obviously has no interest in the CLT, or in the Beta-binomial posterior, only in a visible sequence of binary events that increase or decrease the money “earned” by the player, the highest sums being unsurprisingly the least likely. The only decision made by the player is to pick one of the seven starting points (meaning the outcome should behave like a weighted sum of seven Normals with shifted means, the weights depending on the probabilities of choosing these starting points). I found one blog entry analysing an “idiot” strategy of playing the game, but not the entire game. (Except for this entry on the older Plinko.) And Five-Thirty-Eight surprisingly does not get into the optimal strategies to play this game (maybe because there are none!). Five-Thirty-Eight also reproduces the apocryphal quote of Laplace not requiring this [God] hypothesis.
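The Binomial-to-Normal convergence the machine was built to display is easy to reproduce in silico. Here is a minimal sketch of the quincunx dynamics, assuming a symmetric left/right bounce at each peg; the number of peg rows is an arbitrary illustrative choice, not the show's actual specification:

```python
import random

def quincunx(start, rows, trials, seed=0):
    """Simulate balls dropped from a given starting slot of a quincunx.

    Each ball shifts left or right by one unit at each of `rows` pegs,
    so its final displacement is a recentred Binomial(rows, 1/2), which
    is approximately Normal(0, rows) for a large number of rows.
    Returns the list of final positions (start + displacement).
    """
    rng = random.Random(seed)
    positions = []
    for _ in range(trials):
        # sum of `rows` independent ±1 steps
        drift = sum(rng.choice((-1, 1)) for _ in range(rows))
        positions.append(start + drift)
    return positions

# One starting slot: displacement with mean 0 and variance `rows`
pos = quincunx(start=0, rows=12, trials=10_000)
mean = sum(pos) / len(pos)
var = sum((p - mean) ** 2 for p in pos) / len(pos)
print(round(mean, 2), round(var, 1))
```

Mixing the starting slot over seven values, as in The Wall, simply produces a weighted superposition of seven such shifted Binomial histograms.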

[Note: When looking for a picture of the Quincunx, I also found this desktop version! Which “allows you to visualize the order embedded in the chaos of randomness”, nothing less. And has even obtained a patent for this “visual aid that demonstrates [sic] a random walk and generates [re-sic] a bell curve distribution”…]

## The Seven Pillars of Statistical Wisdom [book review]

Posted in Books, pictures, Statistics, University life on June 10, 2017 by xi'an

I remember quite well attending the ASA Presidential address of Stephen Stigler at JSM 2014, in Boston, on the seven pillars of statistical wisdom, in connection with T.E. Lawrence’s 1926 book, itself in connection with Proverbs IX:1, unfortunately wrongly translated as seven pillars rather than seven sages.

As pointed out in the Acknowledgements section, the book came prior to the address by several years. I found it immensely enjoyable, first for putting the field in a (historical and) coherent perspective through those seven pillars, second for exposing new facts and curios about the history of statistics, third because of a literary style one would wish to see more often in scholarly texts and of a most pleasant design (and the list of reasons could go on for quite a while, one being the several references to Jorge Luis Borges!). But the main reason is that it highlights the unified nature of Statistics and the reasons why it does not constitute a subfield of either Mathematics or Computer Science. In these days when centrifugal forces threaten to split the field into seven or more disciplines, the message is welcome and urgent.

Here are Stephen’s pillars (some comments being already there in the post I wrote after the address):

1. aggregation, which leads to gaining information by throwing away information, aka the sufficiency principle. One (of several) remarkable stories in this section is the attempt by Francis Galton, never lacking in imagination, to visualise the average man or woman by superimposing the pictures of several people of a given group. In 1870!
2. information accumulating at the √n rate, aka precision of statistical estimates, aka CLT confidence [quoting de Moivre at the core of this discovery]. Another nice story is Newton’s wardenship of the English Mint, with musings about [his] potentially exploiting this concentration to cheat the Mint and remain undetected!
3. likelihood as the right calibration of the amount of information brought by a dataset [including Bayes’ essay as an answer to Hume and Laplace’s tests] and by Fisher in possibly the most impressive single-handed advance in our field;
4. intercomparison [i.e. scaling procedures from variability within the data, sample variation], from Student’s [a.k.a., Gosset‘s] t-test, better understood and advertised by Fisher than by the author, and eventually leading to the bootstrap;
5. regression [linked with Darwin’s evolution of species, albeit paradoxically, as Darwin claimed to have faith in nothing but the irrelevant Rule of Three, a challenging consequence of this theory being an unobserved increase in trait variability across generations] exposed by Darwin’s cousin Galton [with a detailed and exhilarating entry on the quincunx!] as conditional expectation, hence as a true Bayesian tool, the Bayesian approach being more specifically addressed in (on?) this pillar;
6. design of experiments [re-enters Fisher, with his revolutionary vision of changing all factors in Latin square designs], with a fascinating insert on the 18th Century French Loterie, which by 1811, i.e., during the Napoleonic wars, provided 4% of the national budget!;
7. residuals which again relate to Darwin, Laplace, but also Yule’s first multiple regression (in 1899), Fisher’s introduction of parametric models, and Pearson’s χ² test. Plus Nightingale’s diagrams that never cease to impress me.

The conclusion of the book revisits the seven pillars to ascertain the nature and potential need for an eighth pillar. It is somewhat pessimistic, at least my reading of it was, as it cannot (and presumably does not want to) produce any direction about this new pillar and hence about the capacity of the field of statistics to handle incoming challenges and competition. With some amount of exaggeration (!) I do hope that the analogy of the seven pillars, which raises in me the image of the beautiful ruins of a Greek temple atop a Sicilian hill, in the setting sun, with little known about its original purpose, remains a mere analogy and does not extend to predicting the future of the field! By its very nature, this wonderful book is about the foundations of Statistics and therefore much more set in the past and on past advances than on the present, but those foundations need to move, grow, and be nurtured if the field is not to become a field of ruins, a methodology of the past!

## Tractable Fully Bayesian inference via convex optimization and optimal transport theory

Posted in Books, Statistics, University life on October 6, 2015 by xi'an

“Recently, El Moselhy et al. proposed a method to construct a map that pushed forward the prior measure to the posterior measure, casting Bayesian inference as an optimal transport problem. Namely, the constructed map transforms a random variable distributed according to the prior into another random variable distributed according to the posterior. This approach is conceptually different from previous methods, including sampling and approximation methods.”

Yesterday, Kim et al. arXived a paper with the above title, linking transport theory with Bayesian inference. Rather strangely, they motivate transport theory with Galton’s quincunx, when the apparatus is a discrete version of the inverse cdf transform… Of course, in higher dimensions there is no longer a straightforward transform, and the paper shows (or recalls) that there exists a unique solution with positive Jacobian for log-concave posteriors, for instance when both prior and likelihood are log-concave. This solution remains however a virtual notion in practice, and an approximation is constructed via a (finite) functional polynomial basis, by minimising an empirical version of the Kullback-Leibler divergence.

I am somewhat uncertain as to how and why to apply such a transform to simulations from the prior (which thus has to be proper). Producing simulations from the posterior certainly is a traditional way to approximate Bayesian inference, and this is thus one approach to this problem. However, the discussion of the advantage of this approach over, say, MCMC, is quite limited. There is no comparison with alternative simulation or non-simulation methods, no report of the computing time for deriving the transport function, and no discussion of the impact of the dimension of the parameter space on that computing time. In connection with recent discussions on probabilistic numerics and super-optimal convergence rates, and given that it relies on simulations, I doubt optimal transport can do better than O(√n) rates. One side remark about deriving posterior credible regions from (HPD) prior credible regions: there is no reason the resulting region is optimal in volume (HPD) given that the transform is non-linear.
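In one dimension, and for Gaussian prior and posterior, the transport map pushing the prior measure onto the posterior measure is available in closed form: the inverse-cdf composition T = F⁻¹post ∘ Fprior reduces to an affine map. A toy sketch of this push-forward, with made-up prior and posterior parameters, and not the polynomial approximation of Kim et al.:

```python
import random
import statistics

# Hypothetical 1D Gaussian example: prior N(0, 1), posterior N(1, 0.5²).
# For Gaussians, the monotone transport map T = F_post⁻¹ ∘ F_prior is affine:
#   T(x) = mu_post + (sd_post / sd_prior) * (x - mu_prior)
mu_prior, sd_prior = 0.0, 1.0
mu_post, sd_post = 1.0, 0.5

def transport(x):
    return mu_post + (sd_post / sd_prior) * (x - mu_prior)

# push forward a prior sample: each prior draw becomes a posterior draw
rng = random.Random(1)
prior_draws = [rng.gauss(mu_prior, sd_prior) for _ in range(50_000)]
post_draws = [transport(x) for x in prior_draws]

print(round(statistics.mean(post_draws), 2),
      round(statistics.stdev(post_draws), 2))
```

The transformed sample has the posterior mean and standard deviation, illustrating the push-forward idea; in higher dimensions no such closed form exists, hence the approximation constructed in the paper.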

## the quincunx [book review]

Posted in Books, Kids, Statistics on July 1, 2013 by xi'an

“How then may we become free? Only by harmonising ourselves with the randomness of life through the untrammelled operation of the market.”

This is a 1989 book that I read about that time and had not re-read till last month… The Quincunx is a parody of several of Charles Dickens’ novels, written by another Charles, Charles Palliser, far into the 20th Century. The name is obviously what attracted me first to this book, since it reminded me of Francis Galton’s amazing mechanical simulation device. Of course, there is nothing in the book that relates to Galton and his quincunx!

“Your employer has been speculating in bills with the company’s capital and, as you’ll conclude in the present panic, he has lost heavily. There’s no choice now but to declare the company bankrupt. And when that happens, the creditors will put you in Marshalsea.”

As I am a big fan of Dickens, I went through The Quincunx as an exercise in Dickensiana, trying to spot characters and settings from the many books written by Dickens. I found connections with Great Expectations (for the John-Henrietta couple and the fantastic features in the thieves’ den, but also encounters with poverty and crime), Bleak House (for the judicial intricacies), Little Dorrit (for the jail system and the expectation of inheritance), Our Mutual Friend (for the roles of the Thames, of money, of forced weddings), Martin Chuzzlewit (again for complex inheritance stories), Oliver Twist (for the gangs of thieves, usury, the private “schools” and the London underworld), David Copperfield (for the somewhat idiotic mother and the fall into poverty), and The Mystery of Edwin Drood (for the murder, of course!). And I certainly missed others. (Some literary critics wrote that Palliser managed to write all Dickens at once.)

“I added to the mixture a badly bent George II guinea which was the finest of all the charms.”

However, despite the perfect imitation in style, with its array of grotesque characters and unbelievable accidents, using Dickens’ irony and tongue-in-cheek circumlocutions, with maybe an excess of deliberate misspellings, Palliser delivers a much bleaker picture of Dickens’ era than Dickens himself. This was the worst of times, if any, when a multifaceted, unbridled capitalism made use of the working class through cheap salaries, savage usury, and overpriced (!) slums, forcing women into prostitution and men into cemetery desecration and sewage exploration. There is no redemption at any point in Palliser’s world, and the reader is left with the impression that the central character John Huffam (it would be hard to call him the hero of The Quincunx) is about to fall into the same spiral of debt and legal swindles as his complete family tree. A masterpiece. (Even though I do not buy the postmodern thread.)

## appliBUGS (wet)

Posted in Statistics, University life on December 27, 2012 by xi'an

This morning I gave my talk on ABC; computation or inference? at the appliBUGS seminar. Here, in Paris, BUGS stands for Bayesian United Group of Statisticians! Presumably in connection with a strong football culture, since the talk after mine was Jean-Louis Foulley’s ranking of the Euro 2012 teams. Quite an interesting talk (even though I am not particularly interested in football and even though I dozed a little, steaming out the downpour I had received on my bike-ride there…) I am also sorry I missed the next talk by Jean-Louis on Galton’s quincunx. (Unfortunately, his slides are not [yet?] on-line.)

As a coincidence, after launching a BayesComp page on Google+ (as an aside, I am quite nonplussed by the purpose of Google-), Nicolas Chopin also just started a Bayes in Paris webpage, in connection with our informal seminar/reading group at CREST. With the appropriate picture this time, i.e. a street plaque remembering…Laplace! May I suggest the RER stop Laplace and his statue in the Paris observatory as additional illustrations for the other pages…

## Galton & simulation

Posted in Books, R, Statistics on September 28, 2010 by xi'an

Stephen Stigler has written a paper in the Journal of the Royal Statistical Society Series A on Francis Galton’s analysis of (his cousin) Charles Darwin’s Origin of Species, leading to nothing less than Bayesian analysis and accept-reject algorithms!

“On September 10th, 1885, Francis Galton ushered in a new era of Statistical Enlightenment with an address to the British Association for the Advancement of Science in Aberdeen. In the process of solving a puzzle that had lain dormant in Darwin’s Origin of Species, Galton introduced multivariate analysis and paved the way towards modern Bayesian statistics. The background to this work is recounted, including the recognition of a failed attempt by Galton in 1877 as providing the first use of a rejection sampling algorithm for the simulation of a posterior distribution, and the first appearance of a proper Bayesian analysis for the normal distribution.”

The point of interest is that Galton proposes through his (multi-stage) quincunx apparatus a way to simulate from the posterior of a normal mean (here is an R link to the original quincunx). This quincunx has a vertical screen at the second level that acts as a way to physically incorporate the likelihood (it also translates the fact that the likelihood is in another “orthogonal” space, compared with the prior!):

“Take another look at Galton’s discarded 1877 model for natural selection (Fig. 6). It is nothing less than a workable simulation algorithm for taking a normal prior (the top level) and a normal likelihood (the natural selection vertical screen) and finding a normal posterior (the lower level, including the rescaling as a probability density with the thin front compartment of uniform thickness).”

Besides a simulation machinery (steampunk Monte Carlo?!), it also offers the enormous appeal of proposing the derivation of the normal-normal posterior for the very first time:

“Galton was not thinking in explicit Bayesian terms, of course, but mathematically he has posterior $\mathcal{N}(0,C^2)\propto\mathcal{N}(0,A^2)\times f(x=0|y)$. This may be the earliest appearance of this calculation; the now standard derivation of a posterior distribution in a normal setting with a proper normal prior. Galton gave the general version of this result as part of his 1885 development, but the 1877 version can be seen as an algorithm employing rejection sampling that could be used for the generation of values from a posterior distribution. If we replace $f(x)$ above by the density $\mathcal{N}(a,B^2)$, his algorithm would generate the posterior distribution of Y given X=a, namely $\mathcal{N}(aC^2/B^2, C^2)$. The assumption of normality is of course needed for the particular formulae here, but as an algorithm the normality is not essential; posterior values for any prior and any location parameter likelihood could in principle be generated by extending this algorithm.”
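Stigler's reading of the 1877 machine can be sketched as a rejection sampler: draw from the prior at the top level, and let the vertical screen accept each draw with probability proportional to the likelihood. A minimal sketch, with arbitrary illustrative values for the prior scale A, likelihood scale B, and observation a:

```python
import math
import random

# Prior Y ~ N(0, A²); observation x = a with likelihood N(y, B²).
# Galton's vertical screen: accept a prior draw y with probability
# proportional to the likelihood exp(-(a - y)² / (2 B²)), normalised
# so the maximal acceptance probability (at y = a) equals one.
A, B, a = 1.0, 1.0, 1.0

def posterior_sample(n, seed=2):
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        y = rng.gauss(0.0, A)  # drop a ball from the prior level
        if rng.random() < math.exp(-(a - y) ** 2 / (2 * B * B)):
            out.append(y)      # the ball passes the screen
    return out

draws = posterior_sample(20_000)
# Conjugate result: posterior N(a·A²/(A²+B²), A²B²/(A²+B²)), here N(0.5, 0.5)
m = sum(draws) / len(draws)
v = sum((y - m) ** 2 for y in draws) / len(draws)
print(round(m, 2), round(v, 2))
```

The accepted draws match the conjugate normal-normal posterior, and, as the quote notes, the same accept/reject mechanism works for any prior and any location-parameter likelihood, normality only being needed for the closed-form posterior.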