Archive for quincunx

essentials of probability theory for statisticians

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , on April 25, 2020 by xi'an

On yet another confined sunny lazy Sunday morning, I read through Proschan and Shaw’s Essentials of Probability Theory for Statisticians, a CRC Press book that was sent to me quite a while ago for review. The book was indeed published in 2016. Before moving to serious things, let me evacuate the customary issue with the cover. I have trouble getting the point of the “face on Mars” being adopted as the cover of a book on probability theory (rather than a book on, say, pareidolia). There is a brief paragraph on post-facto probability calculations, stating how meaningless the question of the probability of this shade appearing on a Viking Orbiter picture by “chance”, but this is so marginal I would have preferred any other figure from the book!

The book plans to cover the probability essentials for dealing with graduate level statistics and in particular convergence, conditioning, and paradoxes following from using non-rigorous approaches to probability. A range that completely fits my own prerequisite for statistics students in my classes and that of course involves the recourse to (Lebesgue) measure theory. And a goal that I find both commendable and comforting as my past experience with exchange students led me to the feeling that rigorous probability theory was mostly scrapped from graduate programs. While the book is not extremely formal, it provides a proper motivation for the essential need of measure theory to handle the complexities of statistical analysis and in particular of asymptotics. It thus relies as much as possible on examples that stem from or relate to statistics, even though most examples may appear as standard to senior readers. For instance the consistency of the sample median or a weak version of the Glivenko-Cantelli theorem. The final chapter is dedicated to applications (in the probabilist’ sense!) that emerged from statistical problems. I felt these final chapters were somewhat stretched compared with what they could have been, as for instance with the multiple motivations of the conditional expectation, but this simply makes for more material. If I had to teach this material to students, I would certainly rely on the book! in particular because of the repeated appearances of the quincunx for motivating non-Normal limites. (A typo near Fatou’s lemma missed the dominating measure. And I did not notice the Riemann notation dx being extended to the measure in a formal manner.)

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

mining gold [ABC in PNAS]

Posted in Books, Statistics with tags , , , , , , , , , , , on March 13, 2020 by xi'an

Johann Brehmer and co-authors have just published a paper in PNAS entitled “Mining gold from implicit models to improve likelihood-free inference”. (Besides the pun about mining gold, the paper also involves techniques named RASCAL and SCANDAL, respectively! For Ratio And SCore Approximate Likelihood ratio and SCore-Augmented Neural Density Approximates Likelihood.) This setup is not ABC per se in that their simulator is used both to generate training data and construct a tractable surrogate model. Exploiting Geyer’s (1994) classification trick of expressing the likelihood ratio as the optimal classification ratio when facing two equal-size samples from one density and the other.

“For all these inference strategies, the augmented data is particularly powerful for enhancing the power of simulation-based inference for small changes in the parameter θ.”

Brehmer et al. argue that “the most important novel contribution that differentiates our work from the existing methods is the observation that additional information can be extracted from the simulator, and the development of loss functions that allow us to use this “augmented” data to more efficiently learn surrogates for the likelihood function.” Rather than starting from a statistical model, they also seem to use a scientific simulator made of multiple layers of latent variables z, where

x=F⁰(u⁰,z¹,θ), z¹=G¹(u¹,z²), z²=G¹(u²,z³), …

although they also call the marginal of x, p(x|θ), an (intractable) likelihood.

“The integral of the log is not the log of the integral!”

The central notion behind the improvement is a form of Rao-Blackwellisation, exploiting the simulated z‘s. Joint score functions and joint likelihood ratios are then available. Ignoring biases, the authors demonstrate that the closest approximation to the joint likelihood ratio and the joint score function that only depends on x is the actual likelihood ratio and the actual score function, respectively. Which sounds like an older EM result, except that the roles of estimate and target quantity are somehow inverted: one is approximating the marginal with the joint, while the marginal is the “best” approximation of the joint. But in the implementation of the method, an estimate of the (observed and intractable) likelihood ratio is indeed produced towards minimising an empirical loss based on two simulated samples. Learning this estimate ê(x) then allows one to use it for the actual data. It however requires fitting a new ê(x) for each pair of parameters. Providing as well an estimator of the likelihood p(x|θ). (Hence the SCANDAL!!!) A second type of approximation of the likelihood starts from the approximate value of the likelihood p(x|θ⁰) at a fixed value θ⁰ and expands it locally as an exponential family shift, with the score t(x|θ⁰) as sufficient statistic.

I find the paper definitely interesting even though it requires the representation of the (true) likelihood as a marginalisation over multiple layers of latent variables z. And does not provide an evaluation of the error involved in the process when the model is misspecified. As a minor supplementary appeal of the paper, the use of an asymmetric Galton quincunx to illustrate an intractable array of latent variables will certainly induce me to exploit it in projects and courses!

[Disclaimer: I was not involved in the PNAS editorial process at any point!]

Galton’s board all askew

Posted in Books, Kids, R with tags , , , , , , , on November 19, 2019 by xi'an

Since Galton’s quincunx has fascinated me since the (early) days when I saw a model of it as a teenager in an industry museum near Birmingham, I jumped on the challenge to build an uneven nail version where the probabilities to end up in one of the boxes were not the Binomial ones. For instance,  producing a uniform distribution with the maximum number of nails with probability ½ to turn right. And I obviously chose to try simulated annealing to figure out the probabilities, facing as usual the unpleasant task of setting the objective function, calibrating the moves and the temperature schedule. Plus, less usually, a choice of the space where the optimisation takes place, i.e., deciding on a common denominator for the (rational) probabilities. Should it be 2⁸?! Or more (since the solution with two levels also involves 1/3)? Using the functions

  for (i in 2:7){
    for (j in 2:i)


  while (sum(abs(8*evol(R.01){
    for (i in 2:7)
    if (log(runif(1))/temp<tarP-(tarR<-targ(R))){P=R;tarP=tarR}
    for (i in 2:7) R[i,1:i]=(P[i,1:i]+P[i,i:1])/2
    if (log(runif(1))/temp<tarP-(tarR<-targ(R))){P=R;tarP=tarR}
    if (runif(1)<1e-4) temp=temp+log(T)/T}

I first tried running my simulated annealing code with a target function like


where P is the 7×7 lower triangular matrix of nail probabilities, all with a 2⁸ denominator, reaching

126 35
107 81 20
104 71 22 0
126 44 26 69 14
61 123 113 92 91 38
109 60 7 19 44 74 50

for 128P. With  four entries close to 64, i.e. ½’s. Reducing the denominator to 16 produced once

12 1
13 11 3
16  7  6   2
14 13 16 15 0
15  15  2  7   7  4
    8   0    8   9   8  16  8

as 16P, with five ½’s (8). But none of the solutions had exactly a uniform probability of 1/8 to reach all endpoints. Success (with exact 1/8’s and a denominator of 4) was met with the new target


imposing precisely 1/8 on the final line. With a solution with 11 ½’s

1.0 0.0
1.0 0.0 0.0
1.0 0.5 1.0 0.5
0.5 0.5 1.0 0.0 0.0
1.0 0.0 0.5 0.0 0.5 0.0
0.5 0.5 0.5 1.0 1.0 1.0 0.5

and another one with 12 ½’s:

1.0 0.0
1.0 .375 0.0
1.0 1.0 .625 0.5
0.5  0.5  0.5  0.5  0.0
1.0  0.0  0.5  0.5  0.0  0.5
0.5  1.0  0.5  0.0  1.0  0.5  0.0

Incidentally, Michael Proschan and my good friend Jeff Rosenthal have an 2009 American Statistician paper on another modification of the quincunx they call the uncunx! Playing a wee bit further with the annealing, and using a denominator of 840 let to a 60P  with 13 ½’s out of 28

60 0
60 1 0
30 30 30 0
30 30 30 30 30
60  60  60  0  60  0
60  30  0  30  30 60 30

a quincunx on NBC

Posted in Books, Kids, pictures, Statistics with tags , , , , , , , , , , on December 3, 2017 by xi'an

Through Five-Thirty-Eight, I became aware of a TV game call The Wall [so appropriate for Trumpian times!] that is essentially based on Galton’s quincunx! A huge [15m!] high version of Galton’s quincunx, with seven possible starting positions instead of one, which kills the whole point of the apparatus which is to demonstrate by simulation the proximity of the Binomial distribution to the limiting Normal (density) curve.

But the TV game has obvious no interest in the CLT, or in the Beta binomial posterior, only in a visible sequence of binary events that turn out increasing or decreasing the money “earned” by the player, the highest sums being unsurprisingly less likely. The only decision made by the player is to pick one of the seven starting points (meaning the outcome should behave like a weighted sum of seven Normals with drifted means depending on the probabilities of choosing these starting points). I found one blog entry analysing an “idiot” strategy of playing the game, but not the entire game. (Except for this entry on the older Plinko.) And Five-Thirty-Eight surprisingly does not get into the optimal strategies to play this game (maybe because there is none!). Five-Thirty-Eight also reproduces the apocryphal quote of Laplace not requiring this [God] hypothesis.

[Note: When looking for a picture of the Quincunx, I also found this desktop version! Which “allows you to visualize the order embedded in the chaos of randomness”, nothing less. And has even obtain a patent for this “visual aid that demonstrates [sic] a random walk and generates [re-sic] a bell curve distribution”…]

The Seven Pillars of Statistical Wisdom [book review]

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , on June 10, 2017 by xi'an

I remember quite well attending the ASA Presidential address of Stephen Stigler at JSM 2014, Boston, on the seven pillars of statistical wisdom. In connection with T.E. Lawrence’s 1926 book. Itself in connection with Proverbs IX:1. Unfortunately wrongly translated as seven pillars rather than seven sages.

As pointed out in the Acknowledgements section, the book came prior to the address by several years. I found it immensely enjoyable, first for putting the field in a (historical and) coherent perspective through those seven pillars, second for exposing new facts and curios about the history of statistics, third because of a literary style one would wish to see more often in scholarly texts and of a most pleasant design (and the list of reasons could go on for quite a while, one being the several references to Jorge Luis Borges!). But the main reason is to highlight the unified nature of Statistics and the reasons why it does not constitute a subfield of either Mathematics or Computer Science. In these days where centrifugal forces threaten to split the field into seven or more disciplines, the message is welcome and urgent.

Here are Stephen’s pillars (some comments being already there in the post I wrote after the address):

  1. aggregation, which leads to gain information by throwing away information, aka the sufficiency principle. One (of several) remarkable story in this section is the attempt by Francis Galton, never lacking in imagination, to visualise the average man or woman by superimposing the pictures of several people of a given group. In 1870!
  2. information accumulating at the √n rate, aka precision of statistical estimates, aka CLT confidence [quoting  de Moivre at the core of this discovery]. Another nice story is Newton’s wardenship of the English Mint, with musing about [his] potential exploiting this concentration to cheat the Mint and remain undetected!
  3. likelihood as the right calibration of the amount of information brought by a dataset [including Bayes’ essay as an answer to Hume and Laplace’s tests] and by Fisher in possible the most impressive single-handed advance in our field;
  4. intercomparison [i.e. scaling procedures from variability within the data, sample variation], from Student’s [a.k.a., Gosset‘s] t-test, better understood and advertised by Fisher than by the author, and eventually leading to the bootstrap;
  5. regression [linked with Darwin’s evolution of species, albeit paradoxically, as Darwin claimed to have faith in nothing but the irrelevant Rule of Three, a challenging consequence of this theory being an unobserved increase in trait variability across generations] exposed by Darwin’s cousin Galton [with a detailed and exhilarating entry on the quincunx!] as conditional expectation, hence as a true Bayesian tool, the Bayesian approach being more specifically addressed in (on?) this pillar;
  6. design of experiments [re-enters Fisher, with his revolutionary vision of changing all factors in Latin square designs], with an fascinating insert on the 18th Century French Loterie,  which by 1811, i.e., during the Napoleonic wars, provided 4% of the national budget!;
  7. residuals which again relate to Darwin, Laplace, but also Yule’s first multiple regression (in 1899), Fisher’s introduction of parametric models, and Pearson’s χ² test. Plus Nightingale’s diagrams that never cease to impress me.

The conclusion of the book revisits the seven pillars to ascertain the nature and potential need for an eight pillar.  It is somewhat pessimistic, at least my reading of it was, as it cannot (and presumably does not want to) produce any direction about this new pillar and hence about the capacity of the field of statistics to handle in-coming challenges and competition. With some amount of exaggeration (!) I do hope the analogy of the seven pillars that raises in me the image of the beautiful ruins of a Greek temple atop a Sicilian hill, in the setting sun, with little known about its original purpose, remains a mere analogy and does not extend to predict the future of the field! By its very nature, this wonderful book is about foundations of Statistics and therefore much more set in the past and on past advances than on the present, but those foundations need to move, grow, and be nurtured if the field is not to become a field of ruins, a methodology of the past!