Archive for quincunx

probabilistic programming at the Collège [de France]

Posted in Statistics on June 24, 2022 by xi'an

What are the chances of that?

Posted in Books, pictures, Statistics, University life on May 13, 2022 by xi'an

What are the chances that I review a book with this title, a few months after reviewing a book called What is luck?! This one is written by Andrew Elliott, whose Is that a big number? I reviewed a wee bit earlier… And that the cover of this book involves a particularly unlucky sequence of dice, as in my much earlier review of Krzys Burdzy’s book? (About 10⁻⁶ times less likely than the likeliest draw!)

The (relative) specificity of this book is to try to convey the notions of chance and uncertainty to the general public, more by demonstrating through examples and simulations that our intuition is most often wrong, than by delving into psychological reasons as in Barbara Blatchley’s book. The author advances five dualities that underlie our (dysfunctional) relation to chance: individual vs. collective, randomness vs. meaning, foresight vs. insight, uniformity vs. variability, and disruption vs. opportunity.

“News programmes clearly understand that the testimonies of individuals draw better audiences than the summaries of statisticians.” (p. xvii)

Some of the nice features of the book are (a) the description of a probabilistic problem at the beginning of each chapter, to be solved at the end, (b) the use of simulation experiments, represented by coloured pixels over a grey band crossing the page, including a section on pseudo-random generators [which is less confusing than the quote below may indicate!], (c) taking full advantage of the quincunx apparatus, and (d) very few apologies for getting into formulas. And even a relevant quote of Taleb’s Black Swan about the ludic fallacy. On the other hand, the author spends quite a large portion of the book on games of chance, exhibiting a ludic tendency of his own! And contemplates biased coins, while he should know better! The historical sections may prove too much for both informed and uninformed readers. (However, I learned that the UK Government had used a form of lottery to pay interest on premium bonds.) And the later parts are less numerical and quantified, even though the author brings in the micromort measurement [invented by Ronald Howard and] favoured by David Spiegelhalter. Who actually appears to have inspired several other sections, like the one on coincidences (which remains quite light in its investigation!). I finished the book rather quickly, browsing through what came across as mostly anecdotes, with little feel of a unified discourse. Nor did I find the attempt to link with the COVID pandemic, which definitely resets our clocks on risk, particularly alluring…

“People go to a lot of trouble to generate truly random numbers—sequences that are impossible to predict.” (p.66)

The appearance of the Normal distribution is somewhat overdone and almost mystical, although the tone gets more reasonable by the end of the corresponding chapter.

“…combining random numbers from distributions that really have no business being added together (…) ends up with a statistic that actually fits the normal distribution quite well.” (p.83)

The part about Bayes and Bayesian reasoning does not include any inference, with a rather duh! criticism of prior modelling.

“If you are tempted to apply a group statistic derived from a broad analysis to a more narrow purpose, you run the risk of making an unfair judgement.” (p.263)

The section about Xenakis’ musical creations as a Markov process was most interesting (and novel to me). I also enjoyed the shared cultural references, especially the literary ones. Like citing the recent Chernobyl TV drama. Or Philip K. Dick’s Do Androids Dream of Electric Sheep? Or Monty Python’s Life of Brian. Overall, there is enough trivia and engagement to keep one reading the book till its end!

essentials of probability theory for statisticians

Posted in Books, Kids, pictures, Statistics, Travel, University life on April 25, 2020 by xi'an

On yet another confined sunny lazy Sunday morning, I read through Proschan and Shaw’s Essentials of Probability Theory for Statisticians, a CRC Press book that was sent to me quite a while ago for review. The book was indeed published in 2016. Before moving to serious things, let me dispose of the customary issue with the cover. I have trouble getting the point of the “face on Mars” being adopted as the cover of a book on probability theory (rather than a book on, say, pareidolia). There is a brief paragraph on post-facto probability calculations, stating how meaningless the question of the probability of this shade appearing on a Viking Orbiter picture by “chance” is, but this is so marginal that I would have preferred any other figure from the book!

The book plans to cover the probability essentials for dealing with graduate-level statistics, in particular convergence, conditioning, and the paradoxes that follow from using non-rigorous approaches to probability. A range that completely fits my own prerequisites for statistics students in my classes and that of course involves recourse to (Lebesgue) measure theory. And a goal that I find both commendable and comforting, as my past experience with exchange students left me with the feeling that rigorous probability theory has mostly been scrapped from graduate programs. While the book is not extremely formal, it provides a proper motivation for the essential need of measure theory to handle the complexities of statistical analysis, and in particular of asymptotics. It thus relies as much as possible on examples that stem from or relate to statistics, even though most examples may appear standard to senior readers. For instance, the consistency of the sample median or a weak version of the Glivenko-Cantelli theorem. The final chapters are dedicated to applications (in the probabilists’ sense!) that emerged from statistical problems. I felt these final chapters were somewhat stretched compared with what they could have been, as for instance with the multiple motivations of the conditional expectation, but this simply makes for more material. If I had to teach this material to students, I would certainly rely on the book! In particular because of the repeated appearances of the quincunx for motivating non-Normal limits. (A typo near Fatou’s lemma misses the dominating measure. And I did not notice the Riemann notation dx being extended to the measure in a formal manner.)

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

mining gold [ABC in PNAS]

Posted in Books, Statistics on March 13, 2020 by xi'an

Johann Brehmer and co-authors have just published a paper in PNAS entitled “Mining gold from implicit models to improve likelihood-free inference”. (Besides the pun about mining gold, the paper also involves techniques named RASCAL and SCANDAL, standing for Ratio And SCore Approximate Likelihood and SCore-Augmented Neural Density Approximates Likelihood, respectively!) This setup is not ABC per se in that their simulator is used both to generate training data and to construct a tractable surrogate model. It exploits Geyer’s (1994) classification trick of expressing the likelihood ratio as the optimal classification odds when facing two equal-size samples, one from each density.
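Geyer’s trick can be sketched with a toy example (a Python sketch of my own, not the authors’ code; the Gaussian densities, sample sizes, and hand-rolled Newton solver are all arbitrary choices): training a logistic classifier on equal-size samples from N(0,1) and N(1,1), the fitted log-odds a+bx recover the true log likelihood ratio, which here equals x−½.

```python
import numpy as np

# equal-size samples from p0 = N(0,1) and p1 = N(1,1)
rng = np.random.default_rng(0)
n = 20000
x0 = rng.normal(0.0, 1.0, n)
x1 = rng.normal(1.0, 1.0, n)

# logistic regression on the pooled samples, fitted by Newton-Raphson
X = np.column_stack([np.ones(2 * n), np.concatenate([x0, x1])])
y = np.concatenate([np.zeros(n), np.ones(n)])
w = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ w))          # current class-1 probabilities
    grad = X.T @ (y - p)                  # log-likelihood gradient
    hess = X.T @ (X * (p * (1 - p))[:, None])
    w += np.linalg.solve(hess, grad)

# with equal-size samples, the fitted log-odds a + b*x estimate the
# log likelihood ratio log p1(x)/p0(x), here equal to x - 1/2
a, b = w
```

The slope b should land near 1 and the intercept a near −½, matching the analytical log-ratio of the two Gaussian densities.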

“For all these inference strategies, the augmented data is particularly powerful for enhancing the power of simulation-based inference for small changes in the parameter θ.”

Brehmer et al. argue that “the most important novel contribution that differentiates our work from the existing methods is the observation that additional information can be extracted from the simulator, and the development of loss functions that allow us to use this “augmented” data to more efficiently learn surrogates for the likelihood function.” Rather than starting from a statistical model, they seem instead to use a scientific simulator made of multiple layers of latent variables z, where

x=F⁰(u⁰,z¹,θ), z¹=G¹(u¹,z²), z²=G²(u²,z³), …

although they also call the marginal of x, p(x|θ), an (intractable) likelihood.

“The integral of the log is not the log of the integral!”

The central notion behind the improvement is a form of Rao-Blackwellisation, exploiting the simulated z‘s. Joint score functions and joint likelihood ratios are then available. Ignoring biases, the authors demonstrate that the closest approximations to the joint likelihood ratio and the joint score function that only depend on x are the actual likelihood ratio and the actual score function, respectively. Which sounds like an older EM result, except that the roles of estimate and target quantity are somehow inverted: one is approximating the marginal with the joint, while the marginal is the “best” approximation of the joint. But in the implementation of the method, an estimate of the (observed and intractable) likelihood ratio is indeed produced towards minimising an empirical loss based on two simulated samples. Learning this estimate ê(x) then allows one to use it for the actual data. It however requires fitting a new ê(x) for each pair of parameter values. It also provides an estimator of the likelihood p(x|θ). (Hence the SCANDAL!!!) A second type of approximation of the likelihood starts from its approximate value p(x|θ⁰) at a fixed value θ⁰ and expands it locally as an exponential-family shift, with the score t(x|θ⁰) as sufficient statistic.
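This local exponential-family shift can be checked on a toy case of my own choosing (not from the paper): in the N(θ,1) location model, the expansion around θ⁰ is in fact exact, log p(x|θ) = log p(x|θ⁰) + t(x|θ⁰)(θ−θ⁰) − (θ−θ⁰)²/2, with score t(x|θ⁰) = x − θ⁰.

```python
import math

def logpdf(x, theta):
    # log-density of the N(theta, 1) location model
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2

theta0, theta, x = 0.0, 0.7, 1.3
score = x - theta0  # t(x|theta0): derivative of logpdf in theta, at theta0
# exponential-family shift around theta0, exact for this location model
expansion = logpdf(x, theta0) + score * (theta - theta0) \
            - 0.5 * (theta - theta0) ** 2
exact = logpdf(x, theta)
```

For a general simulator the shift is only a local approximation, with the quadratic term replaced by the model-specific log-normalising constant.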

I find the paper definitely interesting, even though it requires the representation of the (true) likelihood as a marginalisation over multiple layers of latent variables z, and does not provide an evaluation of the error involved in the process when the model is misspecified. As a minor supplementary appeal of the paper, the use of an asymmetric Galton quincunx to illustrate an intractable array of latent variables will certainly induce me to exploit it in projects and courses!

[Disclaimer: I was not involved in the PNAS editorial process at any point!]

Galton’s board all askew

Posted in Books, Kids, R on November 19, 2019 by xi'an

Since Galton’s quincunx has fascinated me since the (early) days when I saw a model of it as a teenager in an industry museum near Birmingham, I jumped on the challenge to build an uneven-nail version where the probabilities of ending up in each of the boxes were not the Binomial ones. For instance, producing a uniform distribution over the boxes with the maximum number of nails having probability ½ of turning right. And I obviously chose to try simulated annealing to figure out the probabilities, facing as usual the unpleasant tasks of setting the objective function, calibrating the moves, and choosing the temperature schedule. Plus, less usually, a choice of the space where the optimisation takes place, i.e., deciding on a common denominator for the (rational) probabilities. Should it be 2⁸?! Or more (since the solution with two levels also involves 1/3)? Using the functions

evol<-function(P){
  #propagate the bead distribution through the rows of nails,
  #P[i,j] being the probability of staying left at nail (i,j)
  Q=matrix(0,7,8)
  Q[1,1]=P[1,1];Q[1,2]=1-P[1,1]
  for (i in 2:7){
    Q[i,1]=Q[i-1,1]*P[i,1]
    for (j in 2:i)
      Q[i,j]=Q[i-1,j-1]*(1-P[i,j-1])+Q[i-1,j]*P[i,j]
    Q[i,i+1]=Q[i-1,i]*(1-P[i,i])
    Q[i,]=Q[i,]/sum(Q[i,])} #safeguard renormalisation
  return(Q)}

and

temper<-function(T=1e3,deno=2^8){
  #simulated annealing on the (rational) nail probabilities, with
  #common denominator deno (a global variable in the original run)
  bestar=tarP=targ(P<-matrix(1/2,7,7))
  temp=.01
  #reconstructed condition: loop until the last row is almost uniform
  while (sum(abs(8*evol(R<-P)[7,]-1))>.01){
    #random move on one nail per row
    for (i in 2:7)
      R[i,sample(rep(1:i,2),1)]=sample(0:deno,1)/deno
    if (log(runif(1))/temp<tarP-(tarR<-targ(R))){P=R;tarP=tarR}
    #also attempt a symmetrised version of the current solution
    for (i in 2:7) R[i,1:i]=(P[i,1:i]+P[i,i:1])/2
    if (log(runif(1))/temp<tarP-(tarR<-targ(R))){P=R;tarP=tarR}
    if (runif(1)<1e-4) temp=temp+log(T)/T}
  return(P)}
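As a sanity check on evol (here transcribed to Python with 0-based indices, since that is easier for me to test than the R original), fair nails with probability ½ everywhere must return the Binomial(7,½) distribution on the bottom row:

```python
import numpy as np
from math import comb

def evol_py(P):
    """Transcription of the R evol(): propagate the bead distribution
    through 7 rows of nails, P[i, j] being the probability of staying
    left at nail (i, j)."""
    Q = np.zeros((7, 8))
    Q[0, 0], Q[0, 1] = P[0, 0], 1 - P[0, 0]
    for i in range(1, 7):
        Q[i, 0] = Q[i-1, 0] * P[i, 0]
        for j in range(1, i + 1):
            Q[i, j] = Q[i-1, j-1] * (1 - P[i, j-1]) + Q[i-1, j] * P[i, j]
        Q[i, i+1] = Q[i-1, i] * (1 - P[i, i])
        Q[i, :] /= Q[i, :].sum()  # safeguard renormalisation
    return Q

# fair nails: the bottom row is the Binomial(7, 1/2) distribution
bottom = evol_py(np.full((7, 7), 0.5))[6]
binom7 = np.array([comb(7, k) / 2**7 for k in range(8)])
```

The annealing target below then measures how far this bottom row is from the desired uniform 1/8’s.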

I first tried running my simulated annealing code with a target function like

targ<-function(P)(1+.1*sum(!(2*P==1)))*sum(abs(8*evol(P)[7,]-1))

where P is the 7×7 lower triangular matrix of nail probabilities, all with a 2⁸ denominator, reaching

 60
126  35
107  81  20
104  71  22   0
126  44  26  69  14
 61 123 113  92  91  38
109  60   7  19  44  74  50

for 128P, with four entries close to 64, i.e., ½’s. Reducing the denominator to 16 once produced

 8
12  1
13 11  3
16  7  6  2
14 13 16 15  0
15 15  2  7  7  4
 8  0  8  9  8 16  8

as 16P, with five ½’s (8). But none of the solutions had exactly a uniform probability of 1/8 of reaching every endpoint. Success (with exact 1/8’s and a denominator of 4) was met with the new target

(1+.1*sum(!(2*P==1)))*(.01+sum(!(8*evol(P)[7,]==1)))

imposing precisely 1/8 on the final line. One solution involves 11 ½’s

0.5
1.0 0.0
1.0 0.0 0.0
1.0 0.5 1.0 0.5
0.5 0.5 1.0 0.0 0.0
1.0 0.0 0.5 0.0 0.5 0.0
0.5 0.5 0.5 1.0 1.0 1.0 0.5

and another one with 12 ½’s:

0.5
1.0 0.0
1.0 0.375 0.0
1.0 1.0 0.625 0.5
0.5 0.5 0.5 0.5 0.0
1.0 0.0 0.5 0.5 0.0 0.5
0.5 1.0 0.5 0.0 1.0 0.5 0.0

Incidentally, Michael Proschan and my good friend Jeff Rosenthal have a 2009 American Statistician paper on another modification of the quincunx, which they call the uncunx! Playing a wee bit further with the annealing, using a denominator of 840 led to a 60P with 13 ½’s out of 28:

30
60  0
60  1  0
30 30 30  0
30 30 30 30 30
60 60 60  0 60  0
60 30  0 30 30 60 30
