## another book on J.B.S. Haldane [review of a book review]

Posted in Books, pictures, Statistics, University life on August 24, 2020 by xi'an

As I noticed a NYT book review of a most recent book on J.B.S. Haldane, I realised several other books had already been written about him. From an early 1985 biography, “Haldane: the life and work of J.B.S. Haldane with special reference to India”, followed by a 2016 biography, “Popularizing Science”, along with a 2009 edited book of some of Haldane’s essays, “What I require from life”, all by Krishna R. Dronamraju, to a 1969 biography with the cryptic title “J.B.S.”, by Ronald Clark, along with a sensational 2018 “Comrade Haldane Is Too Busy to Go on Holiday: The Genius Who Spied for Stalin” by Gavan Tredoux, depicting him as a spy for the Soviet Union during WW II. (The last author is working on a biography of Francis Galton, hopefully exonerating him of spying for the French! But a short text of his comparing Haldane and Darlington appears to support the latter’s belief in racial differences in intelligence…) I also discovered that J.B.S. had written a children’s book, “My Friend Mr. Leakey”, illustrated by Quentin Blake, Roald Dahl’s illustrator. (Charlotte Franken Haldane, J.B.S.’s first wife, also wrote a considerable number of books.)

The NYT review is more a summary of Haldane’s life than an analysis of the book itself, hard as it is not to get mesmerised by the larger-than-life stature of J.B.S. It does not dwell very long on the time it took Haldane to break from the Communist Party for its adherence to the pseudo-science of Lysenko (while his wife Charlotte had realised the repressive nature of the Soviet regime much earlier, which may have led to their divorce). While the review makes no mention at all of Haldane’s ideological move to the ISI in Kolkata, it concludes with “for all his failings, he was “deeply attractive during a time of shifting, murky moralities.”” [The double quotes being the review quoting the book!]

## a conversation about eugenism at JSM

Posted in Books, Kids, pictures, Statistics, University life on July 29, 2020 by xi'an

Following the recent debate on Fisher’s involvement in eugenics (and the renaming of the R.A. Fisher Award and Lectureship into the COPSS Distinguished Achievement Award and Lectureship), the ASA is running a JSM round table on Eugenics and its connections with statistics, to which I have been invited, along with Scarlett Bellamy, David Bellhouse, and David Cutler. The discussion is planned on 06 August at 3pm (ET, i.e., 7pm GMT) and here is the abstract:

The development of eugenics and modern statistical theory are inextricably entwined in history.  Their evolution was guided by the culture and societal values of scholars (and the ruling class) of their time through and including today.  Motivated by current-day societal reckonings of systemic injustice and inequity, this roundtable panel explores the role of prominent statisticians and of statistics more broadly in the development of eugenics at its inception and over the past century.  Leveraging a diverse panel, the discussions seek to shed light on how eugenics and statistics – despite their entangled past — have now severed, continue to have presence in ways that affect our lives and aspirations.

It is actually rather unclear to me why I was invited to the table, apart from my amateur interest in the history of statistics. On a highly personal level, I remember being introduced to Galton’s racial theories during my first course on probability, in 1982, by Prof Ogier, who always used historical anecdotes to enliven his lectures, like Galton trying to take women’s measurements during his South Africa expedition. Lectures that took place in the INSEE building, boulevard Adolphe Pinard in Paris, with said Adolphe Pinard being a founding member of the French Eugenics Society in 1913.

## Probability and Bayesian modeling [book review]

Posted in Books, Kids, R, Statistics, University life on March 26, 2020 by xi'an

Probability and Bayesian modeling is a textbook by Jim Albert [whose reply is included at the end of this entry] and Jingchen Hu that CRC Press sent me for review in CHANCE. (The book is also freely available in bookdown format.) The level of the textbook is definitely most introductory as it dedicates its first half to probability concepts (with no measure theory involved), meaning mostly focusing on counting and finite sample space models. The second half moves to Bayesian inference(s) with a strong reliance on JAGS for the processing of more realistic models. And R vignettes for the simplest cases (where I discovered R commands I did not know, like dplyr::mutate()!).

As a preliminary warning about my biases, I am always reserved about mixing introductions to probability theory and to (Bayesian) statistics in the same book, as I feel they should be separated to avoid confusion. As for instance between histograms and densities, or between (theoretical) expectation and (empirical) mean. I therefore fail to relate to the pace and tone adopted in the book which, in my opinion, seems to dally on overly simple examples [far too often concerned with food or baseball] while skipping over the concepts and background theory. For instance, introducing the concept of subjective probability as early as page 6 is laudable but I doubt it will engage fresh readers when describing it as a measurement of one’s “belief about the truth of an event”, then stressing that “make any kind of measurement, one needs a tool like a scale or ruler”. Overall, I have no particularly focused criticisms on the probability part except for the discrete vs continuous imbalance. (With the Poisson distribution not covered in the Discrete Distributions chapter. And the “bell curve” making a weird and unrigorous appearance there.) Galton’s board (no mention found of quincunx) could have been better exploited towards the physical definition of a prior, following Steve Stigler’s analysis, by adding a second level. Or turned into an R coding exercise. In the continuous distributions chapter, I would have seen the cdf coming before the pdf, rather than the opposite. And disliked the notion that a Normal distribution was supported by a histogram of (marathon) running times, i.e. values lower bounded by 122 minutes (at the moment). Or later (in Chapter 8) for Roger Federer’s serving times. Incidentally, a fun typo on p.191, at least fun for LaTeX users, as

$f_{Y\ mid X}$

with an extra space between ‘\’ and ‘mid’! (I also noticed several occurrences of the unavoidable “the the” typo in the last chapters.) The simulation from a bivariate Normal distribution is hidden behind a customised R function sim_binom() when it could have been easily described as a two-stage hierarchy. And there is no comment on the fact that a sample from Y-1.5X could be directly derived from the joint sample. (Too unconscious a statistician?)
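To make the two-stage remark concrete, here is a minimal R sketch. The means, standard deviations, and correlation are illustrative assumptions (none is taken from the book): X is drawn from its marginal, Y from its conditional given X, and a sample of Y-1.5X then follows directly from the joint draws.

```r
# Two-stage simulation of a bivariate Normal pair (X, Y).
# All parameter values below are illustrative assumptions, not the book's.
set.seed(42)
n    <- 1e5
mu_x <- 17; sd_x <- 3      # marginal of X
mu_y <- 23; sd_y <- 4      # marginal of Y
rho  <- 0.4                # correlation between X and Y

# Stage 1: X from its marginal distribution
x <- rnorm(n, mu_x, sd_x)
# Stage 2: Y from its conditional distribution given X = x
y <- rnorm(n,
           mean = mu_y + rho * sd_y / sd_x * (x - mu_x),
           sd   = sd_y * sqrt(1 - rho^2))

# A sample from Y - 1.5 X comes for free from the joint sample
d <- y - 1.5 * x

# Theoretical moments of Y - 1.5 X, for comparison with the sample
mean_d <- mu_y - 1.5 * mu_x
var_d  <- sd_y^2 + 1.5^2 * sd_x^2 - 2 * 1.5 * rho * sd_x * sd_y
c(empirical = mean(d), theoretical = mean_d)
c(empirical = var(d),  theoretical = var_d)
```

The empirical mean and variance of d should sit close to the theoretical values, with no further simulation needed.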

When moving to Bayesian inference, a large section is spent on very simple models like estimating a proportion or a mean, covering both discrete and continuous priors. And strongly focusing on conjugate priors despite giving warnings that they do not necessarily reflect prior information or prior belief. With some debatable recommendations for “large” prior variances as weakly informative or (worse) for Exp(1) as a reference prior for sample precision in the linear model (p.415). But also covering Bayesian model checking either via prior predictive (hence Bayes factors) or posterior predictive (with no mention of using the data twice). With a very marginal introduction of a sufficient statistic for the Normal model. In the Normal model checking section, an estimate of the posterior density of the mean is used without (apparent) explanation.
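As an illustration of the kind of simple model discussed above, here is a minimal R sketch of a conjugate Beta-Binomial update followed by a posterior predictive check. The data (12 successes in 20 trials) and the Beta(1,1) prior are hypothetical choices made for this example, not taken from the book.

```r
# Conjugate Beta-Binomial update for a proportion, followed by a
# posterior predictive check. Data and prior are hypothetical.
set.seed(1)
n <- 20; y <- 12          # hypothetical data: 12 successes in 20 trials
a <- 1;  b <- 1           # assumed Beta(a, b) prior (uniform here)

# Conjugacy: the posterior is Beta(a + y, b + n - y)
a_post <- a + y
b_post <- b + n - y
post_mean <- a_post / (a_post + b_post)
post_ci   <- qbeta(c(.025, .975), a_post, b_post)

# Posterior predictive: draw p from the posterior, then replicate the data
p_rep <- rbeta(1e4, a_post, b_post)
y_rep <- rbinom(1e4, n, p_rep)
# Posterior predictive p-value: the data is used twice,
# once to fit the posterior and once to check it
ppp <- mean(y_rep >= y)
```

A value of ppp near 0 or 1 would flag a conflict between model and data, although the double use of the data makes this check conservative.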

“It is interesting to note the strong negative correlation in these parameters. If one assigned informative independent priors on and , these prior beliefs would be counter to the correlation between the two parameters observed in the data.”

For the same reasons of having to cut on mathematical validation and rigour, Chapter 9 on MCMC does not explain why MCMC algorithms converge outside of the finite state space case. The proposal in the algorithmic representation is chosen as a Uniform one, since larger dimension problems are handled by either Gibbs or JAGS. The recommendations about running MCMC do not include how many iterations one “should” run (or other common queries on Stack Exchange), albeit they do include the sensible running of multiple chains and comparing simulated predictive samples with the actual data as a model check. However, the MCMC chapter very quickly and inevitably turns into commented JAGS code. Which I presume would require more from the students than just reading the available code. Like the JAGS manual. Chapter 10 is mostly a series of examples of Bayesian hierarchical modeling, with illustrations of the shrinkage effect like the one on the book cover. Chapter 11 covers simple linear regression with some mentions of weakly informative priors, although in a BUGS spirit of using large [enough?!] variances: “If one has little information about the location of a regression parameter, then the choice of the prior guess is not that important and one chooses a large value for the prior standard deviation . So the regression intercept and slope are each assigned a Normal prior with a mean of 0 and standard deviation equal to the large value of 100.” (p.415). Regardless of the scale of y? Standardisation is covered later in the chapter (with the use of the R function scale()) as part of constructing more informative priors, although this sounds more like data-dependent priors to me in the sense that the scale and location are summarily estimated by empirical means from the data.
The above quote also strikes me as potentially confusing to the students, as it does not spell out at all how to design a joint distribution on the linear regression coefficients that translates the concentration of these coefficients along ȳ=β₀+β₁x̄. Chapter 12 expands the setting to multiple regression and generalised linear models, mostly consisting of examples. It however suggests using cross-validation for model checking and then advocates DIC (deviance information criterion) as “to approximate a model’s out-of-sample predictive performance” (p.463). If only because it is covered in JAGS, the definition of the criterion being relegated to the last page of the book. Chapter 13 concludes with two case studies, the (often used) Federalist Papers analysis and a baseball career hierarchical model. Which may sound far-reaching considering the modest prerequisites the book started with.
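To illustrate the Chapter 9 discussion above, here is a minimal R sketch of a random walk Metropolis sampler with a Uniform proposal, run as several chains from dispersed starting points. The model (Normal observations with unit variance and a Cauchy prior on the mean), the simulated data, and the function names are assumptions made for this example and are not the book's code.

```r
# Random walk Metropolis with a Uniform proposal, run as several chains.
# Model and data are hypothetical: y_i ~ N(mu, 1), mu ~ Cauchy(0, 1).
set.seed(101)
y <- rnorm(30, mean = 2)             # simulated stand-in for real data

log_post <- function(mu)             # log posterior up to a constant
  sum(dnorm(y, mu, 1, log = TRUE)) + dcauchy(mu, log = TRUE)

metropolis <- function(n_iter, init, width = 0.5) {
  mu <- numeric(n_iter)
  mu[1] <- init
  for (t in 2:n_iter) {
    prop <- mu[t - 1] + runif(1, -width, width)   # symmetric Uniform move
    # accept with probability min(1, posterior ratio)
    if (log(runif(1)) < log_post(prop) - log_post(mu[t - 1])) {
      mu[t] <- prop
    } else {
      mu[t] <- mu[t - 1]
    }
  }
  mu
}

# Four chains from dispersed starting values, first half dropped as burn-in
chains <- sapply(c(-5, 0, 5, 10), function(s) metropolis(5000, s))
kept   <- chains[-(1:2500), ]
chain_means <- colMeans(kept)        # should roughly agree across chains

# Posterior predictive replicates of the sample mean, to compare with mean(y)
mu_draws <- as.vector(kept)
ybar_rep <- rnorm(length(mu_draws), mu_draws, 1 / sqrt(length(y)))
```

Agreement between the four chain averages is only a necessary sign of convergence, and the number of iterations used here is an arbitrary choice.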

In conclusion of this rambling [lazy Sunday] review, this is not a textbook I would have the opportunity to use in Paris-Dauphine but I can easily conceive its adoption for students with limited maths exposure. As such it offers a decent entry to the use of Bayesian modelling, supported by a specific software (JAGS), and rightly stresses the call to model checking and comparison with pseudo-observations. Provided the course is reinforced with a fair amount of computer labs and projects, the book can indeed succeed in properly introducing students to Bayesian thinking. Hopefully leading them to seek more advanced courses on the topic.

Update: Jim Albert sent me the following clarifications after this review went online:

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE. As appropriate for a book about Chance!]

## limited shelf validity

Posted in Books, pictures, Statistics, University life on December 11, 2019 by xi'an

A great article from Steve Stigler in the new, multi-scaled, and so exciting Harvard Data Science Review magisterially operated by Xiao-Li Meng, on the limitations of old datasets. Illustrated by three famous datasets used by three equally famous statisticians, Quetelet, Bortkiewicz, and Gosset. None of whom were fundamentally interested in the data for their own sake. First, Quetelet’s data was (wrongly) reconstructed and missed the opportunity to beat Galton at discovering correlation. Second, Bortkiewicz went looking (or even cherry-picking!) for these rare events in yearly tables of mortality minutely divided between causes such as military horse kicks. The third dataset is not Guinness’s, but a test between two sleeping pills, operated rather crudely over inmates from a psychiatric institution in Kalamazoo, with further mishandling by Gosset himself. Manipulations that turn the data into dead data, as Steve put it. (And illustrated with the above skull collection picture. As well as warning against attempts at resuscitating dead data into what could be called “zombie data”.)

“Successful resurrection is only slightly more common than in Christian theology.”

His global perspective on dead data is that they should stop being used once past their (shelf) life, rather than turning into benchmarks recycled over and over as a proof of concept. If only (my two cents) because it leads to calibrating (and choosing) methods that do well over these benchmarks. Another example that could have been added to the skulls above is the Galaxy Velocity Dataset that makes frequent appearances in works estimating Gaussian mixtures. Which Radford Neal signaled at the 2001 ICMS workshop on mixture estimation as an inappropriate use of the dataset since astrophysical arguments weighed against a mixture modelling.

“…the role of context in shaping data selection and form—context in temporal, political, and social as well as scientific terms—has been shown to be a powerful and interesting phenomenon.”

The potential for “dead-er” data (my neologism!) increases with the epoch in that the careful sleuth work Steve (and others) conducted about these historical datasets is absolutely impossible with the current massive data sets. Massive and proprietary. And presumably discarded once the associated neural net is designed and sold. Leaving the burden of unmasking the potential (or highly probable?) biases to others. Most interestingly, this echoes a “comment” in Nature of 17 October by Sabina Leonelli on the transformation of data from a national treasure to a commodity whose “ownership can confer and signal power”. But her call for openness and governance of research data seems as illusory as other attempts to sever the GAFAs from their extra-territorial privileges…

Posted in Books, Kids, R on November 19, 2019 by xi'an

As Galton’s quincunx has fascinated me since the (early) days when I saw a model of it as a teenager in an industry museum near Birmingham, I jumped on the challenge to build an uneven nail version where the probabilities to end up in one of the boxes were not the Binomial ones. For instance, producing a uniform distribution with the maximum number of nails with probability ½ to turn right. And I obviously chose to try simulated annealing to figure out the probabilities, facing as usual the unpleasant task of setting the objective function, calibrating the moves and the temperature schedule. Plus, less usually, a choice of the space where the optimisation takes place, i.e., deciding on a common denominator for the (rational) probabilities. Should it be 2⁸?! Or more (since the solution with two levels also involves 1/3)? Using the functions

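evol<-function(P){
  # Q[i,j]: probability of reaching position j after row i of nails
  Q=matrix(0,7,8)
  Q[1,1]=P[1,1];Q[1,2]=1-P[1,1]
  for (i in 2:7){
    Q[i,1]=Q[i-1,1]*P[i,1]
    for (j in 2:i)
      Q[i,j]=Q[i-1,j-1]*(1-P[i,j-1])+Q[i-1,j]*P[i,j]
    Q[i,i+1]=Q[i-1,i]*(1-P[i,i])
    # renormalise the row to guard against rounding drift
    Q[i,]=Q[i,]/sum(Q[i,])}
  return(Q)}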


and

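temper<-function(T=1e3){
  # start from the standard board (all ½'s); deno (the common denominator)
  # and targ are taken from the global environment
  bestar=tarP=targ(P<-matrix(1/2,7,7))
  temp=.01
  # iterate until the bottom row is within .01 (in L1 distance) of uniform
  while (sum(abs(8*evol(R<-P)[7,]-1))>.01){
    # move 1: redraw one random entry in each of rows 2 to 7
    for (i in 2:7)
      R[i,sample(rep(1:i,2),1)]=sample(0:deno,1)/deno
    if (log(runif(1))/temp<tarP-(tarR<-targ(R))){P=R;tarP=tarR}
    # move 2: symmetrise each row around its centre
    for (i in 2:7) R[i,1:i]=(P[i,1:i]+P[i,i:1])/2
    if (log(runif(1))/temp<tarP-(tarR<-targ(R))){P=R;tarP=tarR}
    # occasionally raise temp, which acts as an inverse temperature here
    if (runif(1)<1e-4) temp=temp+log(T)/T}
  return(P)}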
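As a sanity check on the propagation step, the following R sketch restates evol() with comments and verifies that a board with every nail at ½ returns the Binomial(7,½) distribution over the eight boxes. The reading of P[i,j] as the probability of staying in column j is my interpretation of the code, and the variable names in the check are illustrative.

```r
# Restatement of evol() from the post, with comments. P[i, j] is read as the
# probability that a ball hitting nail j of row i stays in column j
# (otherwise it moves to column j + 1).
evol <- function(P) {
  Q <- matrix(0, 7, 8)                   # Q[i, j]: prob. of position j below row i
  Q[1, 1] <- P[1, 1]
  Q[1, 2] <- 1 - P[1, 1]
  for (i in 2:7) {
    Q[i, 1] <- Q[i - 1, 1] * P[i, 1]
    for (j in 2:i)
      Q[i, j] <- Q[i - 1, j - 1] * (1 - P[i, j - 1]) + Q[i - 1, j] * P[i, j]
    Q[i, i + 1] <- Q[i - 1, i] * (1 - P[i, i])
    Q[i, ] <- Q[i, ] / sum(Q[i, ])
  }
  Q
}

# Standard board: every nail splits the mass evenly
P_half <- matrix(1/2, 7, 7)
Q_half <- evol(P_half)
bottom <- Q_half[7, ]                    # distribution over the 8 boxes
binom  <- dbinom(0:7, size = 7, prob = 1/2)
max(abs(bottom - binom))                 # numerically zero

# L1 distance between 8 x (box probabilities) and 1, i.e. the distance
# to the uniform distribution that the annealing tries to shrink
dist_unif <- sum(abs(8 * bottom - 1))
```

The standard board thus starts at a distance of 6 from the uniform target on this scale.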


I first tried running my simulated annealing code with a target function like

targ<-function(P)(1+.1*sum(!(2*P==1)))*sum(abs(8*evol(P)[7,]-1))

where P is the 7×7 lower triangular matrix of nail probabilities, all with a 2⁷ denominator, reaching

60
126 35
107 81 20
104 71 22 0
126 44 26 69 14
61 123 113 92 91 38
109 60 7 19 44 74 50

for 128P. With four entries close to 64, i.e. ½’s. Reducing the denominator to 16 produced once

8
12 1
13 11 3
16 7 6 2
14 13 16 15 0
15 15 2 7 7 4
8 0 8 9 8 16 8

as 16P, with five ½’s (8). But none of the solutions had exactly a uniform probability of 1/8 to reach all endpoints. Success (with exact 1/8’s and a denominator of 4) was met with the new target

(1+.1*sum(!(2*P==1)))*(.01+sum(!(8*evol(P)[7,]==1)))

imposing precisely 1/8 on the final line. With a solution with 11 ½’s

0.5
1.0 0.0
1.0 0.0 0.0
1.0 0.5 1.0 0.5
0.5 0.5 1.0 0.0 0.0
1.0 0.0 0.5 0.0 0.5 0.0
0.5 0.5 0.5 1.0 1.0 1.0 0.5

and another one with 12 ½’s:

0.5
1.0 0.0
1.0 0.375 0.0
1.0 1.0 0.625 0.5
0.5 0.5 0.5 0.5 0.0
1.0 0.0 0.5 0.5 0.0 0.5
0.5 1.0 0.5 0.0 1.0 0.5 0.0
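Both solutions above can be checked independently of the annealing code. The sketch below uses a compact propagation function, board(), and a matrix-filling helper, fill(), which are hypothetical names introduced for this check and not functions from the post. It assumes the same convention as evol(), namely that entry (i,j) is the probability of staying in column j.

```r
# Propagate the ball through the board: P[i, j] is the probability that a
# ball hitting nail j of row i stays in column j (else it moves to j + 1).
# board() is a hypothetical helper name, not a function from the post.
board <- function(P) {
  q <- 1                                 # all mass above the top nail
  for (i in 1:nrow(P)) {
    stay <- q * P[i, 1:i]                # mass remaining in column j
    move <- q * (1 - P[i, 1:i])          # mass shifted to column j + 1
    q <- c(stay, 0) + c(0, move)
  }
  q
}

# Fill a 7x7 matrix from a list of rows (unused entries are left at 1/2)
fill <- function(rows) {
  P <- matrix(1/2, 7, 7)
  for (i in 1:7) P[i, 1:i] <- rows[[i]]
  P
}

P1 <- fill(list(.5, c(1, 0), c(1, 0, 0), c(1, .5, 1, .5),
                c(.5, .5, 1, 0, 0), c(1, 0, .5, 0, .5, 0),
                c(.5, .5, .5, 1, 1, 1, .5)))
P2 <- fill(list(.5, c(1, 0), c(1, .375, 0), c(1, 1, .625, .5),
                c(.5, .5, .5, .5, 0), c(1, 0, .5, .5, 0, .5),
                c(.5, 1, .5, 0, 1, .5, 0)))

# Number of nails set at exactly 1/2 among the 28 nails
count_half <- function(P) sum(P[lower.tri(P, diag = TRUE)] == 1/2)

unif1  <- all(board(P1) == 1/8)          # exact, as all values are dyadic
unif2  <- all(board(P2) == 1/8)
n_half <- c(count_half(P1), count_half(P2))
```

Both boards return exactly 1/8 for each of the eight boxes, with 11 and 12 nails at ½ respectively. Note that some nails are never reached in these solutions (for instance the .375 and .625 entries of the second one), so their values do not affect the outcome.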

Incidentally, Michael Proschan and my good friend Jeff Rosenthal have a 2009 American Statistician paper on another modification of the quincunx they call the uncunx! Playing a wee bit further with the annealing, and using a denominator of 840, led to a 60P with 13 ½’s out of 28

30
60 0
60 1 0
30 30 30 0
30 30 30 30 30
60 60 60 0 60 0
60 30 0 30 30 60 30