Probably Overthinking It, written by Allen B. Downey (who wrote a series of books whose titles start with Think, like Think Python, Think Bayes, Think Stats), belongs to the large collection of introductory books that aim at making statistics more palatable and enticing to the general public by making the fundamental concepts more intuitive and building upon real-life examples. I would thus stop short of calling it an “essential guide”, as in the first flap of the dust jacket, since there exist many published books with a similar goal, some of which were actually reviewed here. Now, there are ideas and examples therein I could borrow for my introductory stats course, except that I will cease teaching it next year! For instance, there are lots of examples related to COVID, which is great to engage (enrage?) the readers.
The book is quite pleasant to read, does not shy away from mathematical formulae, and covers notions such as probability distributions, the Simpson, Preston, inspection, and Berkson paradoxes, and even some words on causality, sometimes at excessive length. (I have always been an adept of the concise church when it comes to textbook examples and fear that the multiplication of illustrations of a given concept may prove counterproductive.) The early chapters are heavily focussed on the Gaussian (or Normal) distribution, making it appear as essential for conducting statistical analysis. When it does not apply, as in the Elo example, the explanations of a correction are less convincing.
I appreciated the book’s approach to model fit via the comparison of empirical cdfs with hypothetical ones. Also of primary interest is the systematic recourse to simulation, aka generative models, albeit without a proper systematic description. In the chapter (Chap 5) about durations, I think there are missed opportunities like the distributions of extremes (p 82) or the memoryless (or forgetfulness) property of the Exponential distribution. Instead, the focus drifts towards non-statistical issues of demography by the end of the chapter, with a potential for confusion between the Gompertz law and the Gompertz distribution. The Berkson paradox (Chap 6) is well-explained in terms of non-random populations (and reminded me of when, years ago, we tried to predict the first year success probability of undergrad applicants from their high school maths grade and the regression coefficient estimate ended up negative). Distributions of extremes do appear in Chap 8, although again seeking an ideal generic distribution seems to me rather misguided and misguiding. I would also argue that the author is missing the point of Taleb’s black swans by arguing in favour of a better modelling, when the latter argues against the very predictability of extreme events in a non-stationary financial world… The chapter on fairness and fallacy (Chap 9) is actually about false positive/negative rates in different populations, hence the ensuing unfairness (or the base rate fallacy). In that chapter there is no mention of Bayes (reserved for Think Bayes?!), but it is hitting hard enough at anti-vaxxers (who will most likely not read the book). And it does so again in the Simpson paradox chapter (Chap 10), a paradox whose proliferation is further stressed in the following chapter on people becoming less racist or sexist or homophobic when they age, despite the proportion of racist/sexist/homophobic responses to a specific survey (GSS/Pew) increasing with age. This is prolonged into the rather minor final chapter.
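As a side illustration of the Berkson effect mentioned above, here is a minimal simulation sketch, not taken from the book and not a reconstruction of our admission study. It assumes two independent standard Normal scores (a hypothetical maths grade and a hypothetical "other" criterion) and an admission rule that keeps applicants whose sum exceeds an arbitrary threshold. All names and numerical values are made up for the illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_admissions(n=20_000, threshold=1.0, seed=0):
    """Draw two independent scores per applicant and apply a selection rule.

    Assumption (hypothetical): admission requires maths + other > threshold.
    """
    rng = random.Random(seed)
    maths = [rng.gauss(0, 1) for _ in range(n)]
    other = [rng.gauss(0, 1) for _ in range(n)]
    admitted = [(m, o) for m, o in zip(maths, other) if m + o > threshold]
    return maths, other, admitted


maths, other, admitted = simulate_admissions()

# Correlation over all applicants: close to zero by construction.
r_all = pearson(maths, other)

# Correlation among admitted applicants only: pushed negative by the selection.
r_admitted = pearson([m for m, _ in admitted], [o for _, o in admitted])

print(f"all applicants:      r = {r_all:+.3f}")
print(f"admitted applicants: r = {r_admitted:+.3f}")
```

Within the selected group, a weak maths grade can only coexist with a strong second score, which is enough to create a negative association between two criteria that are independent in the full population.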
Now that I have read the book, during a balmy afternoon in St Kilda (after an early start on the train to De Gaulle airport in freezing temperatures), I am a bit uncertain about what to make of it in terms of impact on the general public. For sure, the stories that accumulate chapter after chapter are nice and well argued, while introducing useful statistical concepts, but I do not see readers equipped enough to handle daily statistics with more than a healthy dose of scepticism, which obviously is a first step in the right direction!
Some nitpicking: the book is missing the historical connection to Quetelet’s “average man” when referring to the notion. It is also missing a potential explanation for the (approximate) log-Gaussianity of weights of individuals in a population through the fact that a weight is a volume, hence a third power of a sort. Although birth weights are roughly Normal, which kills my argument. I remain puzzled by the title, possibly missing a cultural reference (as there are tee-shirts sold with this sentence). It is the same as the name of a blog run by the author since 2011, which served as fodder for the book. And the cover is terrible, with the words broken to fit the width in a way that makes no sense, if I am not overthinking it! As often, the book is rather US-centric, although making no mention of the US having much higher infant death rates than countries with similar GDPs when this data is discussed.
[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]