## The [errors in the] error of truth [book review]

Posted in Books, Statistics, University life on August 10, 2021 by xi'an

OUP sent me this book, The error of truth by Steven Osterlind, for review. It is a story about the “astonishing” development of quantitative thinking over the past two centuries. Unfortunately, I found it to be one of the worst books I have read on the history of sciences…

To start with the rather obvious part, I find the scholarship behind the book quite shoddy, as the author continuously brings in historical tidbits to support his overall narrative and sometimes fills the gaps on his own. It often feels like the material comes from Wikipedia, despite his expressing a critical view of the on-line encyclopedia. The [long] quote below is presumably the most shocking historical blunder, as the Terror marks the climax of the French Revolution, rather than the last fight of the French monarchy. Robespierre was the head of the Jacobins, the most radical revolutionaries at the time, and one of the Assembly members who voted for the execution of Louis XVI, which took place before the Terror. He later set about eliminating his political opponents, until he found himself on the guillotine!

“The monarchy fought back with almost unimaginable savagery. They ordered French troops to carry out a bloody campaign in which many thousands of protesters were killed. Any peasant even remotely suspected of not supporting the government was brutally killed by the soldiers; many were shot at point-blank range. The crackdown’s most intense period was a horrific ten-month Reign of Terror (“la Terreur”) during which the government guillotined untold masses (some estimates are as high as 5,000) of its own citizens as a means to control them. One of the architects of the Reign of Terror was Maximilien Robespierre, a French nobleman and lifelong politician. He explained the government’s slaughter in unbelievable terms, as “justified terror . . . [and] an emanation of virtue” (quoted in Linton 2006). Slowly, however, over the next few years, the people gained control. In the end, many nobles, including King Louis XVI and his wife Marie-Antoinette, were themselves executed by guillotining”

Obviously, this absolute misinterpretation does not matter (very) much for the (hi)story of quantification (and uncertainty assessment), but it demonstrates a lack of expertise on the author's part. And saps whatever trust one could have in the new details he brings to light (life?). As for instance when stating

“Bayes did a lot of his developmental work while tutoring students in local pubs. He was a respected teacher. Taking advantage of his immediate resources (in his circumstance, a billiard table), he taught his theorem to many.”

which does not sound very plausible. I never heard that Bayes had students, or went to pubs, or exposed his result to many before its posthumous publication… Or when Voltaire (who died in 1778) is considered a seventeenth-century precursor of the Enlightenment. Or when John Graunt, a true member of the Royal Society, is given as a member of the Académie des Sciences. Or when Quetelet is presented as French and as a student of Laplace.

The maths explanations are also puzzling, from the law of large numbers illustrated by six observations, and wrongly expressed (p.54) as

$\bar{X}_n+\mu\qquad\text{when}\qquad n\longrightarrow\infty$

to the Saint-Petersbourg paradox being seen as inverse probability, to a botched description of the central limit theorem (p.59), including the meaningless equation (p.60)

$\gamma_n=\frac{2^{2n}}{\pi}\int_0^\pi~\cos^{2n} t\,\text dt$

to de Moivre’s theorem being given as Taylor’s expansion

$f(z)=\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(z-a)^2$

and as his derivation of the concept of variance, to another botched depiction of the difference between Bayesian and frequentist statistics, including the usual horror

$P(68.5<70<71.5)=95\%$

to independence being presented as a non-linear relation (p.111), to the conspicuous absence of Pythagoras in the regression chapter, to attributing to Gauss the concept of a probability density (when Simpson, Bayes, and Laplace used it as well), to another highly confusing verbal explanation of densities, including a potential confusion between different representations of a distribution (Fig. 9.6) and the existence of distributions other than the Gaussian distribution, to another error in writing the Gaussian pdf (p.157),

$f(x)=\dfrac{e^{-(z-\mu)^2}\big/2\sigma^2}{\sigma\sqrt{2\pi}}$

to yet another error in the item response probability (p.301), and… to completely missing the distinction between the map and the territory, i.e., between the probabilistic model and the real world (“Truth”), which may be the most important shortcoming of the book.
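For the reader's reference, and assuming standard notation, the correct versions of the garbled statements read, for the law of large numbers,

$\bar{X}_n\longrightarrow\mu\qquad\text{when}\qquad n\longrightarrow\infty$

for Taylor's expansion,

$f(z)=\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(z-a)^n$

and for the Gaussian pdf,

$f(x)=\dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2\big/2\sigma^2}$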

The style is somewhat heavy, with many repetitions about the greatness of the characters involved in the story, and some degree of license in bringing them into the narrative of the book. The historical determinism of this narrative is indeed strong, with a tendency to link characters more than they were linked, and to make them larger than life. Which is a usual drawback of such books, along with the profuse apologies for presenting a few mathematical formulas!

The overall presentation further has a Victorian and conservative flavour, in its adoration of great names, an almost exclusive centering on Western Europe, a patriarchal tone (“It was common for them to assist their husbands in some way or another”, p.44; Marie Curie “agreed to the marriage, believing it would help her keep her laboratory position”, p.283), a defense of the empowerment allowed by the Industrial Revolution and of the positive sides of colonialism and of the Western expansion of the USA (including the invention of Coca-Cola as a landmark in the march to Progress!), the fall of the (communist) Eastern Bloc being attributed to Ronald Reagan, Karol Wojtyła, and Margaret Thatcher, the Bell Curve being presented as written by respected professors with solid, if controversial, scholarship, the omission of the Ottoman Enlightenment along with a particularly disparaging take on the Middle East, and the dismissal of Galton’s eugenics as a misguided enthusiasm of his later years (while side-stepping the issue of Pearson’s and Fisher’s eugenic views).

Another recurrent, if minor, problem is the poor recording of dates and years when introducing an event or a new character. And the quotes referring to the current edition or translation instead of the original year, as, e.g., Bernoulli (1954). Or, even better, Bayes and Price (1963)!

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

## quick(er) calculations [book review]

Posted in Statistics on July 5, 2021 by xi'an

Upon my request, Oxford University Press sent me this book for review in CHANCE. With the extended title How to add, subtract, multiply, divide, square, and square root more swiftly. This short (173 pages) book is written by Trevor Davis Lipscombe, currently Director of the Catholic University of America Press (which is apparently not suited for his books, since his former Physics of Rugby got published by Nottingham University Press). The concept of the book is to list tricks and shortcuts to handle seemingly tough operations on a list of numbers. Illustrated by short anecdotes mostly related to religion, sports (including the Vatican cricket team!), and history, albeit not necessarily connected with the computation at hand and not providing an in-depth coverage of calculation across the ages and the cultures. While the topic is rather dry, as illustrated by the section titles, e.g., “Multiply two numbers that differ by 2, 4, 6, or 20” or “Multiply or divide by 66 or 67, 666 or 667” (!), the exposition is somewhat facilitated by the (classics) culture of the author. (I have to confess I got lost in the date chapter, i.e., finding which day of the week was December 18, 1981, for instance. Especially by the concept of Doomsday, which I thought was a special day of the year in the UK. Or in the USA.) Still, while recognising some simple decompositions I also used for additions and subtractions, and acknowledging the validity of the many tricks I had never thought of, I wonder at the relevance of learning those dozens of approaches beyond maintaining a particular type of mental agility… Or preparing for party show-time. Especially for the operations that do not enjoy exact solutions, like dividing by √3 or multiplying by π… The book reminded me of a physics professor in Caen, Henri Eyraud, who used to approximate powers and roots faster than it took us to get a slide rule out of our bags!
But Guesstimation, which I reviewed several years ago, seemed more far-reaching than Quick(er) calculations, in that I had tried to teach my kids (with limited success) how to reach the right order of magnitude of a quantity, but never insisted [beyond primary school] on quick mental calculations. (The Interlude V chapter connects with this idea.)

[Disclaimer about potential self-plagiarism: this post or an edited version should eventually appear in my Book Review section in CHANCE.]

## 9 pitfalls of data science [book review]

Posted in Books, Kids, Statistics, Travel, University life on September 11, 2019 by xi'an

I received The 9 pitfalls of data science by Gary Smith [who has written a significant number of general-public books on personal investment, statistics, and AIs] and Jay Cordes from OUP for review a few weeks ago and read it on my trip to Salzburg. This short book contains a lot of anecdotes and what I would qualify as small talk about job experiences and colleagues’ idiosyncrasies… More fundamentally, it reads as a sequence of examples of bad or misused statistics, as many general-public books on statistics do, but with little to say on how to spot such misuses. Its title (it seems like The 9 pitfalls of… is a rather common début for a book title!) however started a (short) conversation with my neighbour on the train to Salzburg, as she wanted to know whether the job opportunities in data science were better in Germany than in Austria. A practically important question for which I had no clue. And I do not think the book would have helped either! (My neighbour on the earlier plane to München had a book on growing lotus, which was not particularly enticing for launching a conversation either.)

Chapter I “Using bad data” is made of examples of truncated or cherry-picked data, often associated with poor graphics. Only one-dimensional outcomes, and also very US-centric. Chapter II “Data before theory” highlights spurious correlations and post hoc predictions, with criticism of data mining, some examples being quite standard. Chapter III “Worshiping maths” sounds like the perfect opposite of the previous chapter: it discusses the fact that all models are wrong, but some may be more wrong than others. And gives examples of overfitting, p-value hacking, and regression applied to longitudinal data. With the message that (maths) assumptions are handy and helpful but not always realistic. Chapter IV “Worshiping computers” is about the new golden calf and contains rather standard stuff on trusting the computer output because it is a machine. However, the book somewhat falls foul of the same mistake by trusting a Monte Carlo simulation of a shortfall probability for retirees, since Monte Carlo also depends on a model! Computer simulations may be fine for Bingo night or poker tournaments but are much more uncertain for complex decisions like retirement investments. The chapter also misses the biasing aspects in constructing recidivism prediction models pointed out in Weapons of Math Destruction. Until Chapter 9 at least. It does mention adversarial attacks, if not GANs (!). Chapter V “Torturing data” mentions famous cheaters like Wansink of the bottomless-bowl and pizza papers and contains more about p-hacking and reproducibility. Chapter VI “Fooling yourself” is a rather weak chapter in my opinion. Apart from Ioannidis’ take on Theranos’ lack of scientific backing, it spends quite a lot of space on stories about poker gains in the unregulated era of online poker, with boasts of significant gains that were possibly earned from compulsive gamblers playing their family savings, which is not particularly praiseworthy. And about Brazilian jiu-jitsu.
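The model-dependence of such a Monte Carlo shortfall estimate is easy to demonstrate. Here is a toy sketch (my own hypothetical numbers and return models, not the book's): the same retirement plan simulated under two return distributions sharing the same mean and standard deviation but differing in their tails can yield noticeably different ruin probabilities.

```python
import math
import random

def shortfall_probability(draw_return, n_sims=10_000, savings=500_000.0,
                          spending=30_000.0, years=30, seed=7):
    """Fraction of simulated retirements whose portfolio is exhausted."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(n_sims):
        wealth = savings
        for _ in range(years):
            # withdraw living expenses, then apply a random annual return
            wealth = (wealth - spending) * (1.0 + draw_return(rng))
            if wealth <= 0.0:
                ruined += 1
                break
    return ruined / n_sims

def normal_return(rng):
    """Annual return: Gaussian, mean 4%, sd 18%."""
    return rng.gauss(0.04, 0.18)

def fat_tailed_return(rng):
    """Annual return: Student-t (3 df), rescaled to the same mean and sd."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    t3 = z / math.sqrt(chi2 / 3.0)         # t with 3 degrees of freedom, Var = 3
    return 0.04 + 0.18 * t3 / math.sqrt(3.0)

print(shortfall_probability(normal_return),
      shortfall_probability(fat_tailed_return))
```

Both runs are “a Monte Carlo simulation of a shortfall probability”, yet the answer rests entirely on the chosen return distribution, which is precisely the modelling assumption the book glosses over.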
Chapter VII “Correlation vs causation” predictably mentions Judea Pearl (whose Book of Why I just could not finish, after reading one rant too many about statisticians being unable to get causality right! Especially after discussing the book with Andrew.). But there is not so much to gather from the chapter, which could instead have delved into deep learning and its ways to avoid overfitting. The first example of this chapter is more about confusing conditionals (what is conditional on what?) than about turning causation around. Chapter VIII “Regression to the mean” sees Galton’s quincunx reappearing here after Pearl’s book, where I learned (and checked with Steve Stigler) that the device was indeed intended for the purpose of illustrating regression to the mean. While the attractive fallacy is worth pointing out, there are much worse abuses of regression that could be presented. CHANCE’s Howard Wainer also makes an appearance along with SAT scores. Chapter IX “Doing harm” does engage with the issue that predicting social features like recidivism by a (black-box) software is highly worrying (and just plain wrong), if only because of this black-box nature. Moving predictably to chess and go, with the right comment that this does not say much about real data problems. A word of warning about DNA testing containing very little about ancestry, if only because of the companies’ limited and biased databases. With further calls for data privacy and a rather useless entry on North Korea. Chapter X “The Great Recession”, which discusses the subprime scandal (as in Stewart’s book), contains a set of (mostly superfluous) equations from Samuelson’s paper (supposed to scare or impress the reader?!) leading to the rather obvious result that the expected concave utility of a weighted average of iid positive rvs is maximal when all the weights are equal, a result that is then criticised by laughing at the assumption of iid-ness in the case of mortgages.
Along with those who bought exotic derivatives whose construction they could not understand. The (short) chapter keeps going through all the (a posteriori) obvious ingredients of the financial disaster to link them to most of the nine pitfalls. Except the second, about data before theory, because there was no data, only theory with no connection with reality. This final chapter is rather enjoyable, if coming after the facts. And containing this altogether unnecessary mathematical entry. [Usual warning: this review or a revised version of it is likely to appear in CHANCE, in my book reviews column.]
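For the record, the “rather obvious result” indeed follows in two lines from exchangeability and Jensen's inequality: for weights $w_i\ge 0$ summing to one, exchangeable (e.g., iid) $X_1,\ldots,X_n$, and concave $u$,

$\mathbb{E}\,u\left(\sum_i w_iX_i\right)=\frac{1}{n!}\sum_\sigma\mathbb{E}\,u\left(\sum_i w_{\sigma(i)}X_i\right)\le\mathbb{E}\,u\left(\frac{1}{n!}\sum_\sigma\sum_i w_{\sigma(i)}X_i\right)=\mathbb{E}\,u\left(\frac{1}{n}\sum_i X_i\right)$

where the sum runs over the permutations $\sigma$ of $\{1,\ldots,n\}$, the first equality holds by exchangeability, the inequality by concavity of $u$, and the last equality because averaging $w_{\sigma(i)}$ over all permutations returns $1/n$.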

## causality

Posted in Books, Statistics, University life on March 7, 2016 by xi'an

Oxford University Press sent me this book by Phyllis Illari and Federica Russo, Causality (Philosophical theory meets scientific practice), a little while ago. (The book appeared in 2014.) Whether or not I asked for it, I cannot remember…

“The problem is whether and how to use information of general causation established in science to ascertain individual responsibility.” (p.38)

As the subtitle indicates, this is a philosophy book, not a statistics book. And not particularly intended for statisticians. Hence, I am not exactly qualified to analyse its contents, and even less to criticise its lack of connection with statistics. But this being a blog post… I read rather slowly through the book, which exposes a wide range (“a map”, p.8) of approaches and perspectives on the notion of causality, some ways to infer about causality, and the point of doing all this, concluding with a relativistic (and thus eminently philosophical) viewpoint defending a “pluralistic mosaic” or a “causal mosaic” that relates to all existing accounts of causality as they “each do something valuable” (p.258). From a naïve bystander's perspective, this sounds like a new avatar of deconstructionism applied to causality.

“Simulations can be very illuminating about various phenomena that are complex and have unexpected effects (…) can be run repeatedly to study a system in different situations to those seen for the real system…” (p.15)

This is not to say that the book is uninteresting, as it provides a wide entry into philosophical attempts at categorising and defining causality, if not into the statistical aspects of the issue. (For instance, the problem of whether or not causality can be established uniquely from a statistical perspective is not mentioned.) Among the interesting points in the early chapters is a section (2.5) about simulation. Which however misses the depth of the earlier book on climate simulations I reviewed while at Monash. Or of the discussions at the interdisciplinary seminar last year in Hanover. I.J. Good’s probabilistic causality is mentioned but hardly detailed. (With the warning remark that one “should not confuse predictability with determinism [and] determinism with causality”, p.82.)

## Medical illuminations [book review]

Posted in Books, pictures, Statistics on September 27, 2013 by xi'an

Howard Wainer wrote another book, about to be published by Oxford University Press, called Medical Illuminations. (The book is announced for January 2 on amazon. A great New Year gift to be sure!) While attending WSC 2013 in Hong Kong, and then again the RSS Annual Conference in Newcastle, I saw a preliminary copy of the book and asked the OUP representative if I could get a copy for CHANCE (by any chance?!)… And they kindly sent me one the next day!

“This is an odd book (…) gallop[ing] off in all directions at once.” (p.152)

As can be seen from the cover, which reproduces the great da Vinci notebook page above (and from the title, where illuminations flirts with illuminated [manuscript]), the book focuses on the visualisation of medical data to “improve healthcare”. Its other themes are using evidence and statistical thinking towards the same goal. Since I was most impressed by the graphical part, I first thought of entitling this post “Howard does his Tufte” (before wondering at the appropriateness of such a title)!

“As hard as this may be to believe, this display is not notably worse than many of the others contained in this remarkable volume.” (p.78)

In fact, this first section is very much related to CHANCE, in that a large sequence of graphics were submitted by CHANCE readers when Howard Wainer launched a competition in the magazine for improving upon a Nightingale-like representation by Burtin of antibiotics efficiency. The section starts from an administrative ruling that the New York State Health Department had to publish cancer maps overlaid with potentially hazardous sites without any (interpretation) buffer. From there, Wainer shows how the best as well as the worst can be made of graphical representations of statistical data. He reproduces (with due mention) Tufte’s selection of Minard’s rendering of the Napoleonic Russian campaign as the best graph ever… The corresponding chapters of the book keep their focus on medical data, with some commentaries on the graphical quality of the 2008 National Healthcare Quality Report (ans.: could do better!). While this is well done and carries a significant message, I would still favour Tufte for teaching data users to present their findings in the most effective way. An interesting final chapter for the section is about “controlling creativity”, where Howard Wainer follows in the steps of John Tukey with the Atlas of United States Mortality, and then shows a perfectly incomprehensible chart taken from Understanding USA, a not very premonitory title… (Besides Howard’s conclusion quoted above, you should also read the one-star comments on amazon!)

“Of course, it is impossible to underestimate the graphical skills of the mass media.” (p.164)

Section II is about a better use of statistics, and of communicating those statistics, towards improving healthcare, from fighting diabetes, to picking the right treatment for hip fractures (from an X-ray), to re-evaluating detection tests (for breast and prostate cancers) as possibly very inefficient, and to briefly wondering about accelerated testing. And Section III tries to explain why progress (by applying the previous recommendations) has not been steadier. It starts with a story about the use of check-lists in intensive care and their dramatic impact on effectiveness against infections. (The story hit home as I lost my thumb to an infection while in intensive care! Maybe a check-list would have helped. Maybe.) The next chapter contrasts the lack of progress in using check-lists with the adoption of the Korean alphabet in Korea, a wee bit of an unrelated example given the focus of the book. (Overall, I find most of the final chapters on the weak side of the book.)

This is indeed an odd book, with a lot of clever remarks and useful insights, but without the driving line that would have made Wainer’s Medical Illuminations more than the sum of its components. Each section and most chapters (!) contain sensible recommendations for improving the presentation and exploitation of medical data for practitioners and patients. I however wonder how much the book can impact the current state of affairs, like producing better tools for monitoring one’s own diabetes. So, in the end, I recommend reading Medical Illuminations as a very pleasant moment, from which examples and anecdotes can be borrowed for courses and friendly discussions. For non-statisticians, it is certainly a worthy entry on the relevance of statistical processing of (raw) data.