## What are the chances of that?

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , , on May 13, 2022 by xi'an

What are the chances that I review a book with this title, a few months after reviewing a book called What is luck?! This one is written by Andrew Elliott, whose Is that a big number? I reviewed a wee bit earlier… And that the cover of this book involves a particularly unlucky sequence of die as in my much earlier review of Krysz Burdzy’s book? (About 10⁻⁶ less likely than the likeliest draw!)

The (relative) specificity of this book is to try to convey the notions of chance and uncertainty to the general public, more in demonstrating that our intuition is most often wrong by examples and simulations, than in delving into psychological reasons as in Barbara Blatchley’s book. The author advances five dualities that underly our (dysfunctional) relation to chance: individual vs. collective, randomness vs. meaning, foresight vs. insight, uniformity vs. variability, and disruption vs. opportunity.

“News programmes clearly understand that the testimonies of individuals draw better audiences than the summaries of statisticians.” (p. xvii)

Some of the nice features of the book  are (a) the description of a probabilistic problem at the beginning of each chapter, to be solved at the end, (b) the use of simulation experiments, represented by coloured pixels over a grey band crossing the page, including a section on pseudorandom generators [which is less confusing that the quote below may indicate!], (c) taking full advantage of the quincunx apparatus, and (d) very few apologies for getting into formulas. And even a relevant quote of Taleb’s Black Swan about the ludic fallacy. On the other hand, the author spends quite a large component of the book on chance games, exhibiting a ludic tendency! And contemplates biased coins, while he should know better! The historical sections may prove too much for both informed and uninformed readers. (However, I learned that the UK Government had used a form of lottery to pay interests on premium bonds.) And the later parts are less numerical and quantified, even though the author brings in the micromort measurement [invented by Ronald Howard and] favoured by David Spiegelhalter. Who actually appears to have inspired several other sections, like the one on coincidences (which remains quite light in its investigation!). I finished the book rather quickly by browsing though mostly anecdotes and a lesser feel of a unified discourse. I did not find the attempt to link with the COVID pandemic, which definitely resets our clocks on risk, particularly alluring…

“People go to a lot of trouble to generate truly random numbers—sequences that are impossible to predict.” (p.66)

The apparition of the Normal distribution is somewhat overdone and almost mystical, if the tone gets more reasonable by the end of the corresponding chapter.

“…combining random numbers from distributions that really have no business being added together (…) ends up with a statistic that actually fits the normal distribution quite well.” (p.83)

The part about Bayes and Bayesian reasoning does not include any inference, with a rather duh! criticism of prior modelling.

“If you are tempted to apply a group statistic derived from a broad analysis to a more narrow purpose, you run the risk of making an unfair judgement.” (p.263)

The section about Xenakis’ musical creations as a Markov process was most interesting (and novel to me). I also enjoyed the shared cultural entries, esp. literary ones. Like citing the recent Chernobyl TV drama. Or Philip K. Dick’s Do Androids Dream of Electric Sheep? Or yet Monty Python’s Life of Brian. Overall, there is enough trivia and engagement to keep reading the book till its end!

## learning base R [book review]

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , , , , on February 26, 2022 by xi'an

This second edition of an introductory R book was sent to me by the author for a potential CHANCE book review.  As there are many (many) books in the same spirit, the main question behind my reading it (in one go) was on the novelty it brings. The topics Learning Base R covers are

• arithmetics with R
• data structures
• built-in and user-written R functions
• R utilities
• more data structures
• comparison and coercion
• lists and data frames
• resident R datasets
• R interface
• probability calculations in R
• R graphics
• R programming
• simulations
• statistical inference in R
• linear algebra
• use of R packages

within as many short chapters. The style is rather standard, that is, short paragraphs with mostly raw reproductions of line commands and their outcome. Sometimes a whole page long of code examples (if with comments). All in all I feel there are rather too few tables when compared with examples, at least for my own taste. The exercises are mostly short and, while they vary in depth, they show that the book is rather intended for students with some mathematical background (e.g., with a chapter on complex numbers and another one on linear algebra that do not seem immediately relevant for most intended readers). Or more than that, when considering one (of several) exercise (19.30) on the Black-Scholes process that mentions Brownian motion. Possibly less appealing for would-be statisticians.

I also wonder at the pedagogical choice of not including and involving more clearly graphical interfaces like R studio as students are usually not big fans of “old-style” [their wording not mine!] line command languages. For instance, the chapter on packages would have benefited from this perspective. Nothing on Rmarkdown either. Apparently nothing on handling big data, more advanced database manipulation, the related realistic dangers of memory freeze and compulsory reboot, the intricacies of managing different directories and earlier sessions, little on the urgency of avoiding loops (p.233) by vectorial programming, a paradoxically if function being introduced after ifelse, and again not that much on statistics (with density only occurring in exercises).The chapter on customising R graphics may possibly scare the intended reader when considering the all-in-one example of p.193! As we advance though the book, the more advanced examples often are fairly standard programming ones (found in other language manuals) like creating Fibonacci numbers, implementing Eratosthenes sieve, playing the Hanoi Tower game… (At least they remind me of examples read in the language manuals I read as a student.) The simulation chapter could have gone into the one (Chap. 19) on probability calculations, rather than superfluously redefining standard distributions. (Except when defining a random number as a uniformly random number (p.162).)  This chapter also spends an unusual amount of space on linear congruencial pseudo-random generators, while missing to point out the trivia that the randu dataset mentioned twice earlier is actually an outcome from the infamous RANDU Fortran generator. The following section in that chapter is written in such a way that it may give the wrong impression that one can find the analytic solution from repeated Monte Carlo experiments and hence the error. Which is rarely the case, even in finite environments with rational expectations, as one usually does not know of which unit fraction the expectation should be a multiple of. (Remember the Squid Games paradox!) And no mention is made of the prescription of always returning an error estimate along with the numerical approximation. The statistics chapter is obviously more developed, with descriptive statistics, ecdf, but no bootrstap, a t.test curiously applied to the Michelson measurements of the speed of light (how could it be zero?!), ANOVA, regression handled via lm and glm, time series analysis by ARIMA models, which I hope will not be the sole exposure of readers to these concepts.

In conclusion, there is nothing critically wrong with this manual introducing R to newcomers and I would not mind having my undergraduate students reading it (rather than our shorter and home-made handout, polished along the years) before my first mathematical statistics lab. However I do not find it massively innovative in its presentation or choice of concept, even though the most advanced examples are not necessarily standard, and may not appeal to all categories of students.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

## Measuring abundance [book review]

Posted in Books, Statistics with tags , , , , , , , , , , , , on January 27, 2022 by xi'an

This 2020 book, Measuring Abundance:  Methods for the Estimation of Population Size and Species Richness was written by Graham Upton, retired professor of applied statistics, for the Data in the Wild series published by Pelagic Publishing, a publishing company based in Exeter.

“Measuring the abundance of individuals and the diversity of species are core components of most ecological research projects and conservation monitoring. This book brings together in one place, for the first time, the methods used to estimate the abundance of individuals in nature.”

Its purpose is to provide a collection of statistical methods for measuring animal abundance or lack thereof. There are four parts: a primer on statistical methods, going no further than maximum likelihood estimation and bootstrap. The term Bayesian only occurs once, in connection with the (a-Bayesian) BIC. (I first spotted a second entry, until I realised this was not a typo and the example truly was about Bawean warty pigs!) The second part is about stationary (or static) individuals, such as trees, and it mostly exposes different recognised ways of sampling, with a focus on minimising the surveyor’s effort. Examples include forestry sampling (with a chainsaw method!) and underwater sampling. There is very little statistics involved in this part apart from the rare appearance of a MLE with an asymptotic confidence interval. There is also very little about misspecified models, except for the occasional warning that the estimates may prove completely wrong. The third part is about mobile individuals, with capture-recapture methods receiving the lion’s share (!). No lion was actually involved in the studies used as examples (but there were grizzly bears from Yellowstone and Banff National Parks). Given the huge variety of capture-recapture models, very little input is found within the book as the practical aspects are delegated to R software like the RMark and mra packages. Very little is written on using covariates or spatial features in such models, mostly dedicated to printed output from R packages with AIC as the sole standard for comparing models. I did not know of distance methods (Chapter 8), which are less invasive counting methods. They however seem to rely on a particular model of missing on individuals as the distance increases. The last section is about estimating the number of species. With again a model assumption that may prove wrong. With the inclusion of diversity measures,

The contents of the book are really down to earth and intended for field data gatherers. For instance, “drive slowly and steadily at 20 mph with headlights and hazard lights on ” (p.91) or “Before starting to record, allow fish time to acclimatize to the presence of divers” (p.91). It is unclear to me how useful the book would prove to be for general statisticians, apart from revealing the huge diversity of methods actually employed in the field. To either build upon these or expose students to their reassessment. More advanced books are McCrea and Morgan (2014), Buckland et al. (2016) and the most recent Seber and Schofield (2019).

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

## What is luck? [book review]

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , , , , on December 10, 2021 by xi'an

I was sent—by Columbia University Press—this book for a potential review in CHANCE: What are the chances? (Why we believe in luck?) was written by Barbara Blatchley, professor of Psychology and Neuroscience at Agnes Scott College in Decatur, Georgia. I have read rather quickly its 193 pages over the recent trips I made to Marseille and Warwick. The topic is truly about luck and the psychology of the feeling of being luck or unlucky. There is thus rather little to relate to as a statistician, as this is not a book about chance! (I always need to pay attention when using both words, since, in French chance primarily means luck, while malchance means bad luck. And the French term for chance and randomness is hasard…) The book is pleasant to read, even though the accumulation of reports about psychological studies may prove tiresome in the long run and, for a statistician, worrisome as to which percentage of such studies were properly validated by statistical arguments…

“…the famous quote by Louis Pasteur: “Dans les champs de l’observation, le hasard ne favorise que les esprits préparés”s (…) Pasteur never saw a challenge he couldn’t overcome with patience and preparation.” (p.19)

Even the part about randomness is a-statistical and mostly a-probabilist, rather focusing on our subjective and biased (un)ability to judge randomness. The author introduces us to the concepts of apophenia, which is “the unmotivated seeing of connections accompanied with a specific feeling of abnormal meaningfulness”, and of patternicity for the “tendency to find meaningful patterns in meaningless noise”. She also states that (Neyman-Pearson) Type I error is about seeing a pattern in random noise while Type II errors are for conclusion of meaningless when the data is meaningful (p.15). Which is reductive to say the least, but lead her to recall the four types of luck proposed by James Austin (which I first misread as Jane Austin).

“There is a long-standing and deeply intimate connection between luck, religion, and belief in the supernatural.” (p.28)

I enjoyed very much the sections on these connections between a belief in luck and religions, even though the anthropological references to ancient religions are not strongly connected to luck, but rather to the belief that gods and goddesses could modify one’s fate (and avoiding the most established religions). Still, I appreciate her stressing the fact that if one believes in luck (as opposed to sheer randomness), this expresses at the very least a form of irrational belief in higher powers that can bend randomness in one’s favour (or disfavour). Which is the seed for more elaborate if irrational beliefs. (For illustrations, Borgès’ stories come to mind.)

“B.F. Skinner believed that superstitious behaviour was a consequence of learning and reinforcement.” (p.85)

There are also parts where (a belief in) luck and (human) learning are connected, but, unfortunately, no mention is made of the (vaguely) Bayesian nature of the (plastic, p. 188) brain modus operandi. The large section on the brain found in the book is instead physiological, since concerned with finding regions where the belief in luck could be located. In relation with attention-deficit disorders. (Revealing the interesting existence (for me) of mirror neurons, dedicated to predicting what could happen! Described as “predictive coding”, p.153). The last chapter “How to get lucky” contains a rather lengthy account of “Clever Hans”, the 1990 German counting horse (!). Who, as well-known, reacted to subtle and possibly unconscious signals from his trainer rather than to an equine feeling for arithmetic…

One of the clearest conclusions of the book is (imho) that a belief in luck may improve the life of the believers, while a belief in being unlucky may deteriorate it. The Taoist tale finishing the book is a pure gem. But I am still in the dark as to whether or not my exceptional number of bike punctures in the past year qualifies as bad luck!

“Luck is the way you face the randomness of the world.” (p.191)

As an irrelevant aside, one anecdote at the beginning of the book brought back memories of the Wabash River flowing through Lafayette, IN, as it tells of the luck of two Purdue female rowers who attempted a transatlantic race and survived capsizing in the middle of the Atlantic. It also made me regret I had not realised at the time there was a rowing opportunity there!

## The [errors in the] error of truth [book review]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , , , , , , , , , , , , , , on August 10, 2021 by xi'an

OUP sent me this book, The error of truth by Steven Osterling, for review. It is a story about the “astonishing” development of quantitative thinking in the past two centuries. Unfortunately, I found it to be one of the worst books I have read on the history of sciences…

To start with the rather obvious part, I find the scholarship behind the book quite shoddy as the author continuously brings in items of historical tidbits to support his overall narrative and sometimes fills gaps on his own. It often feels like the material comes from Wikipedia, despite expressing a critical view of the on-line encyclopedia. The [long] quote below is presumably the most shocking historical blunder, as the terror era marks the climax of the French Revolution, rather than the last fight of the French monarchy. Robespierre was the head of the Jacobins, the most radical revolutionaries at the time, and one of the Assembly members who voted for the execution of Louis XIV, which took place before the Terror. And later started to eliminate his political opponents, until he found himself on the guillotine!

“The monarchy fought back with almost unimaginable savagery. They ordered French troops to carry out a bloody campaign in which many thousands of protesters were killed. Any peasant even remotely suspected of not supporting the government was brutally killed by the soldiers; many were shot at point-blank range. The crackdown’s most intense period was a horrific ten-month Reign of Terror (“la Terreur”) during which the government guillotined untold masses (some estimates are as high as 5,000) of its own citizens as a means to control them. One of the architects of the Reign of Terror was Maximilien Robespierre, a French nobleman and lifelong politician. He explained the government’s slaughter in unbelievable terms, as “justified terror . . . [and] an emanation of virtue” (quoted in Linton 2006). Slowly, however, over the next few years, the people gained control. In the end, many nobles, including King Louis XVI and his wife Marie-Antoinette, were themselves executed by guillotining”

Obviously, this absolute misinterpretation does not matter (very) much for the (hi)story of quantification (and uncertainty assessment), but it demonstrates a lack of expertise of the author. And sap whatever trust one could have in new details he brings to light (life?). As for instance when stating

“Bayes did a lot of his developmental work while tutoring students in local pubs. He was a respected teacher. Taking advantage of his immediate resources (in his circumstance, a billiard table), he taught his theorem to many.”

which does not sound very plausible. I never heard that Bayes had students  or went to pubs or exposed his result to many before its posthumous publication… Or when Voltaire (who died in 1778) is considered as seventeenth-century precursor of the Enlightenment. Or when John Graunt, true member of the Royal Society, is given as a member of the Académie des Sciences. Or when Quetelet is presented as French and as a student of Laplace.

The maths explanations are also puzzling, from the law of large numbers illustrated by six observations, and wrongly expressed (p.54) as

$\bar{X}_n+\mu\qquad\text{when}\qquad n\longrightarrow\infty$

to  the Saint-Petersbourg paradox being seen as inverse probability, to a botched description of the central limit theorem  (p.59), including the meaningless equation (p.60)

$\gamma_n=\frac{2^{2n}}{\pi}\int_0^\pi~\cos^{2n} t\,\text dt$

to de Moivre‘s theorem being given as Taylor’s expansion

$f(z)=\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(z-a)^2$

and as his derivation of the concept of variance, to another botched depiction of the difference between Bayesian and frequentist statistics, incl. the usual horror

$P(68.5<70<71.5)=95%$

to independence being presented as a non-linear relation (p.111), to the conspicuous absence of Pythagoras in the regression chapter, to attributing to Gauss the concept of a probability density (when Simpson, Bayes, Laplace used it as well), to another highly confusing verbal explanation of densities, including a potential confusion between different representations of a distribution (Fig. 9.6) and the existence of distributions other than the Gaussian distribution, to another error in writing the Gaussian pdf (p.157),

$f(x)=\dfrac{e^{-(z-\mu)^2}\big/2\sigma^2}{\sigma\sqrt{2\pi}}$

to yet another error in the item response probability (p.301), and.. to completely missing the distinction between the map and the territory, i.e., the probabilistic model and the real world (“Truth”), which may be the most important shortcoming of the book.

The style is somewhat heavy, with many repetitions about the greatness of the characters involved in the story, and some degree of license in bringing them within the narrative of the book. The historical determinism of this narrative is indeed strong, with a tendency to link characters more than they were, and to make them greater than life. Which is a usual drawback of such books, along with the profuse apologies for presenting a few mathematical formulas!

The overall presentation further has a Victorian and conservative flavour in its adoration of great names, an almost exclusive centering on Western Europe, a patriarchal tone (“It was common for them to assist their husbands in some way or another”, p.44; Marie Curie “agreed to the marriage, believing it would help her keep her laboratory position”, p.283), a defense of the empowerment allowed by the Industrial Revolution and of the positive sides of colonialism and of the Western expansion of the USA, including the invention of Coca Cola as a landmark in the march to Progress!, to the fall of the (communist) Eastern Block being attributed to Ronald Reagan, Karol Wojtyła, and Margaret Thatcher, to the Bell Curve being written by respected professors with solid scholarship, if controversial, to missing the Ottoman Enlightenment and being particularly disparaging about the Middle East, to dismissing Galton’s eugenism as a later year misguided enthusiasm (and side-stepping the issue of Pearson’s and Fisher’s eugenic views),

Another recurrent if minor problem is the poor recording of dates and years when introducing an event or a new character. And the quotes referring to the current edition or translation instead of the original year as, e.g., Bernoulli (1954). Or even better!, Bayes and Price (1963).

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]