## What is luck? [book review]

Posted in Books, Statistics, Travel, University life on December 10, 2021 by xi'an

I was sent this book by Columbia University Press for a potential review in CHANCE: What are the chances? (Why we believe in luck?) was written by Barbara Blatchley, professor of Psychology and Neuroscience at Agnes Scott College in Decatur, Georgia. I read its 193 pages rather quickly during recent trips to Marseille and Warwick. The topic is truly about luck and the psychology of the feeling of being lucky or unlucky. There is thus rather little to relate to as a statistician, as this is not a book about chance! (I always need to pay attention when using both words, since, in French, chance primarily means luck, while malchance means bad luck. And the French term for chance and randomness is hasard…) The book is pleasant to read, even though the accumulation of reports about psychological studies may prove tiresome in the long run and, for a statistician, worrisome as to which percentage of such studies were properly validated by statistical arguments…

“…the famous quote by Louis Pasteur: “Dans les champs de l’observation, le hasard ne favorise que les esprits préparés” (…) Pasteur never saw a challenge he couldn’t overcome with patience and preparation.” (p.19)

Even the part about randomness is a-statistical and mostly a-probabilist, focusing rather on our subjective and biased (in)ability to judge randomness. The author introduces us to the concepts of apophenia, which is “the unmotivated seeing of connections accompanied with a specific feeling of abnormal meaningfulness”, and of patternicity for the “tendency to find meaningful patterns in meaningless noise”. She also states that a (Neyman-Pearson) Type I error is about seeing a pattern in random noise, while a Type II error is about concluding that the data is meaningless when it is in fact meaningful (p.15). Which is reductive to say the least, but leads her to recall the four types of luck proposed by James Austin (which I first misread as Jane Austin).
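To make the Type I versus Type II reading above concrete, here is a minimal simulation sketch. It is not taken from the book: the coin-flipping setup, the 5% two-sided z-test, and the names `flags_pattern` and `rejection_rate` are all assumptions made for illustration.

```python
import math
import random


def flags_pattern(flips, z_crit=1.96):
    """Two-sided z-test of 'the coin is fair' at roughly the 5% level."""
    n = len(flips)
    heads = sum(flips)
    z = (heads - n / 2) / math.sqrt(n / 4)
    return abs(z) > z_crit


def rejection_rate(p_heads, n_flips=100, n_experiments=2000, seed=0):
    """Share of simulated experiments in which the test declares a pattern."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        flips = [rng.random() < p_heads for _ in range(n_flips)]
        rejections += flags_pattern(flips)
    return rejections / n_experiments


if __name__ == "__main__":
    # Fair coin: every rejection is a pattern "seen" in pure noise (Type I).
    print("Type I rate:", rejection_rate(0.5))
    # Biased coin (hypothetical p = 0.6): every non-rejection misses a real effect (Type II).
    print("Type II rate:", 1 - rejection_rate(0.6))
```

With a fair coin the rejection rate should hover near the nominal 5%, while the Type II rate depends entirely on how biased the alternative coin is assumed to be, which is one reason why the one-line description is reductive.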

“There is a long-standing and deeply intimate connection between luck, religion, and belief in the supernatural.” (p.28)

I very much enjoyed the sections on these connections between a belief in luck and religions, even though the anthropological references to ancient religions are not strongly connected to luck, but rather to the belief that gods and goddesses could modify one’s fate (and avoiding the most established religions). Still, I appreciate her stressing the fact that if one believes in luck (as opposed to sheer randomness), this expresses at the very least a form of irrational belief in higher powers that can bend randomness in one’s favour (or disfavour). Which is the seed for more elaborate if irrational beliefs. (For illustrations, Borges’ stories come to mind.)

“B.F. Skinner believed that superstitious behaviour was a consequence of learning and reinforcement.” (p.85)

There are also parts where (a belief in) luck and (human) learning are connected, but, unfortunately, no mention is made of the (vaguely) Bayesian nature of the modus operandi of the (plastic, p.188) brain. The large section on the brain found in the book is instead physiological, being concerned with finding regions where the belief in luck could be located. In relation with attention-deficit disorders. (Revealing the existence, interesting for me, of mirror neurons, dedicated to predicting what could happen! Described as “predictive coding”, p.153.) The last chapter “How to get lucky” contains a rather lengthy account of “Clever Hans”, the 1900 German counting horse (!). Who, as is well known, reacted to subtle and possibly unconscious signals from his trainer rather than to an equine feeling for arithmetic…

One of the clearest conclusions of the book is (imho) that a belief in luck may improve the life of the believers, while a belief in being unlucky may deteriorate it. The Taoist tale finishing the book is a pure gem. But I am still in the dark as to whether or not my exceptional number of bike punctures in the past year qualifies as bad luck!

“Luck is the way you face the randomness of the world.” (p.191)

As an irrelevant aside, one anecdote at the beginning of the book brought back memories of the Wabash River flowing through Lafayette, IN, as it tells of the luck of two Purdue female rowers who attempted a transatlantic race and survived capsizing in the middle of the Atlantic. It also made me regret I had not realised at the time there was a rowing opportunity there!

## The [errors in the] error of truth [book review]

Posted in Books, Statistics, University life on August 10, 2021 by xi'an

OUP sent me this book, The error of truth by Steven Osterlind, for review. It is a story about the “astonishing” development of quantitative thinking in the past two centuries. Unfortunately, I found it to be one of the worst books I have read on the history of science…

To start with the rather obvious part, I find the scholarship behind the book quite shoddy, as the author continuously brings in historical tidbits to support his overall narrative and sometimes fills gaps on his own. It often feels like the material comes from Wikipedia, despite his expressing a critical view of the on-line encyclopedia. The [long] quote below is presumably the most shocking historical blunder, as the Terror marks the climax of the French Revolution, rather than the last fight of the French monarchy. Robespierre was the head of the Jacobins, the most radical revolutionaries at the time, and one of the Assembly members who voted for the execution of Louis XVI, which took place before the Terror. He later started to eliminate his political opponents, until he found himself on the guillotine!

“The monarchy fought back with almost unimaginable savagery. They ordered French troops to carry out a bloody campaign in which many thousands of protesters were killed. Any peasant even remotely suspected of not supporting the government was brutally killed by the soldiers; many were shot at point-blank range. The crackdown’s most intense period was a horrific ten-month Reign of Terror (“la Terreur”) during which the government guillotined untold masses (some estimates are as high as 5,000) of its own citizens as a means to control them. One of the architects of the Reign of Terror was Maximilien Robespierre, a French nobleman and lifelong politician. He explained the government’s slaughter in unbelievable terms, as “justified terror . . . [and] an emanation of virtue” (quoted in Linton 2006). Slowly, however, over the next few years, the people gained control. In the end, many nobles, including King Louis XVI and his wife Marie-Antoinette, were themselves executed by guillotining”

Obviously, this absolute misinterpretation does not matter (very) much for the (hi)story of quantification (and uncertainty assessment), but it demonstrates a lack of expertise on the part of the author. And it saps whatever trust one could have in the new details he brings to light (life?). As for instance when stating

“Bayes did a lot of his developmental work while tutoring students in local pubs. He was a respected teacher. Taking advantage of his immediate resources (in his circumstance, a billiard table), he taught his theorem to many.”

which does not sound very plausible. I never heard that Bayes had students or went to pubs or exposed his result to many before its posthumous publication… Or when Voltaire (who died in 1778) is considered as a seventeenth-century precursor of the Enlightenment. Or when John Graunt, a true member of the Royal Society, is given as a member of the Académie des Sciences. Or when Quetelet is presented as French and as a student of Laplace.

The maths explanations are also puzzling, from the law of large numbers illustrated by six observations, and wrongly expressed (p.54) as

$\bar{X}_n+\mu\qquad\text{when}\qquad n\longrightarrow\infty$

to the Saint Petersburg paradox being seen as inverse probability, to a botched description of the central limit theorem (p.59), including the meaningless equation (p.60)

$\gamma_n=\frac{2^{2n}}{\pi}\int_0^\pi~\cos^{2n} t\,\text dt$

to de Moivre’s theorem being given as Taylor’s expansion

$f(z)=\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(z-a)^2$

and as his derivation of the concept of variance, to another botched depiction of the difference between Bayesian and frequentist statistics, incl. the usual horror

$P(68.5<70<71.5)=95\%$

to independence being presented as a non-linear relation (p.111), to the conspicuous absence of Pythagoras in the regression chapter, to attributing to Gauss the concept of a probability density (when Simpson, Bayes, Laplace used it as well), to another highly confusing verbal explanation of densities, including a potential confusion between different representations of a distribution (Fig. 9.6) and the existence of distributions other than the Gaussian distribution, to another error in writing the Gaussian pdf (p.157),

$f(x)=\dfrac{e^{-(z-\mu)^2}\big/2\sigma^2}{\sigma\sqrt{2\pi}}$

to yet another error in the item response probability (p.301), and… to completely missing the distinction between the map and the territory, i.e., the probabilistic model and the real world (“Truth”), which may be the most important shortcoming of the book.
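For comparison with the formulas quoted above, here are the standard textbook forms of four of them. They are not taken from the book; the notation ($\bar{X}_n$ for the mean of $n$ iid observations with mean $\mu$ and known standard deviation $\sigma$) is an assumption chosen to match the quotes, and the last statement is only exact for Normal observations.

```latex
% Weak law of large numbers: convergence in probability, not an identity
\bar{X}_n=\frac{1}{n}\sum_{i=1}^n X_i \;\xrightarrow{\;P\;}\; \mu
\qquad\text{when}\qquad n\longrightarrow\infty

% Taylor expansion of f around a: the power of (z-a) follows the index n
f(z)=\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(z-a)^n

% Gaussian density: the whole ratio sits in the exponent, with argument x
f(x)=\dfrac{e^{-(x-\mu)^2/2\sigma^2}}{\sigma\sqrt{2\pi}}

% Frequentist confidence statement: the interval is random, \mu is fixed
P\left(\bar{X}_n-1.96\,\sigma/\sqrt{n}<\mu<\bar{X}_n+1.96\,\sigma/\sqrt{n}\right)\approx 95\%
```

The last line shows why the “usual horror” is one: once the three numbers are plugged in, nothing random remains inside the probability, which is then either zero or one.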

The style is somewhat heavy, with many repetitions about the greatness of the characters involved in the story, and some degree of license in bringing them within the narrative of the book. The historical determinism of this narrative is indeed strong, with a tendency to link characters more than they were, and to make them larger than life. Which is a usual drawback of such books, along with the profuse apologies for presenting a few mathematical formulas!

The overall presentation further has a Victorian and conservative flavour in its adoration of great names, an almost exclusive centering on Western Europe, a patriarchal tone (“It was common for them to assist their husbands in some way or another”, p.44; Marie Curie “agreed to the marriage, believing it would help her keep her laboratory position”, p.283), a defense of the empowerment allowed by the Industrial Revolution and of the positive sides of colonialism and of the Western expansion of the USA, including the invention of Coca Cola as a landmark in the march to Progress! This extends to the fall of the (communist) Eastern Bloc being attributed to Ronald Reagan, Karol Wojtyła, and Margaret Thatcher, to the Bell Curve being written by respected professors with solid scholarship, if controversial, to missing the Ottoman Enlightenment and being particularly disparaging about the Middle East, and to dismissing Galton’s eugenics as a misguided enthusiasm of his later years (and side-stepping the issue of Pearson’s and Fisher’s eugenic views).

Another recurrent if minor problem is the poor recording of dates and years when introducing an event or a new character. And the quotes referring to the current edition or translation instead of the original year as, e.g., Bernoulli (1954). Or even better!, Bayes and Price (1963).

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

## quick(er) calculations [book review]

Posted in Statistics on July 5, 2021 by xi'an

Upon my request, Oxford University Press sent me this book for review in CHANCE. With the extended title How to add, subtract, multiply, divide, square, and square root more swiftly. This short (173 pages) book is written by Trevor Davis Lipscombe, currently Director of the Catholic University of America Press (which is apparently not suited for his books, since his former Physics of Rugby got published by Nottingham University Press). The concept of the book is to list tricks and shortcuts to handle seemingly tough operations on a list of numbers. Illustrated by short anecdotes mostly related to religion, sports (including the Vatican cricket team!), and history, albeit not necessarily related to the computation at hand and not providing an in-depth coverage of calculation across the ages and the cultures. While the topic is rather dry, as illustrated by the section titles, e.g., “Multiply two numbers that differ by 2, 4, 6, or 20” or “Multiply or divide by 66 or 67, 666 or 667” (!), the exposition is somewhat facilitated by the (classics) culture of the author. (I have to confess I got lost in the date chapter, i.e., finding which day of the week was December 18, 1981, for instance. Especially by the concept of Doomsday, which I thought was a special day of the year in the UK. Or in the USA.) Still, while recognising some simple decompositions I also used for additions and subtractions, and acknowledging the validity of the many tricks I had never thought of, I wonder at the relevance of learning those dozens of approaches beyond maintaining a particular type of mental agility… Or preparing for party show-time. Especially for the operations that do not enjoy exact solutions, like dividing by √3 or multiplying by π… The book reminded me of a physics professor in Caen, Henri Eyraud, who used to approximate powers and roots faster than it took us to get a slide rule out of our bags!
But Guesstimation, which I reviewed several years ago, seemed more far-reaching than Quick(er) calculations, in that I had tried to teach my kids (with limited success) how to reach the right order of magnitude of a quantity, but never insisted [beyond primary school] on quick mental calculations. (The Interlude V chapter connects with this idea.)
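As an illustration of two of the tricks mentioned above, here is a short sketch. It assumes, without having checked against the book, that the “numbers that differ by 2, 4, 6, or 20” section rests on the difference-of-squares identity, and that the date chapter follows Conway's Doomsday rule. The function names are hypothetical.

```python
from datetime import date

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]


def product_via_squares(a, b):
    """a*b as m^2 - k^2, with m the midpoint and k half the gap."""
    if (a - b) % 2:
        raise ValueError("a and b must differ by an even number")
    m, k = (a + b) // 2, abs(a - b) // 2
    return m * m - k * k


def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def doomsday(year):
    """Weekday (0 = Sunday) shared by 4/4, 6/6, 8/8, 10/10, 12/12 that year."""
    anchor = (5 * ((year // 100) % 4) + 2) % 7  # Gregorian century anchor
    y = year % 100
    return (anchor + y // 12 + y % 12 + (y % 12) // 4) % 7


def day_of_week(year, month, day):
    # One doomsday date per month, January to December (non-leap year).
    reference = [3, 28, 14, 4, 9, 6, 11, 8, 5, 10, 7, 12]
    if is_leap(year):
        reference[0], reference[1] = 4, 29
    return DAYS[(doomsday(year) + day - reference[month - 1]) % 7]


if __name__ == "__main__":
    print(product_via_squares(48, 52))   # 50^2 - 2^2
    print(day_of_week(1981, 12, 18))     # the date quoted in the review
    # Cross-check against the standard library calendar.
    d = date(1981, 12, 18)
    print(DAYS[d.isoweekday() % 7])
```

Under this rule, Doomsday is simply the weekday on which a handful of memorable dates (such as December 12) fall in a given year, so December 18, 1981 sits six days after a Saturday, i.e., on a Friday.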

[Disclaimer about potential self-plagiarism: this post or an edited version should eventually appear in my Book Review section in CHANCE.]

## mathematical theory of Bayesian statistics [book review]

Posted in Books, Statistics, Travel, University life on May 6, 2021 by xi'an

I came by chance (and not by CHANCE) upon this 2018 CRC Press book by Sumio Watanabe and ordered it myself to find out which material it really covered, as the back-cover blurb was not particularly clear and the title sounded quite general. After reading it, I found out that this is a mathematical treatise on some aspects of Bayesian information criteria, in particular on the Widely Applicable Information Criterion (WAIC) that was introduced by the author in 2010. The result is a rather technical and highly focussed book with little motivation or intuition surrounding the mathematical results, which may make the reading arduous for readers. Some background on mathematical statistics and Bayesian inference is clearly preferable and the book cannot be used as a textbook for most audiences, as opposed to, e.g., An Introduction to Bayesian Analysis by J.K. Ghosh et al. or even more to Principles of Uncertainty by J. Kadane. In connection with this remark, the exercises found in the book are closer to the delivery of additional material than to textbook-style exercises.

“posterior distributions are often far from any normal distribution, showing that Bayesian estimation gives the more accurate inference than other estimation methods.”

The overall setting is one where both the sampling and the prior distributions are different from the respective “true” distributions. Requiring a tool to assess the discrepancy when utilising a specific pair of such distributions. Especially when the posterior distribution cannot be approximated by a Normal distribution. (Lindley’s paradox makes an interesting incognito incursion on p.238.) The WAIC is supported for the determination of the “true” model, in opposition to AIC and DIC, incl. on a mixture example that reminded me of our eight versions of DIC paper. In the “Basic Bayesian Theory” chapter (§3), the “basic theorem of Bayesian statistics” (p.85) states that the various losses related to WAIC can be expressed as second-order Taylor expansions of some cumulant generating functions, with order o(n⁻¹), “even if the posterior distribution cannot be approximated by any normal distribution” (p.87). With the intuition that

“if a log density ratio function has a relatively finite variance then the generalization loss, the cross validation loss, the training loss and WAIC have the same asymptotic behaviors.”

Obviously, these “basic” aspects should come as a surprise to a fair percentage of Bayesians (in the sense of not being particularly basic). Myself included. Chapter 4 explains why, for regular models, the posterior distribution accumulates in an ε neighbourhood of the optimal parameter at a speed $O(n^{2/5})$. With the normalised partition function being of order $n^{-d/2}$ in the neighbourhood and exponentially negligible outside. A consequence of this regular asymptotic theory is that all above losses are asymptotically equivalent to the negative log likelihood plus similar order n⁻¹ terms that can be ordered. Chapters 5 and 6 deal with “standard” [the likelihood ratio is a multi-index power of the parameter ω] and general posterior distributions that can be written as mixtures of standard distributions, with expressions of the above losses in terms of new universal constants. Again, a rather remote concern of mine. The book also includes a chapter (§7) on MCMC, with a rather involved proof that a Metropolis algorithm satisfies detailed balance (p.210). The Gibbs sampling section contains an extensive example on a two-dimensional two-component unit-variance Normal mixture, with an unusual perspective on the posterior, which is considered as “singular” when the true means are close. (Label switching or the absence thereof is not mentioned.) In terms of approximating the normalising constant (or free energy), the only method discussed there is path sampling, with a cryptic remark about harmonic mean estimators (not identified as such). In a final knapsack chapter (§9), Bayes factors (confusingly denoted as L(x)) are shown to be most powerful tests in a Bayesian sense when comparing hypotheses without prior weights on said hypotheses, while posterior probability ratios are the natural statistics for comparing models with prior weights on said models. (With Lindley’s paradox making another appearance, still incognito!)
And a notion of phase transition for hyperparameters is introduced, with the meaning of a radical change of behaviour at a critical value of said hyperparameter. For instance, for a simple normal-mixture outlier model, the critical value of the Beta hyperparameter is α=2. Which is a wee bit of a surprise when considering Rousseau and Mengersen (2011) since their bound for consistency was α=d/2.
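Since WAIC is the central object of the book, here is a minimal sketch of how it is commonly estimated from posterior draws, as the training loss plus the functional variance divided by n. This follows the per-observation scale I understand Watanabe to use (other references multiply by −2n). The function names and the toy Normal-mean example with a flat prior are assumptions made for illustration, not material from the book.

```python
import math
import random


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(loglik):
    """loglik[s][i] = log p(x_i | w_s) for posterior draw s, observation i."""
    n_draws, n_obs = len(loglik), len(loglik[0])
    training_loss = 0.0
    functional_variance = 0.0
    for i in range(n_obs):
        column = [loglik[s][i] for s in range(n_draws)]
        # minus log of the posterior predictive density at x_i
        training_loss -= log_mean_exp(column) / n_obs
        # posterior variance of the log-likelihood at x_i
        mean = sum(column) / n_draws
        functional_variance += sum((v - mean) ** 2 for v in column) / n_draws
    return training_loss + functional_variance / n_obs


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.3, 1.0) for _ in range(50)]
    xbar, n = sum(data) / len(data), len(data)
    # Toy model: x_i ~ N(w, 1) with a flat prior, so w | data ~ N(xbar, 1/n).
    draws = [rng.gauss(xbar, 1 / math.sqrt(n)) for _ in range(1000)]
    loglik = [[-0.5 * math.log(2 * math.pi) - 0.5 * (x - w) ** 2 for x in data]
              for w in draws]
    print("WAIC (per observation):", waic(loglik))
```

In this regular toy model the variance term is close to d/n with d=1, which is the sense in which WAIC recovers an AIC-like penalty, while remaining computable when the posterior is far from Normal.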

In conclusion, this is quite an original perspective on Bayesian models, covering the somewhat unusual (and potentially controversial) issue of misspecified priors and centered on the use of information criteria. I find the book could have benefited from further editing as I noticed many typos and somewhat unusual sentences (at least unusual to me).

[Disclaimer about potential self-plagiarism: this post or an edited version should eventually appear in my Book Review section in CHANCE.]

## poems that solve puzzles [book review]

Posted in Books, Kids, University life on January 7, 2021 by xi'an

Upon request, I received this book from Oxford University Press for review. Poems that Solve Puzzles is a nice title and its cover is quite to my liking (for once!). The author is Chris Bleakley, Head of the School of Computer Science at UCD.

“This book is for people that know algorithms are important, but have no idea what they are.”

This is the first sentence of the book and hence I am clearly falling outside the intended audience. When I asked OUP for a review copy, I was thinking more in terms of Robert Sedgewick’s Algorithms, whose first edition still sits on my shelves and which I read from first to last page when it appeared [and was part of my wife’s booklist]. This was (and is) indeed a fantastic book to learn how to build and optimise algorithms and I gained a lot from it (despite remaining a poor programmer!).

Back to poems, this one reads much more like a history of computer science for newbies than a deep entry into the “science of algorithms”, with imho too little on the algorithms themselves and their connections with computer languages and too much emphasis on the pomp and circumstance of computer science (like so-and-so got the ACM A.M. Turing Award in 19… and retired in 19…). Besides the antique algorithms for finding primes, approximating π, and computing the (fast) Fourier transform (incl. John Tukey), the story moves quickly to the difference engine of Charles Babbage and Ada Lovelace, then to Turing’s machine, and artificial intelligence with the first checkers codes, which already included some learning aspects. Some sections on the ENIAC, John von Neumann and Stan Ulam, with the invention of Monte Carlo methods (but no word on MCMC). A bit of complexity theory (P versus NP) and then Internet, Amazon, Google, Facebook, Netflix… Finishing with neural networks (then and now), the unavoidable AlphaGo, and the incoming cryptocurrencies and quantum computers. All this makes for pleasant (if unsurprising) reading and could possibly captivate a young reader for whom computers are more than a gaming console or a more senior reader who so far stayed wary of and away from computers. But I would have enjoyed much more a low-tech discussion on the construction, validation and optimisation of algorithms, namely a much more soft(ware) version, as it would have made it much more distinct from the existing offer on the history of computer science.

[Disclaimer about potential self-plagiarism: this post or an edited version of it will eventually appear in my Book Review section in CHANCE.]