## Archive for CHANCE

## the biggest bluff [not a book review]

Posted in Books with tags book review, decision theory, Nature, CHANCE, poker, NYT, psychology, data, betting, game theory, game, not a book review on August 14, 2020 by xi'an

**I**t came as a surprise to me that the book reviewed in the book review section of Nature of 25 June was a personal account by a professional poker player, The Biggest Bluff by Maria Konnikova (surprise enough to write a blog entry!), as I see very little scientific impetus in studying the psychology of poker players and the associated decision making. Obviously, *this is not a book review*, but a review of the book review. (Although the NYT published a rather extensive extract of the book, I cannot detect anything deep in it from a game-theory viewpoint, apart from the maybe-not-so-deep message that psychology matters a lot in poker…) Which does not bring much incentive for those uninterested (or worse) in money games like poker. Even when “a heap of Bayesian model-building [is] thrown in”: the review mixes up randomness and luck, while seeing the book as teaching the reader “how to play the game of life”, the type of self-improvement sales line one hardly expects to read in a scientific journal. (But again I have never understood the point of playing poker…)

## Monte Carlo Markov chains

Posted in Books, Statistics, University life with tags Andrei Kolmogorov, Bayesian Analysis, Bayesian model comparison, book review, CHANCE, Gregor Mendel, iris data, irreducibility, JASA, Jeffreys priors, Kolmogorov axioms, Kolmogorov-Smirnov distance, MCMC, physics, population genetics, pot-pourri, recurrence, Richard Price, Ronald Fisher, Springer-Verlag, textbook, Thomas Bayes, W. Gosset on May 12, 2020 by xi'an

**D**arren Wraith pointed out this (currently free-access) Springer book by Massimiliano Bonamente [whose family name means *good spirit* in Italian] to me for its use of the unusual *Monte Carlo Markov chain* rendering of MCMC. (Google Trends seems to restrict this usage to California!) This is a graduate text for physicists, but one could nonetheless expect more rigour in the processing of the topics, particularly the Bayesian ones. Here is a pot-pourri of memorable quotes:

*“Two major avenues are available for the assignment of probabilities. One is based on the repetition of the experiments a large number of times under the same conditions, and goes under the name of the frequentist or classical method. The other is based on a more theoretical knowledge of the experiment, but without the experimental requirement, and is referred to as the Bayesian approach.”*

*“The Bayesian probability is assigned based on a quantitative understanding of the nature of the experiment, and in accord with the Kolmogorov axioms. It is sometimes referred to as *empirical probability*, in recognition of the fact that sometimes the probability of an event is assigned based upon a practical knowledge of the experiment, although without the classical requirement of repeating the experiment for a large number of times. This method is named after the Rev. Thomas Bayes, who pioneered the development of the theory of probability.”*

*“The likelihood P(B/A) represents the probability of making the measurement B given that the model A is a correct description of the experiment.”*

*“…a uniform distribution is normally the logical assumption in the absence of other information.”*

*“The Gaussian distribution can be considered as a special case of the binomial, when the number of tries is sufficiently large.”*

*“This clearly does not mean that the Poisson distribution has no variance—in that case, it would not be a random variable!”*

*“The method of moments therefore returns unbiased estimates for the mean and variance of every distribution in the case of a large number of measurements.”*

*“The great advantage of the Gibbs sampler is the fact that the acceptance is 100 %, since there is no rejection of candidates for the Markov chain, unlike the case of the Metropolis–Hastings algorithm.”*
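This last quote deserves a numerical aside. A minimal sketch (mine, not the book's, with a made-up bivariate normal target): the 100% Gibbs "acceptance" is automatic by construction, while a random-walk Metropolis-Hastings sampler on the same target accepts well below 100%, yet the acceptance rate alone says nothing about how well either chain explores the target.

```python
# Sketch: "acceptance rates" of a Gibbs sampler versus a random-walk
# Metropolis-Hastings sampler on a bivariate normal target with
# correlation rho. Gibbs accepts every move by construction; MH does
# not, but that is a property of the algorithms, not a ranking of them.
import math
import random

rho = 0.9
random.seed(1)

def log_target(x, y):
    # unnormalised log-density of N(0, [[1, rho], [rho, 1]])
    return -0.5 * (x * x - 2 * rho * x * y + y * y) / (1 - rho * rho)

def gibbs(n):
    # draw from the exact full conditionals x|y and y|x: every move "accepted"
    s = math.sqrt(1 - rho * rho)
    x = y = 0.0
    out = []
    for _ in range(n):
        x = random.gauss(rho * y, s)
        y = random.gauss(rho * x, s)
        out.append((x, y))
    return out

def rwmh(n, scale=1.0):
    # random-walk proposal, accepted with probability min(1, target ratio)
    x = y = 0.0
    acc = 0
    out = []
    for _ in range(n):
        xp = x + random.gauss(0, scale)
        yp = y + random.gauss(0, scale)
        if math.log(random.random()) < log_target(xp, yp) - log_target(x, y):
            x, y, acc = xp, yp, acc + 1
        out.append((x, y))
    return out, acc / n

chain, rate = rwmh(10_000)
print(f"MH acceptance rate: {rate:.2f}")  # strictly below 1, by design
print("Gibbs acceptance rate: 1.00")      # by construction
```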

Let me then point out (or just whine about!) the following:

- the book using “statistical independence” for plain independence, and / rather than Jeffreys’ | for conditioning (sometimes forgetting \ in some LaTeX formulas);
- the confusion between events and random variables, especially when computing the posterior distribution, and between models and parameter values;
- the reliance on discrete probability in continuous settings, as in the Markov chain chapter, confusing density and probability;
- using Mendel’s pea data without mentioning the unlikely fit to the expected values (or, as put more subtly by Fisher (1936), “the data of most, if not all, of the experiments have been falsified so as to agree closely with Mendel’s expectations”);
- presenting Fisher’s and Anderson’s *Iris data* [a motive for rejection when George was JASA editor!] as “a new classic experiment”;
- mentioning Pearson but not Lee for the data in the 1903 Biometrika paper “On the laws of inheritance in man” (and woman!), and not accounting for the discrete nature of these data in the linear regression chapter;
- the three-page derivation of the Gaussian distribution from a Taylor expansion of the Binomial pmf obtained by differentiating in the integer argument;
- spending endless pages on deriving standard properties of classical distributions;
- an appalling mess of adding over the conditioning atoms with no normalisation in a Poisson experiment;
- botching the proof of the CLT, which is treated *before* the Law of Large Numbers;
- restricting maximum likelihood estimation to the Gaussian and Poisson cases and muddling its meaning by discussing unbiasedness;
- confusing a drifted Poisson random variable with a drift on its parameter, as well as using the pmf of the Poisson to define an area under the curve (Fig. 5.2);
- sweeping the impropriety of a constant prior under the carpet;
- defining a null hypothesis as a range of values for a summary statistic;
- no mention of Bayesian perspectives in the hypothesis testing, model comparison, and regression chapters;
- having one-dimensional case chapters followed by two-dimensional case chapters;
- reducing model comparison to the use of the Kolmogorov-Smirnov test;
- processing the bootstrap and jackknife in the Monte Carlo chapter without a mention of importance sampling;
- stating recurrence results without assuming irreducibility;
- motivating MCMC by the intractability of the evidence;
- resorting to the term *link* to designate the current value of a Markov *chain*;
- incorporating the need for a prior distribution in a terrible description of the Metropolis-Hastings algorithm, including a discrete proof of its stationarity;
- spending many pages on early 1990s MCMC convergence tests rather than discussing the adaptive scaling of proposal distributions;
- the inclusion of numerical tables [in a 2017 book];
- and turning Bayes (1763) into Bayes and Price (1763), or Student (1908) into Gosset (1908).
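As for the three pages spent deriving the Gaussian limit of the Binomial, the de Moivre-Laplace approximation can be checked numerically in a few standard-library lines (my sketch, with arbitrary n and p): for large n the Binomial(n, p) pmf at k is uniformly close to the N(np, np(1-p)) density at k.

```python
# Numerical check of the de Moivre-Laplace limit: compare the
# Binomial(n, p) pmf with the matching normal density over all k.
import math

n, p = 1000, 0.3
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

def binom_pmf(k):
    # exact pmf; math.comb keeps the binomial coefficient as an exact integer
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

max_gap = max(abs(binom_pmf(k) - normal_pdf(k)) for k in range(n + 1))
print(f"max |pmf - pdf| over all k: {max_gap:.2e}")  # shrinks as n grows
```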

*[Usual disclaimer about potential self-plagiarism: this post or an edited version of it could possibly appear later in my Books Review section in CHANCE. Unlikely, though!]*

## essentials of probability theory for statisticians

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags asymptotics, book review, central limit theorem, CHANCE, Cydonia, face on Mars, Glivenko-Cantelli Theorem, Henri Lebesgue, Lebesque integration, measure theory, pareidolia, probability theory, quincunx on April 25, 2020 by xi'an

**O**n yet another confined sunny lazy Sunday morning, I read through Proschan and Shaw's Essentials of Probability Theory for Statisticians, a CRC Press book that was sent to me quite a while ago for review. The book was indeed published in 2016. Before moving to serious things, let me evacuate the customary issue with the cover: I have trouble getting the point of the “face on Mars” being adopted as the cover of a book on probability theory (rather than a book on, say, pareidolia). There is a brief paragraph on post-facto probability calculations, stating how meaningless the question of the probability of this shade appearing “by chance” on a Viking Orbiter picture is, but this is so marginal that I would have preferred any other figure from the book!

The book plans to cover the probability essentials for dealing with graduate-level statistics, in particular convergence, conditioning, and the paradoxes that follow from using non-rigorous approaches to probability. A range that completely fits my own prerequisites for statistics students in my classes and that of course involves recourse to (Lebesgue) measure theory. And a goal that I find both commendable and comforting, as my past experience with exchange students left me with the feeling that rigorous probability theory has mostly been scrapped from graduate programs. While the book is not extremely formal, it provides a proper motivation for the essential need of measure theory to handle the complexities of statistical analysis, and in particular of asymptotics. It thus relies as much as possible on examples that stem from or relate to statistics, even though most may appear standard to senior readers, for instance the consistency of the sample median or a weak version of the Glivenko-Cantelli theorem. The final chapters are dedicated to applications (in the probabilist's sense!) that emerged from statistical problems. I felt these final chapters were somewhat stretched compared with what they could have been, as for instance with the multiple motivations of the conditional expectation, but this simply makes for more material. If I had to teach this material to students, I would certainly rely on the book, in particular because of the repeated appearances of the quincunx for motivating non-Normal limits. (A typo near Fatou's lemma missed the dominating measure. And I did not notice the Riemann notation *dx* being extended to the measure in a formal manner.)
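The weak Glivenko-Cantelli statement mentioned above also makes for a pleasant five-line experiment (my sketch, using uniform draws so the true cdf is F(x) = x): the sup distance between the empirical cdf and F visibly shrinks with the sample size.

```python
# Empirical illustration of Glivenko-Cantelli: for n uniform draws on
# [0,1], sup_x |F_n(x) - x| is attained at a jump of the empirical cdf,
# so it can be computed exactly from the sorted sample.
import random

random.seed(0)

def sup_gap(n):
    xs = sorted(random.random() for _ in range(n))
    # check both sides of each jump of F_n
    return max(
        max(abs((i + 1) / n - x), abs(i / n - x)) for i, x in enumerate(xs)
    )

small, large = sup_gap(100), sup_gap(100_000)
print(f"sup gap, n=100: {small:.3f}   n=100000: {large:.4f}")
```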

*[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]*

## a computational approach to statistical learning [book review]

Posted in Books, R, Statistics, University life with tags book review, CHANCE, computational statistics, Conan Doyle, COVID-19, CRC Press, Eiffel Peak, Elements of Statistical Learning, linear model, misspecified model, PCA, prediction, Sherlock Holmes, significance stars, Statistical learning, the the, typos on April 15, 2020 by xi'an

**T**his book was sent to me by CRC Press for review for CHANCE. I read it over a few mornings while [confined] at home and found it much more *computational* than *statistical*, in the sense that the authors go quite thoroughly into the construction of standard learning procedures, including home-made R codes that obviously help in understanding the nitty-gritty of these procedures, what they call *try and tell*, but the statistical meaning and uncertainty of these procedures remain barely touched upon. This is not uncommon in the machine-learning literature, where prediction error on the testing data often appears to be the final goal, but it is not so traditionally statistical. The authors introduce their work as a (computational?) supplement to Elements of Statistical Learning, although I would find it hard either to squeeze both books into one semester or to dedicate two semesters to the topic, especially at the undergraduate level.

Each chapter includes an extended analysis of a specific dataset, which is an asset of the book, if sometimes over-reaching in selling the predictive power of the procedures. Printed extensive R scripts may prove tiresome in the long run, at least to me, but this may simply be a generational gap! And the learning models are mostly unidimensional, see e.g. the chapter on linear smoothers with imho a profusion of methods. (Could someone please explain the point of Figure 4.9 to me?) The chapter on neural networks has a fairly intuitive introduction that should reach fresh readers, although meeting the handwritten digit data made me shift back to the late 1980s, when my wife was working on automatic character recognition. But I found the visualisation of the learning weights for character classification hinting at their shape (p.254) most alluring!

Among the things I miss when reading through this book:

- a life-line on the meaning of a statistical model beyond prediction;
- attention to misspecification, uncertainty and variability, especially when reaching outside the range of the learning data, and all the more when returning regression outputs with significance stars;
- discussions of the assessment tools, like the distance used in the objective function (for instance lacking in scale invariance when adding errors on the regression coefficients) or the unprincipled multiplication of calibration parameters;
- some asymptotics, or at least one remark on the information loss due to splitting the data into chunks;
- some (asymptotic) substance when using “consistent”;
- any mention of “data quality issues” before a single page, 319.

While the methodology is defended by algebraic and calculus arguments, there is very little on the probability side, which explains why the authors consider that the students need only “be familiar with the concepts of expectation, bias and variance”. And only that. The few paragraphs on the Bayesian approach do more harm than good, especially with so little background in probability and statistics.

The book possibly contains the most unusual introduction to the linear model I can remember reading: Coefficients as derivatives… Followed by a very detailed coverage of matrix inversion and singular value decomposition. (Would not sound like the #1 priority were I to give such a course.)
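The "coefficients as derivatives" reading is less exotic than it first sounds, as a toy sketch shows (mine, on a made-up noiseless line): the fitted least-squares slope is exactly the derivative of the prediction with respect to the covariate.

```python
# Least-squares fit of y-hat = a + b x on exact data y = 1 + 3x:
# the fitted slope b equals the (numerical) derivative d y-hat / dx.
xs = list(range(10))
ys = [1.0 + 3.0 * x for x in xs]  # noiseless line, for clarity

xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)
# closed-form simple-regression slope and intercept
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
    (x - xbar) ** 2 for x in xs
)
a = ybar - b * xbar

def predict(x):
    return a + b * x

h = 1e-6
deriv = (predict(2.0 + h) - predict(2.0)) / h  # numerical derivative at x = 2
print(b, deriv)  # both recover the slope of the generating line
```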

The inevitable typo “the the” was found on page 37! A less common typo was Jensen's inequality spelled as “Jenson's inequality”, both in the text (p.157) and in the index, followed by a repetition of the same formula in (6.8) and (6.9). A “stwart” (p.179) made me search a while for this unknown verb. Another typo occurs in the Nadaraya-Watson kernel regression, when the bandwidth *h* suddenly turns into *n* (and I had to check twice because of my poor eyesight!). There is an unusual use of partition, where the sets in the partition are themselves called partitions. Similarly, a fluctuating use of dots for products in dimension one, including a form of ⊗ for the matrix product (in equation (8.25)), followed on the next page by the notation for the Hadamard product. I also suspect the matrix K in (8.68) is missing 1's, or I am missing the point, since K is the number of kernels on the next page, just after a picture of the Eiffel Tower… There is a surprising number of references for an undergraduate textbook, with authors sometimes cited with full name and sometimes with last name only, and technical reports that do not belong at this level of book. Let me add the pedantic remark that Conan Doyle wrote more novels “that do not include his character Sherlock Holmes” than novels which do include Sherlock.
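Since the bandwidth *h* and the sample size *n* trade places in one of the book's formulas, here is a minimal Nadaraya-Watson sketch (mine, on made-up noiseless data) that keeps them firmly apart: the estimate is a kernel-weighted average of the responses, with *h* controlling the weights.

```python
# Minimal Nadaraya-Watson kernel regression with a Gaussian kernel:
# m-hat(x0) = sum_i K((x0 - x_i)/h) y_i / sum_i K((x0 - x_i)/h).
# The bandwidth h is a smoothing parameter, not the sample size n.
import math

xs = [i / 100 for i in range(101)]  # design points on [0, 1], n = 101
ys = [x * x for x in xs]            # noiseless signal y = x^2

def nw(x0, h=0.05):
    # Gaussian kernel weights, then the weighted average of the y's
    ws = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(ws, ys)) / sum(ws)

print(f"estimate at 0.5: {nw(0.5):.4f}  (true value 0.25)")
```

Shrinking *h* chases the data points; growing it flattens the fit toward the overall mean, which is the whole bias-variance story of the chapter in one parameter.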

*[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]*

## 9 pitfalls of data science [book review]

Posted in Books, Kids, Statistics, Travel, University life with tags Austria, book review, CHANCE, Germany, jiu-jitsu, lotus, OUP, Oxford University Press, poker, Salzburg, The Book of Why, Theranos, train travel, USA on September 11, 2019 by xi'an

**I** received The 9 pitfalls of data science by Gary Smith [who has written a significant number of general-public books on personal investment, statistics and AIs] and Jay Cordes from OUP for review a few weeks ago and read it on my trip to Salzburg. This short book contains a lot of anecdotes and what I would qualify as small talk on job experiences and colleagues' idiosyncrasies… More fundamentally, it reads as a sequence of examples of bad or misused statistics, as many general-public books on statistics do, but with little to say on how to spot such misuses. Its title (it seems *the 9 pitfalls of…* is a rather common début for a book title!) however started a (short) conversation with my neighbour on the train to Salzburg, as she wanted to know whether job opportunities in data science were better in Germany than in Austria. A practically important question for which I had no clue, and I do not think the book would have helped either! (My neighbour on the earlier plane to München had a book on growing lotus, which was not particularly enticing for launching a conversation either.)

Chapter I “*Using bad data*” is made of examples of truncated or cherry-picked data, often associated with poor graphics. Only one-dimensional outcomes, and also very US-centric. Chapter II “*Data before theory*” highlights spurious correlations and post hoc predictions, with criticism of data mining, some examples being quite standard. Chapter III “*Worshiping maths*” sounds like the perfect opposite of the previous chapter: it discusses the fact that all models are wrong but some may be more wrong than others, and gives examples of overfitting, p-value hacking, and regression applied to longitudinal data, with the message that (maths) assumptions are handy and helpful but not always realistic. Chapter IV “*Worshiping computers*” is about the new golden calf and contains rather standard stuff on trusting the computer output because it is a machine. However, the book somewhat falls foul of the same mistake by trusting a Monte Carlo simulation of a shortfall probability for retirees, since Monte Carlo also depends on a model! Computer simulations may be fine for Bingo night or poker tournaments but are much more uncertain for complex decisions like retirement investments. The chapter is also missing the biasing aspects in constructing recidivism prediction models pointed out in Weapons of math destruction, until Chapter 9 at least, and mentions adversarial attacks if not GANs (!). Chapter V “*Torturing data*” mentions famous cheaters like Wansink of the bottomless-bowl and pizza papers, with more about p-hacking and reproducibility. Chapter VI “*Fooling yourself*” is a rather weak chapter in my opinion. Apart from Ioannidis' take on Theranos' lack of scientific backing, it spends quite a lot of space on stories about poker gains in the unregulated era of online poker, with boasts of significant gains possibly earned from compulsive gamblers playing their family savings, which is not particularly praiseworthy. And about Brazilian jiu-jitsu.
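The point that "Monte Carlo also depends on a model" can be made in miniature (my sketch, with entirely made-up numbers: a 4% withdrawal rule, a 30-year horizon, and two return models with similar means): the simulated shortfall probability moves as soon as plain Gaussian yearly returns are swapped for a crash-prone mixture.

```python
# Toy retirement Monte Carlo: probability the fund is exhausted within
# 30 years, under two different models for the yearly return. The
# output is a property of the chosen model as much as of the question.
import random

random.seed(42)

def ruin_prob(draw_return, n_paths=5000, years=30, withdraw=0.04):
    ruined = 0
    for _ in range(n_paths):
        wealth = 1.0
        for _ in range(years):
            wealth = wealth * (1.0 + draw_return()) - withdraw
            if wealth <= 0.0:
                ruined += 1
                break
    return ruined / n_paths

def gaussian():
    return random.gauss(0.05, 0.10)

def mixture():
    # 5% of years are crashes: similar mean return, much fatter left tail
    if random.random() < 0.05:
        return random.gauss(-0.30, 0.10)
    return random.gauss(0.07, 0.08)

p_g, p_m = ruin_prob(gaussian), ruin_prob(mixture)
print(f"ruin probability, Gaussian model: {p_g:.3f}  mixture model: {p_m:.3f}")
```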
Chapter VII “*Correlation vs causation*” predictably mentions Judea Pearl (whose Book of Why I just could not finish, after reading one rant too many about statisticians being unable to get causality right! Especially after discussing the book with Andrew.). But there is not so much to gather from the chapter, which could instead have delved into deep learning and its ways to avoid overfitting. The first example of this chapter is more about confusing conditionals (what is conditional on what?) than about turning causation around. Chapter VIII “*Regression to the mean*” sees Galton's quincunx reappearing here after Pearl's book, where I learned (and checked with Steve Stigler) that the device was indeed intended to illustrate regression to the mean. While this attractive fallacy is worth pointing out, there are much worse abuses of regression that could have been presented. CHANCE's Howard Wainer also makes an appearance, alongside SAT scores. Chapter IX “*Doing harm*” does engage with the issue that predicting social features like recidivism by a (black-box) software is highly worrying (and just plain wrong), if only because of this black-box nature, moving predictably to chess and go with the right comment that this does not say much about real data problems. There is a word of warning that DNA testing says very little about ancestry, if only because of the companies' limited and biased databases. With further calls for data privacy and a rather useless entry on North Korea. Chapter X “*The Great Recession*”, which discusses the subprime scandal (as in Stewart's book), contains a set of (mostly superfluous) equations from Samuelson's paper (supposed to scare or impress the reader?!) leading to the rather obvious result that the expected concave utility of a weighted average of iid positive rvs is maximal when all the weights are equal, a result then criticised by laughing at the assumption of iid-ness in the case of mortgages.
Along with those who bought exotic derivatives whose construction they could not understand. The (short) chapter keeps going through all the (a posteriori) obvious ingredients of a financial disaster to link them to most of the nine pitfalls, except the second, data before theory, because there was no data, only theory with no connection to reality. This final chapter is rather enjoyable, if coming after the facts. And containing this altogether unnecessary mathematical entry. *[Usual warning: this review or a revised version of it is likely to appear in CHANCE, in my book reviews column.]*