## Bayes Rules! [book review]

Posted in Books, Kids, Mountains, pictures, R, Running, Statistics, University life on July 5, 2022 by xi'an

Bayes Rules! is a new introductory textbook on Applied Bayesian Model(l)ing, written by Alicia Johnson (Macalester College), Miles Ott (Johnson & Johnson), and Mine Dogucu (University of California Irvine). Textbook sent to me by CRC Press for review. It is available (free) online as a website and has a github site, as well as a bayesrules R package. (Which reminds me that both our own book R packages, bayess and mcsm, have gone obsolete on CRAN! And that I should find time to figure out the issue and upgrade them…)

As far as I can tell [from abroad and from only teaching students with a math background], Bayes Rules! seems to be catering to early (US) undergraduate students with very little exposure to mathematical statistics or probability, as it introduces basic probability notions like pmf, joint distribution, and Bayes’ theorem (as well as Greek letters!) and shies away from integration or algebra (a covariance matrix occurs on page 437). For instance, the Normal-Normal conjugacy derivation is considered a “mouthful” (page 113). The exposition is somewhat stretched along the 500⁺ pages as a result, imho, which is presumably a feature shared with most textbooks at this level, and, accordingly, the exercises and quizzes are more about intuition and reproducing the contents of the chapter than technical. In fact, I did not spot there a mention of sufficiency, consistency, posterior concentration (almost made on page 113), improper priors, ergodicity, irreducibility, &tc., while other notions are not precisely defined, like ESS, weakly informative (page 234) or vague priors (page 77), prior information (which makes the negative answer to the quiz “All priors are informative” on page 90 rather confusing), R-hat, density plot, scaled likelihood, and more.

As an alternative to “technical derivations” Bayes Rules! centres on intuition and simulation (yay!) via its bayesrules R package, itself relying on rstan. Learning from example (as R code is always provided), the book proceeds through conjugate priors, MCMC (Metropolis-Hastings) methods, regression models, and hierarchical regression models. Quite impressive given the limited prerequisites set by the authors. (I appreciated the representations of the prior-likelihood-posterior, especially in the sequential case.)
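To make the sequential prior-likelihood-posterior picture concrete, here is a minimal sketch of conjugate Beta-Binomial updating. It is written in Python rather than with the book's R package, and the Beta(2, 2) prior and the three batches of counts are made up for illustration. Under these assumptions, updating batch by batch gives the same posterior as processing all the data at once.

```python
from fractions import Fraction

def update(prior, successes, failures):
    """Conjugate update: Beta(a, b) prior -> Beta(a + successes, b + failures) posterior."""
    a, b = prior
    return (a + successes, b + failures)

def beta_mean(params):
    """Mean a / (a + b) of a Beta(a, b) distribution, kept exact with Fraction."""
    a, b = params
    return Fraction(a, a + b)

prior = (2, 2)                       # hypothetical Beta(2, 2) prior
batches = [(3, 7), (6, 4), (9, 1)]   # made-up (successes, failures) per batch

# Sequential analysis: yesterday's posterior is today's prior.
posterior = prior
for s, f in batches:
    posterior = update(posterior, s, f)
    print(posterior, float(beta_mean(posterior)))

# Processing all observations at once.
total_s = sum(s for s, _ in batches)
total_f = sum(f for _, f in batches)
batch_posterior = update(prior, total_s, total_f)

prior_mean = beta_mean(prior)
data_mean = Fraction(total_s, total_s + total_f)
posterior_mean = beta_mean(posterior)
```

In this conjugate setting the posterior mean is a weighted average of the prior mean and the observed proportion, hence sits between the two.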

Regarding the “hot tip” (page 108) that the posterior mean always stands between the prior mean and the data mean, this should be made conditional on a conjugate setting and a mean parameterisation. Defining MCMC as a method that produces a sequence of realisations that are not from the target makes a point, except of course that there are settings where the realisations are from the target, for instance after a renewal event. Tuning MCMC should remain a partial mystery to readers after reading Chapter 7 as the Goldilocks principle is quite vague. Similarly, the derivation of the hyperparameters in a novel setting (not covered by the book) should prove a challenge, even though the readers are encouraged to “go forth and do some Bayes things” (page 509).
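As an illustration of why tuning matters, the following is a minimal sketch of a random-walk Metropolis-Hastings sampler in Python (not the book's code). The standard Normal target, the three proposal scales, and the number of iterations are arbitrary choices made for this example. The acceptance rate moves from "almost always" to "almost never" as the proposal scale grows, which is the trade-off the Goldilocks principle alludes to.

```python
import math
import random

def log_target(x):
    """Log-density of a standard Normal target, up to an additive constant."""
    return -0.5 * x * x

def random_walk_mh(n_iter, scale, seed=1):
    """Random-walk Metropolis-Hastings with Normal(0, scale^2) increments.

    Returns the chain and the observed acceptance rate.
    """
    rng = random.Random(seed)
    x = 0.0
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, scale)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
            accepted += 1
        chain.append(x)
    return chain, accepted / n_iter

scales = [0.05, 2.5, 50.0]   # too timid, moderate, too bold (illustrative values)
rates = {}
for scale in scales:
    _, rate = random_walk_mh(5000, scale)
    rates[scale] = rate
    print(f"scale={scale:>5}: acceptance rate {rate:.2f}")
```

A high acceptance rate is not a goal in itself: the smallest scale accepts nearly every move but explores the target in tiny steps, while the largest one rarely moves at all.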

While Bayes factors are supported for some hypothesis testing (with no point null), model comparison follows more exploratory methods like cross-validation and expected log-predictive comparison.

The examples and exercises are diverse (if mostly US centric), modern (including cultural references that completely escape me), and often reflect on the authors’ societal concerns. In particular, their concern about a fair use of the inferred models is preeminent, even though a quantitative assessment of the degree of fairness would require a much more advanced perspective than the book allows… (In that respect, Exercise 18.2 and the following ones are about book banning (in the US). Given the progressive tone of the book, and the recent ban of math textbooks in the US, I wonder if some conservative boards would consider banning it!) Concerning the Himalaya summiting running example (Chapters 18 & 19), where the probability of summiting is conditional on the age of the climber and the use of additional oxygen, I am somewhat surprised that the altitude of the targeted peak is not included as a covariate. For instance, Ama Dablam (6848 m) is compared with Annapurna I (8091 m), which has the highest fatality-to-summit ratio (38%) of all. This should matter more than age: the Aosta guide Abele Blanc climbed Annapurna without oxygen at age 57! More to the point, the (practical) detailed examples do not bring unexpected conclusions, as for instance the fact that runners [thrice alas!] tend to slow down with age.

A geographical comment: Uluru (page 267) is not a city!, but an impressive sandstone monolith in the heart of Australia, a five-hour drive away from Alice Springs. And historical mentions: Alan Turing (page 10) and the team at Bletchley Park indeed used Bayes factors (and sequential analysis) in cracking the Enigma, but this remained classified information for quite a while. Arianna Rosenbluth (page 10, but missing on page 165) was indeed a major contributor to Metropolis et al. (1953, not cited), but would not qualify as a Bayesian statistician as the goal of their algorithm was a characterisation of the Boltzmann (or Gibbs) distribution, not statistical inference. And David Blackwell’s (page 10) Basic Statistics is possibly the earliest instance of an introductory Bayesian and decision-theory textbook, but it never mentions Bayes or Bayesianism.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

## learning base R [book review]

Posted in Books, Kids, Statistics, University life on February 26, 2022 by xi'an

This second edition of an introductory R book was sent to me by the author for a potential CHANCE book review.  As there are many (many) books in the same spirit, the main question behind my reading it (in one go) was on the novelty it brings. The topics Learning Base R covers are

• arithmetics with R
• data structures
• built-in and user-written R functions
• R utilities
• more data structures
• comparison and coercion
• lists and data frames
• resident R datasets
• R interface
• probability calculations in R
• R graphics
• R programming
• simulations
• statistical inference in R
• linear algebra
• use of R packages

within as many short chapters. The style is rather standard, that is, short paragraphs with mostly raw reproductions of line commands and their outcome. Sometimes a whole page long of code examples (if with comments). All in all I feel there are rather too few tables when compared with examples, at least for my own taste. The exercises are mostly short and, while they vary in depth, they show that the book is rather intended for students with some mathematical background (e.g., with a chapter on complex numbers and another one on linear algebra that do not seem immediately relevant for most intended readers). Or more than that, when considering one (of several) exercise (19.30) on the Black-Scholes process that mentions Brownian motion. Possibly less appealing for would-be statisticians.

I also wonder at the pedagogical choice of not including and involving more clearly graphical interfaces like RStudio, as students are usually not big fans of “old-style” [their wording not mine!] line command languages. For instance, the chapter on packages would have benefited from this perspective. Nothing on R Markdown either. Apparently nothing on handling big data, more advanced database manipulation, the related realistic dangers of memory freeze and compulsory reboot, the intricacies of managing different directories and earlier sessions, little on the urgency of avoiding loops (p.233) by vectorised programming, an if function paradoxically being introduced after ifelse, and again not that much on statistics (with density only occurring in exercises). The chapter on customising R graphics may possibly scare the intended reader when considering the all-in-one example of p.193! As we advance through the book, the more advanced examples often are fairly standard programming ones (found in other language manuals) like creating Fibonacci numbers, implementing Eratosthenes sieve, playing the Hanoi Tower game… (At least they remind me of examples read in the language manuals I read as a student.)

The simulation chapter could have gone into the one (Chap. 19) on probability calculations, rather than superfluously redefining standard distributions. (Except when defining a random number as a uniformly random number (p.162).) This chapter also spends an unusual amount of space on linear congruential pseudo-random generators, while failing to point out the trivia that the randu dataset mentioned twice earlier is actually an outcome from the infamous RANDU Fortran generator. The following section in that chapter is written in such a way that it may give the wrong impression that one can find the analytic solution from repeated Monte Carlo experiments and hence the error. Which is rarely the case, even in finite environments with rational expectations, as one usually does not know of which unit fraction the expectation should be a multiple. (Remember the Squid Games paradox!) And no mention is made of the prescription of always returning an error estimate along with the numerical approximation. The statistics chapter is obviously more developed, with descriptive statistics, ecdf, but no bootstrap, a t.test curiously applied to the Michelson measurements of the speed of light (how could it be zero?!), ANOVA, regression handled via lm and glm, time series analysis by ARIMA models, which I hope will not be the sole exposure of readers to these concepts.
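On the prescription of returning an error estimate along with a Monte Carlo approximation, here is a minimal Python sketch (not taken from the book). The integrand u², the sample size, and the seed are arbitrary choices for the example. It estimates E[U²] = 1/3 for U uniform on (0, 1) and reports the estimated standard error next to the estimate.

```python
import math
import random

def mc_estimate(h, n, seed=42):
    """Monte Carlo estimate of E[h(U)], U ~ Uniform(0, 1), with its standard error."""
    rng = random.Random(seed)
    values = [h(rng.random()) for _ in range(n)]
    mean = sum(values) / n
    # Unbiased sample variance of the h(U_i), then standard error of their mean.
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)

estimate, std_error = mc_estimate(lambda u: u * u, 10_000)
print(f"estimate = {estimate:.4f} +/- {2 * std_error:.4f} (two standard errors)")
```

The output is an interval around 1/3, not the fraction itself: without already knowing the answer, nothing in the simulation singles out 1/3 among the nearby rationals.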

In conclusion, there is nothing critically wrong with this manual introducing R to newcomers and I would not mind having my undergraduate students reading it (rather than our shorter and home-made handout, polished along the years) before my first mathematical statistics lab. However I do not find it massively innovative in its presentation or choice of concept, even though the most advanced examples are not necessarily standard, and may not appeal to all categories of students.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

## understanding elections through statistics [book review]

Posted in Books, Kids, R, Statistics, Travel on October 12, 2020 by xi'an

A book to read most urgently if hoping to take an informed decision by 03 November! Written by a political scientist cum statistician, Ole Forsberg. (If you were thinking of another political scientist cum statistician, he wrote red state blue state a while ago! And is currently forecasting the outcome of the November election for The Economist.)

“I believe [omitting educational level] was the main reason the [Brexit] polls were wrong.”

The first part of the book is about the statistical analysis of opinion polls (assuming their outcome is given, rather than designing them in the first place). And starting with the Scottish independence referendum of 2014. The first chapter covering the cartoon case of simple sampling from a population, with or without replacement, Bayes and non-Bayes. In somewhat too much detail imho given that this is an unrealistic description of poll outcomes. The second chapter expands to stratified sampling (with confusing title [Polling 399] and entry, since it discusses repeated polls that are not processed in said chapter). Mentioning the famous New York Times experiment where five groups of pollsters analysed the same data, making different decisions in adjusting the sample and identifying likely voters, and coming out with a range of five points in the percentage. Starting to get a wee bit more advanced when designing priors for the population proportions. But still studying a weighted average of the voting intentions for each category. Chapter 3 reaches the challenging task of combining polls, with a 2017 (South) Korea presidential election as an illustration, involving five polls. It includes a solution to handling older polls by proposing a simple linear regression against time. Chapter 4 sums up the challenges of real-life polling by examining the disastrous 2016 Brexit referendum in the UK. Exposing for instance the complicated biases resulting from polling by phone or on-line. The part that weights polling institutes according to quality does not provide any quantitative detail. (And also a weird averaging between the levels of “support for Brexit” and “maybe-support for Brexit”, see Fig. 4.5!) Concluding as quoted above that missing the educational stratification was the cause for missing the shock wave of referendum day is a possible explanation, but the massive difference in turnout between the age groups, itself possibly induced by the reassuring figures of the published polls and predictions, certainly played a role in missing the (terrible) outcome.
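For readers unfamiliar with poll combination, the simplest version can be sketched as follows in Python. This is not the book's method: it assumes simple random sampling, no house effects, and no drift over time, and the three polls are made up. Under those (unrealistic) assumptions, pooling amounts to a sample-size-weighted average of the poll proportions.

```python
import math

# Made-up polls: (sample size, number of respondents supporting the option).
polls = [(1000, 520), (500, 240), (1500, 750)]

def pool(polls):
    """Pooled proportion and its binomial standard error, treating polls as one sample."""
    total_n = sum(n for n, _ in polls)
    total_yes = sum(yes for _, yes in polls)
    p = total_yes / total_n
    se = math.sqrt(p * (1 - p) / total_n)
    return p, se

pooled, pooled_se = pool(polls)
print(f"pooled support: {pooled:.3f} (standard error {pooled_se:.4f})")
```

The tiny standard error is exactly why such a naive pooling is misleading in practice: it ignores every source of error other than sampling noise, including the turnout and stratification issues discussed above.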

“The fabricated results conformed to Benford’s law on first digits, but failed to obey Benford’s law on second digits.” Wikipedia

The second part of this 200-page book is about election analysis, towards testing for fraud. Hence involving the ubiquitous Benford's law. Although applied to the leading digit, which I do not think should necessarily follow Benford's law due to both the varying sizes and the non-uniform political inclinations of the voting districts (of which there are 39 for the 2009 presidential Afghan election illustration, although the book sticks at 34 (p.106)). My impression was that instead lesser digits should be tested. Chapter 4 actually supports the use of the generalised Benford distribution that accounts for differences in turnouts between the electoral districts. But it cannot come up with a real-life election where the B test points out a discrepancy (and hence a potential fraud). Concluding with the author’s doubt [repeated from his PhD thesis] that these Benford tests “are specious at best”, which makes me wonder why 20 pages are spent on the topic. The following chapter thus considers other methods, checking for differential [i.e., not-at-random] invalidation by linear and generalised linear regression on the supporting rate in the district. Once again concluding at no evidence of such fraud when analysing the 2010 Côte d’Ivoire elections (that led to civil war). With an extension in Chapter 7 to an account for spatial correlation. The book concludes with an analysis of the Sri Lankan presidential elections between 1994 and 2019, with conclusions of significant differential invalidation in almost every election (even those not including Tamil provinces from the North).
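For reference, the standard Benford probabilities for the first and second significant digits can be computed in a few lines. This is a Python sketch of the textbook formulas, not of the book's generalised Benford distribution or of its tests.

```python
import math

# First significant digit d in 1..9: P(d) = log10(1 + 1/d).
first_digit = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Second significant digit d2 in 0..9: sum the two-digit law over the first digit d1.
second_digit = {
    d2: sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))
    for d2 in range(10)
}

for d in range(10):
    first = f"{first_digit[d]:.3f}" if d in first_digit else "  -  "
    print(f"digit {d}: first {first}   second {second_digit[d]:.3f}")
```

The second-digit distribution is much flatter than the first-digit one (from about 0.120 for 0 down to about 0.085 for 9), which is one reason lesser digits are less sensitive to the size of the districts.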

R code is provided and discussed within the text. Some simple mathematical derivations are found, albeit with a huge dose of warnings (“math-heavy”, “harsh beauty”) and excuses (“feel free to skim”, “the math is entirely optional”). Often, one wonders at the relevance of said derivations for the intended audience and the overall purpose of the book. Nonetheless, it provides an interesting entry on (relatively simple) models applied to election data and could certainly be used as an original textbook on modelling aggregated count data, in particular as it should spark the interest of (some) students.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

## golden Bayesian!

Posted in Statistics on November 11, 2017 by xi'an

## computational methods for numerical analysis with R [book review]

Posted in Books, Kids, pictures, R, Statistics, University life on October 31, 2017 by xi'an

This is a book by James P. Howard, II, I received from CRC Press for review in CHANCE. (As usual, the customary warning applies: most of this blog post will appear later in my book review column in CHANCE.) It consists of a traditional introduction to numerical analysis with backup from R code and packages. The early chapters are setting the scene, from basics on R to notions of numerical errors, before moving to linear algebra, interpolation, optimisation, integration, differentiation, and ODEs. The book comes with a package cmna that reproduces algorithms and testing. While I do not find much originality in the book, given its adherence to simple resolutions of the above topics, I could nonetheless use it for an elementary course in our first year classes. With maybe the exception of the linear algebra chapter that I did not find very helpful.

“…you can have a solution fast, cheap, or correct, provided you only pick two.” (p.27)

The (minor) issue I have with the book and that a potential mathematically keen student could face as well is that there is little in the way of justifying a particular approach to a given numerical problem (as opposed to others) and in characterising the limitations and failures of the presented methods (although this happens from time to time as e.g. for gradient descent, p.191). [Steeped in my Gallic “mal-être”, I am prone to over-criticise methods during classes, to the (increased) despair of my students!, but I also feel that avoiding over-rosy presentations is a good way to avoid later disappointments or even disasters.] In the case of this book, finding [more] ways of detecting would-be disasters would have been nice.

An uninteresting and highly idiosyncratic side comment is that the author preferred the French style for long division to the American one, reminding me of my first exposure to the latter, a few months ago! Another comment from a statistician is that mentioning time series inter- or extra-polation without a statistical model sounds close to anathema! And makes extrapolation a weapon without a cause.

“…we know, a priori, exactly how long the [simulated annealing] process will take since it is a function of the temperature and the cooling rate.” (p.199)

Unsurprisingly, the section on Monte Carlo integration is disappointing for a statistician/probabilistic numericist like me, as it fails to give a complete enough picture of the methodology. All simulations seem to proceed there from a large enough hypercube. And recommending the “fantastic” (p.171) R function integrate as a default is scary, given the ability of the selected integration bounds to mislead its users. Similarly, I feel that the simulated annealing section is not providing enough of a cautionary tale about the highly sensitive impact of cooling rates and absolute temperatures. It is only through the raw output of the algorithm applied to the travelling salesman problem that the novice reader can perceive the impact of some of these factors. (The acceptance bound on the jump (6.9) is incidentally wrongly called a probability on p.199, since it can take values larger than one.)
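To spell out the last remark, here is a small Python sketch. It assumes that the bound takes the usual Metropolis form exp(−Δ/T), with Δ the increase in cost and T the temperature; the book's exact expression (6.9) is not reproduced here and both function names are hypothetical. The raw bound exceeds one whenever the move improves the cost, and only its truncation at one is a probability.

```python
import math

def raw_bound(delta, temperature):
    """Assumed Metropolis-type bound exp(-delta / T); not a probability when delta < 0."""
    return math.exp(-delta / temperature)

def acceptance_probability(delta, temperature):
    """Proper acceptance probability: the bound truncated at one."""
    return min(1.0, raw_bound(delta, temperature))

# Illustrative (delta, temperature) pairs: an improving move, then two worsening moves.
for delta, temperature in [(-1.0, 2.0), (2.0, 2.0), (2.0, 0.1)]:
    print(delta, temperature,
          round(raw_bound(delta, temperature), 4),
          round(acceptance_probability(delta, temperature), 4))
```

The last pair also hints at the sensitivity to temperature: the same worsening move that is accepted more than a third of the time at T = 2 is essentially never accepted at T = 0.1.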

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]