Statistical rethinking [book review]
Statistical Rethinking: A Bayesian Course with Examples in R and Stan is a new book by Richard McElreath that CRC Press sent me for review in CHANCE. While the book was already discussed on Andrew’s blog three months ago, and [rightly so!] enthusiastically recommended by Rasmus Bååth on Amazon, here are the reasons why I am quite impressed by Statistical Rethinking!
“Make no mistake: you will wreck Prague eventually.” (p.10)
While the book has a lot in common with Bayesian Data Analysis, from being in the same CRC series to adopting a pragmatic and weakly informative approach to Bayesian analysis, to supporting the use of Stan, it also nicely develops its own ecosystem and idiosyncrasies, with a noticeable Jaynesian bent. To start with, I like the highly personal style with clear attempts to make the concepts memorable for students by resorting to external concepts. The best example is the call to the myth of the golem in the first chapter, which McElreath uses as a warning about the use of statistical models (which are almost anagrams of golems!). Golems and models [and robots, another concept invented in Prague!] are man-made devices that strive to accomplish the goal set to them without heeding the consequences of their actions. This first chapter of Statistical Rethinking sets the ground for the rest of the book and gets quite philosophical (albeit in a readable way!) as a result. In particular, there is a most coherent call against hypothesis testing, which by itself justifies the title of the book.
“We don’t use the command line because we are hardcore or elitist (although we might be). We use the command line because it is better. It is harder at first (…) the ethical and cost saving advantages are worth the inconvenience.” (p.xv)
While trying not to shoot myself in the foot (!), I must acknowledge that the book also shares some common goals and coverage with our own Bayesian Essentials with R (and earlier Bayesian Core) in that it introduces Bayesian thinking and critical modelling through specific problems and spelled-out R code, if not dedicated datasets. Statistical Rethinking manages this all-inclusive approach most nicely and I would say somewhat more smoothly than Bayesian Essentials, also reaching further in terms of modelling (thanks to its 450 more pages). Not unlike Bayesian Core, McElreath’s style also incorporates vignettes for more advanced issues, called Rethinking, and R tricks and examples, called Overthinking.
“A common notion about Bayesian data analysis (…) is that it is distinguished by the use of Bayes’ theorem. This is a mistake.” (p.37)
Chapter 2 mentions Borges’ Garden of Forking Paths in a typical Gelmanesque tradition (Borges who also wrote a poem on the golem). It is however illustrated by a ball-in-box example that I find somewhat too artificial to suit its intended purpose. The chapter still covers advanced notions like penalised likelihood and computational approximations (with a few words about MCMC, treated later in the book). Chapter 3 already considers simulation and posterior predictive use for model checking, with some cautionary words about point estimation and the dependence on loss functions.
“People commonly ask what the correct prior is for a given analysis [which] implies that for any given set of data there is a uniquely correct prior that must be used, or else the analysis will be invalid. This is a mistake.” (p.95)
Chapters 4 and 5 are concerned with normal univariate and multivariate linear regression. With some insistence on diagnostic plots. And no algebra whatsoever. Which is amazing (and a wee bit worrying) when considering the insistence on notions like multicollinearity found in Chapter 5. Chapter 6 addresses the issues of overfitting, regularisation and information criteria (AIC, BIC, WAIC). Once again, one can spot a Gelmanesque filiation there (if only because no other book that I know of covers WAIC). First mention there of deviance and entropy, while Maxent priors have to wait till Chapter 9. In order to cover model averaging with as little formalism as possible, the book replaces posterior probabilities of models with normalised WAIC transforms. Chapter 7 extends linear regression to interactions, albeit with mostly discussed examples rather than a general perspective.
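To make the "normalised WAIC transforms" concrete, here is a minimal Python sketch. It assumes the weights take the usual Akaike-weight form, exp(-ΔWAIC/2) normalised to sum to one, which is my reading rather than a formula quoted from the book. The function name `waic_weights` and the three WAIC values are hypothetical and serve only as illustration.

```python
import math

def waic_weights(waic_values):
    """Turn WAIC values into normalised weights (assumed Akaike-weight form)."""
    best = min(waic_values)  # smaller WAIC means better expected predictive accuracy
    # Differences to the best model keep the exponentials numerically stable.
    raw = [math.exp(-0.5 * (w - best)) for w in waic_values]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical WAIC values for three competing models.
weights = waic_weights([361.9, 365.4, 372.0])
print([round(w, 3) for w in weights])
```

Such weights are non-negative and sum to one, hence can stand in for model probabilities in an averaging step, but they are not posterior probabilities derived from a prior over models.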
“Gibbs sampling is a variant of the Metropolis-Hastings algorithm that uses clever proposals and is therefore more efficient [i.e.] you can get a good estimate of the posterior from Gibbs sampling with many fewer samples than a comparable Metropolis approach.” (p.245)
Chapter 8 is the chapter on MCMC algorithms, starting with a little tale on King Markov visiting islands in proportion to the number of inhabitants on each island. With no justification as to why those Markov methods are proper simulation methods. Or even some details about Gibbs samplers using exact conditionals. This makes the above remark all the more worrying as it is false in general. Or at least meaningless without provisions. But this is a minor issue as the author quickly moves to Hamiltonian Monte Carlo and Stan, which he adopts as the default approach. Still with no explanation whatsoever on the nature of the algorithm or even the definition of Hamiltonians. Stan is thus to be taken by the reader as a black box returning Markov chains with hopefully the right [stationary] distribution. (Another Gelmanism on p.256 with the vignette “Warmup is not burn-in”.)
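For readers who have not met the King Markov tale, here is a minimal Python sketch of a Metropolis walk of that kind. It assumes ten islands arranged in a ring, with the population of island i proportional to i. The function name `king_markov`, the seed and the run length are my own choices for illustration. The missing justification fits in one line: the coin-flip proposal is symmetric and the move is accepted with probability min(1, p_j/p_i), so that p_i P(i→j) = p_j P(j→i), i.e., detailed balance holds and the populations define the stationary distribution.

```python
import random

def king_markov(n_weeks, n_islands=10, seed=0):
    """Metropolis walk on a ring of islands; island i has population proportional to i."""
    rng = random.Random(seed)
    current = n_islands            # arbitrary starting island (an assumption)
    visits = [0] * n_islands
    for _ in range(n_weeks):
        visits[current - 1] += 1   # record where this week is spent
        # Symmetric proposal: a coin flip picks the left or right neighbour.
        proposal = current + rng.choice([-1, 1])
        if proposal < 1:
            proposal = n_islands   # wrap around the ring
        elif proposal > n_islands:
            proposal = 1
        # Accept with probability min(1, population ratio), else stay put.
        if rng.random() < proposal / current:
            current = proposal
    return visits

visits = king_markov(100_000)
freqs = [v / sum(visits) for v in visits]
print([round(f, 3) for f in freqs])  # to be compared with i/55 for i = 1..10
```

The long-run visit frequencies should approach i/55, although the sketch says nothing about how fast, which is precisely where comparisons between samplers require provisions.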
“And with no false modesty my intuition is no better. But I have learned to solve these problems by cold, hard, ruthless application of conditional probability. There’s no need to be clever when you can be ruthless.” (p.423)
Maximum entropy priors are introduced in Chapter 9 with the argument that those are the least informative priors (p.267) since they maximise the entropy. Sweeping under the carpet the dependence on (i) the dominating measure behind the entropy and (ii) the impact of the parameterisation of the constraints. (A nice vignette on the false god of “histomancy, the ancient art of divining likelihood functions from empirical histograms”, p.282.) This chapter and the following ones concentrate on generalised linear models. With the interlude in Chapter 11 of “Monsters and mixtures”! Monsters made of “parts of different creatures” (p.331). Mixtures in the sense of ordinal data and of zero-inflated and over-dispersed models, rather than of Gaussian mixture models. Maybe because Stan cannot handle discrete missing variables. (Chapter 14 deals with continuous missing data, which is handled by Bayesian imputation, i.e., by treating the missing data as extra parameters.) This part of the book ends with Gaussian processes in Chapter 13, which can be introduced as spatial constraints on a covariance matrix in the appropriate GLM.
“It is hard to find an accessible introduction to image analysis, because it is a very computational subject. At the intermediate level, see Marin and Robert (2007), Chapter 8. You can hum over their mathematics and still acquaint yourself with the different goals and procedures.” (p.447)
“…mathematical foundations solve few, if any, of the contingent problems that we confront in the context of a study.” (p.443)
Hardly any maths is to be found in this book, including posterior derivations. Which is unsurprising given the declared intent of the author. And the use of Stan. And the above quotes, the second one being the last sentence of the book. Most derivations and prior modellings are hidden in the R or Stan code. As should be obvious from, e.g., our own Bayesian Essentials with R, this is not an approach I am quite comfortable with, simply because I feel that some level of abstraction helps more in providing general guidance than an extensive array of examples. However, despite or because of this different perspective, Statistical Rethinking remains an impressive book that I do not hesitate to recommend to prospective data analysts and applied statisticians!