## Statistical rethinking [book review]

Statistical Rethinking: A Bayesian Course with Examples in R and Stan is a new book by Richard McElreath that CRC Press sent me for review in CHANCE. While the book was already discussed on Andrew’s blog three months ago, and [rightly so!] enthusiastically recommended by Rasmus Bååth on Amazon, here are the reasons why I am quite impressed by Statistical Rethinking!

“Make no mistake: you will wreck Prague eventually.” (p.10)

While the book has a lot in common with Bayesian Data Analysis, from being in the same CRC series to adopting a pragmatic and weakly informative approach to Bayesian analysis, to supporting the use of Stan, it also nicely develops its own ecosystem and idiosyncrasies, with a noticeable Jaynesian bent. To start with, I like the highly personal style, with clear attempts to make the concepts memorable for students by resorting to external concepts. The best example is the call to the myth of the golem in the first chapter, which McElreath uses as a warning about the use of statistical models (which are almost anagrams of golems!). Golems and models [and robots, another concept invented in Prague!] are man-made devices that strive to accomplish the goal set for them without heeding the consequences of their actions. This first chapter of Statistical Rethinking sets the ground for the rest of the book and gets quite philosophical (albeit in a readable way!) as a result. In particular, there is a most coherent call against hypothesis testing, which by itself justifies the title of the book.

“We don’t use the command line because we are hardcore or elitist (although we might be). We use the command line because it is better. It is harder at first (…) the ethical and cost saving advantages are worth the inconvenience.” (p.xv)

While trying not to shoot myself in the foot (!), I must acknowledge that the book also shares some common goals and coverage with our own Bayesian Essentials with R (and the earlier Bayesian Core) in that it introduces Bayesian thinking and critical modelling through specific problems and spelled-out R code, if not dedicated datasets. Statistical Rethinking manages this all-inclusive approach most nicely, and I would say somehow more smoothly than Bayesian Essentials, also reaching further in terms of modelling (thanks to its 450 additional pages). Not unlike Bayesian Core, McElreath's style also incorporates vignettes for more advanced issues, called *Rethinking*, and R tricks and examples, called *Overthinking*.

“A common notion about Bayesian data analysis (…) is that it is distinguished by the use of Bayes’ theorem. This is a mistake.” (p.37)

Chapter 2 mentions Borges’ Garden of Forking Paths in a typical Gelmanesque tradition (Borges also wrote a poem on the golem). It is however illustrated by a ball-in-box example that I find somehow too artificial to suit its intended purpose. The chapter still covers advanced notions like penalised likelihood and computational approximations (with a few words about MCMC, covered later in the book). Chapter 3 already considers simulation and posterior predictive use for model checking, with some cautionary words about point estimation and its dependence on loss functions.
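The kind of computation Chapters 2 and 3 rely on can be sketched as a grid approximation of a binomial posterior, followed by posterior predictive simulation. A minimal sketch in Python (the book works in R and Stan), assuming made-up data of 6 successes in 9 trials and a flat prior:

```python
import numpy as np
from math import comb

# Made-up data: 6 successes observed in 9 trials, flat prior on [0, 1]
p_grid = np.linspace(0, 1, 1000)               # grid of candidate proportions
prior = np.ones_like(p_grid)                   # flat prior
likelihood = comb(9, 6) * p_grid**6 * (1 - p_grid)**3
posterior = likelihood * prior
posterior /= posterior.sum()                   # normalise over the grid

# Posterior predictive check: draw parameters, then simulate new data
rng = np.random.default_rng(0)
samples = rng.choice(p_grid, size=10_000, p=posterior)
predictive = rng.binomial(9, samples)          # simulated success counts
```

With a flat prior the exact posterior is Beta(7, 4), so the grid-based posterior mean should sit near 7/11 ≈ 0.64, and the predictive counts concentrate around 9 × 0.64.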

“People commonly ask what the correct prior is for a given analysis [which] implies that for any given set of data there is a uniquely correct prior that must be used, or else the analysis will be invalid. This is a mistake.” (p.95)

Chapters 4 and 5 are concerned with normal univariate and multivariate linear regression. With some insistence on diagnostic plots. And no algebra whatsoever. Which is amazing (and a wee bit worrying) when considering the insistence on notions like multicollinearity found in Chapter 5. Chapter 6 addresses the issues of overfitting, regularisation and information criteria (AIC, BIC, WAIC). Once again, one can spot a Gelmanesque filiation there (if only because no other book that I know of covers WAIC). Deviance and entropy first appear there, while maxent priors have to wait till Chapter 9. In order to cover model averaging with as little formalism as possible, the book replaces posterior probabilities of models with normalised WAIC transforms. Chapter 7 extends linear regression to interactions, albeit mostly through discussed examples rather than from a general perspective.
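These normalised WAIC transforms amount to what are often called Akaike weights: exponentiate minus half the WAIC differences and renormalise. A sketch in Python (rather than the book's R), with made-up WAIC values for three hypothetical models:

```python
import numpy as np

# Made-up WAIC values for three competing models (lower is better)
waic = np.array([312.4, 314.1, 329.8])
delta = waic - waic.min()          # WAIC differences from the best model
weights = np.exp(-0.5 * delta)     # relative plausibility of each model
weights /= weights.sum()           # normalise so the weights sum to one
```

These weights then stand in for posterior model probabilities when averaging predictions across models.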

“Gibbs sampling is a variant of the Metropolis-Hastings algorithm that uses clever proposals and is therefore more efficient [i.e.] you can get a good estimate of the posterior from Gibbs sampling with many fewer samples than a comparable Metropolis approach.” (p.245)

Chapter 8 is *the* chapter on MCMC algorithms, starting with a little tale of King Markov visiting islands in proportion to the number of inhabitants of each island. With no justification as to why those Markov methods are proper simulation methods. Or even some details about Gibbs samplers using exact conditionals. This makes the above remark all the more worrying, as it is false in general. Or at least meaningless without caveats. But this is a minor issue, as the author quickly moves to Hamiltonian Monte Carlo and Stan, which he adopts as the default approach. Still with no explanation whatsoever of the nature of the algorithm or even the definition of Hamiltonians. Stan is thus to be taken by the reader as a black box returning Markov chains with hopefully the right [stationary] distribution. (Another Gelmanism on p.256 with the vignette “Warmup is not burn-in”.)
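For concreteness, the King Markov tale corresponds to a plain Metropolis sampler on a discrete state space. A sketch in Python (not the book's code), assuming ten islands arranged in a ring with population proportional to their index:

```python
import numpy as np

rng = np.random.default_rng(1)
n_islands = 10
population = np.arange(1, n_islands + 1)   # island i has population i

positions = np.empty(200_000, dtype=int)
current = 9                                # start at the largest island
for t in range(positions.size):
    positions[t] = current
    # propose a move to a neighbouring island (ring, so it wraps around)
    proposal = (current + rng.choice([-1, 1])) % n_islands
    # accept with probability min(1, population ratio), else stay put
    if rng.random() < population[proposal] / population[current]:
        current = proposal

visit_freq = np.bincount(positions, minlength=n_islands) / positions.size
```

In the long run the visit frequencies settle near `population / population.sum()`, the stationary distribution of this chain, which is the whole point of the tale.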

“And with no false modesty my intuition is no better. But I have learned to solve these problems by cold, hard, ruthless application of conditional probability. There’s no need to be clever when you can be ruthless.” (p.423)

Maximum entropy priors are introduced in Chapter 9 with the argument that those are the least informative priors (p.267) since they maximise the entropy. Sweeping under the carpet the dependence on (i) the dominating measure behind the entropy and (ii) the parameterisation of the constraints. (A nice vignette on the false god of “histomancy, the ancient art of divining likelihood functions from empirical histograms”, p.282.) This chapter and the following ones concentrate on generalised linear models. With the interlude in Chapter 11 of “Monsters and mixtures”! Monsters made of “parts of different creatures” (p.331). Mixtures in the sense of ordinal data and of zero-inflated and over-dispersed models, rather than of Gaussian mixtures. Maybe because Stan cannot handle discrete latent variables. (Chapter 14 deals with continuous missing data, which is handled by Bayesian imputation, i.e., by treating the missing data as extra parameters.) This part of the book ends with Gaussian processes in Chapter 13, which can be introduced as spatial constraints on the covariance matrix of the appropriate GLM.
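As an illustration of the "mixture" side of that chapter, a zero-inflated Poisson mixes a point mass at zero with an ordinary Poisson count distribution. A sketch of the resulting probability mass function in Python (the book fits such models in R/Stan), with made-up parameter values:

```python
from math import exp, factorial

def zip_pmf(y, p, lam):
    """Zero-inflated Poisson: with probability p the process yields a
    structural zero, otherwise a Poisson(lam) count."""
    pois = exp(-lam) * lam**y / factorial(y)   # ordinary Poisson pmf
    extra_zero = p if y == 0 else 0.0          # inflation only at y = 0
    return extra_zero + (1 - p) * pois

# Made-up parameters: 20% structural zeros, Poisson rate 3
probs = [zip_pmf(k, p=0.2, lam=3.0) for k in range(30)]
```

The zero count thus gets more mass than a plain Poisson(3) would give it, which is exactly what the over-dispersed data of Chapter 11 call for.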

“It is hard to find an accessible introduction to image analysis, because it is a very computational subject. At the intermediate level, see Marin and Robert (2007), Chapter 8. You can hum over their mathematics and still acquaint yourself with the different goals and procedures.” (p.447)

“…mathematical foundations solve few, if any, of the contingent problems that we confront in the context of a study.” (p.443)

Hardly any maths is to be found in this book, including posterior derivations. Which is unsurprising given the declared intent of the author. And the use of Stan. And the above quotes, the second one being the last sentence of the book. Most derivations and prior modelling choices are hidden in the R or Stan code. As should be obvious from, e.g., our own Bayesian Essentials with R, this is not an approach I am quite comfortable with, simply because I feel that some level of abstraction provides better general guidance than an extensive array of examples. However, despite or because of this different perspective, Statistical Rethinking remains an impressive book that I do not hesitate to recommend to prospective data analysts and applied statisticians!

February 27, 2017 at 3:33 pm

After reading the book I find this a very good review.

April 7, 2016 at 1:54 pm

Christian, I don’t think this book is intended for the audience your books target. Having attended your lectures and read your books, I can tell you for sure that no non-statistician I know (well, maybe one or two) would be able to follow your books. The presence of mathematical derivations in McElreath’s book would have alienated that audience.

April 7, 2016 at 1:59 pm

Thanks, Shravan. I agree with you that non-statisticians would have a hard time with our books!! Now, it could also be hard for non-statisticians to gather enough from McElreath’s book to safely engage in active statistical practice beyond reproducing the available examples. But this remark presumably mostly reflects upon my formalised approach to everything in life. Or mostly everything.

April 7, 2016 at 8:04 pm

There is an intermediate type of person who can go partway from being a non-statistician to being able to do standard analyses competently. For that kind of person, McElreath’s book is going to be a great entry point. Such people would never read BDA3 or your books but can still get remarkably far without having to deal with most of the math. And a lot of people spend their entire lives doing the same type of analysis again and again and again; they just need to learn one trick, and that trick is hierarchical linear modeling.

April 7, 2016 at 8:23 pm

This huge coverage of hierarchical models is indeed a strong feature of the book. Impressive, really!

April 12, 2016 at 8:16 pm

Thanks for your comments, Christian. I have second-guessed every choice in the book, so I appreciate reading you echo some of the issues that still worry me. I had a section on concentration of measure at one point, but test readers were utterly baffled. Hopefully readers will not hate me too much when they move on to other books and find that I’ve omitted such things.

April 12, 2016 at 10:01 pm

I have no worries about readers being unhappy with you! They will have built their Bayesian intuition on a nice collection of models and started wondering about the why as much as the how of Bayesian fundamentals, so will be well prepared for more theoretical books. If needed. Congrats!


April 6, 2016 at 1:24 pm

I really love this book; it’s a joy to read and, unlike most statistics books, it is explicitly meant to be read as a narrative. I also think it fills a neglected niche in the market for statistics books that teach through doing rather than through proof.

Don’t get me wrong, mathematics is required to do this task well, but a book like this can serve as a graceful entry into the field.

April 6, 2016 at 1:45 pm

I completely agree with the joy to read and the feeling of a unifying narrative!

April 6, 2016 at 4:47 am

Thanks for the review. The book comes at a hefty price. Guess I’ll limp along with older and cheaper sources.

April 6, 2016 at 8:51 am

’tis true that the book currently costs $20–$30 more than comparable books like BDA. I never understood how publishers set their prices.