## The Effect [book review]

Posted in Books, R, Running, Statistics, University life with tags , , , , , , , , , , , , , , , , , , , , , on March 10, 2023 by xi'an

While it sounds like the title of a science-fiction catastrophe novel or of a (of course) convoluted nouveau roman, this book by Nick Huntington-Klein is a massive initiation to econometrics and causality. As explained by the subtitle, An Introduction to Research Design and Causality.

This is a hüûüge book, actually made of two parts that could have been books (volumes?). And covering three langages, R, Stata, and Python, which should have led to three independent books. (Seriously, why print three versions when you need at best one?!)  I carried it with me during my vacations in Central Québec, but managed to loose my notes on the first part, which means missing the opportunity for biased quotes! It was mostly written during the COVID lockdown(s), which may explain for a certain amount of verbosity and rambling around.

“My mom loved the first part of the book and she is allergic to statistics.”

The first half (which is in fact a third!) is conceptual (and chatty) and almost formula free, based on the postulate that “it’s a pretty slim portion of students who understand a method because of an equation” (p.xxii). For this reader (or rather reviewer) and on explanations through example, it makes the reading much harder as spotting the main point gets harder (and requires reading most sentences!). And a very slow start since notations and mathematical notions have to be introduced with an excess of caution (as in the distinction between Latin and Greek symbols, p.36). Moving through single variable models, conditional distributions, with a lengthy explanation of how OLS are derived, data generating process and identification (of causes), causal diagrams, back and front doors (a recurrent notion within the book),  treatment effects and a conclusion chapter.

“Unlike statistical research, which is completely made of things that are at least slightly false, statistics itself is almost entirely true.” (p.327)

The second part, called the Toolbox, is closer to a classical introduction to econometrics, albeit with a shortage of mathematics (and no proof whatsoever), although [warning!] logarithms, polynomials, partial derivatives and matrices are used. Along with a consequent (3x) chunk allocated to printed codes, the density of the footnotes significantly increases in this section. It covers an extensive chapter on regression (including testing practice, non-linear and generalised linear models, as well as basic bootstrap without much warning about its use in… regression settings, and LASSO),  one on matching (with propensity scores, kernel weighting, Mahalanobis weighting, one on  simulation, yes simulation! in the sense of producing pseudo-data from known generating processes to check methods, as well as bootstrap (with resampling residuals making at last an appearance!), fixed and random effects (where the author “feels the presence of Andrew Gelman reaching through time and space to disagree”, p.405). The chapter on event studies is about time dependent data with a bit of ARIMA prediction (but nothing on non-stationary series and unit root issues). The more exotic chapters cover (18) difference-in-differences models (control vs treated groups, with John Snow pumping his way in), (19) instrumental variables (aka the minor bane of my 1980’s econometrics courses), with double least squares and generalised methods of moments (if not the simulated version), (20) discontinuity (i.e., changepoints), with the limitation of having a single variate explaining the change, rather than an unknown combination of them, and a rather pedestrian approach to the issue, (iv) other methods (including the first mention of machine learning regression/prediction and some causal forests), concluding with an “Under the rug” portmanteau.

Nothing (afaict) on multivariate regressed variates and simultaneous equations. Hardly an occurrence of Bayesian modelling (p.581), vague enough to remind me of my first course of statistics and the one-line annihilation of the notion.

Duh cover, but nice edition, except for the huge margins that could have been cut to reduce the 622 pages by a third (and harnessed the tendency of the author towards excessive footnotes!). And an unintentional white line on p.238! Cute and vaguely connected little drawings at the head of every chapter (like the head above). A rather terse matter index (except for the entry “The first reader to spot this wins ten bucks“!), which should have been completed with an acronym index.

“Calculus-heads will recognize all of this as taking integrals of the density curve. Did you know there’s calculus hidden inside statistics? The things your professor won’t tell you until it’s too late to drop the class.

Obviously I am biased in that I cannot negatively comment on an author running 5:37 a mile as, by now, I could just compete far from the 5:15 of yester decades! I am just a wee bit suspicious at the reported time, however, given that it happens exactly on page 537… (And I could have clearly taken issue with his 2014 paper, Is Robert anti-teacher? Or with the populist catering to anti-math attitudes as the above found in a footnote!) But I enjoyed reading the conceptual chapter on causality as well as the (more) technical chapter on instrumental variables (a notion I have consistently found confusing all the [long] way from graduate school). And while repeated references are made to Scott Cunningham’s Causal Inference: The Mixtape I think I will stop there with 500⁺ page introductory econometrics books!

[Disclaimer about potential self-plagiarism: this post or an edited version will potentially appear in my Books Review section in CHANCE.]

## statistics for making decisions [book review]

Posted in Statistics, Books with tags , , , , , , , , , , , , on March 7, 2022 by xi'an

I bought this book [or more precisely received it from CRC Press as a ({prospective} book) review reward] as I was interested in the author’s perspectives on actual decision making (and unaware of the earlier Statistical Decision Theory book he had written in 2013). It is intended for a postgraduate semester course and  “not for a beginner in statistics”. Exercises with solutions are included in each chapter (with some R codes in the solutions). From Chapter 4 onwards, the “Further reading suggestions” are primarily referring to papers and books written by the author, as these chapters are based on his earlier papers.

“I regard hypothesis testing as a distraction from and a barrier to good statistical practice. Its ritualised application should be resisted from the position of strength, by being well acquainted with all its theoretical and practical aspects. I very much hope (…) that the right place for hypothesis testing is in a museum, next to the steam engine.”

The first chapter exposes the shortcomings of hypothesis testing for conducting decision making, in particular by ignoring the consequences of the decisions. A perspective with which I agree, but I fear the subsequent developments found in the book remain too formalised to be appealing, reverting to the over-simplification found in Neyman-Pearson theory. The second chapter is somewhat superfluous for a book assuming a prior exposure to statistics, with a quick exposition of the frequentist, Bayesian, and … fiducial paradigms. With estimators being first defined without referring to a specific loss function. And I find the presentation of the fiducial approach rather shaky (if usual). Esp. when considering fiducial perspective to be used as default Bayes in the subsequent chapters. I also do not understand the notation (p.31)

$P(\hat\theta

outside of a Bayesian (or fiducial?) framework. (I did not spot typos aside from the traditional “the the” duplicates, with at least six occurences!)

The aforementioned subsequent chapters are not particularly enticing as they cater to artificial loss functions and engage into detailed derivations that do not seem essential. At times they appear to be nothing more than simple calculus exercises. The very construction of the loss function, which I deem critical to implement statistical decision theory, is mostly bypassed. The overall setting is also frighteningly unidimensional. In the parameter, in the statistic, and in the decision. Covariates only appear in the final chapter which appears to have very little connection with decision making in that the loss function there is the standard quadratic loss, used to achieve the optimal composition of estimators, rather than selecting the best model. The book is also missing in practical or realistic illustrations.

“With a bit of immodesty and a tinge of obsession, I would like to refer to the principal theme of this book as a paradigm, ascribing to it as much importance and distinction as to the frequentist and Bayesian paradigms”

The book concludes with a short postscript (pp.247-249) reproducing the introducing paragraphs about the ill-suited nature of hypothesis testing for decision-making. Which would have been better supported by a stronger engagement into elicitating loss functions and quantifying the consequences of actions from the clients…

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

## Handbooks [not a book review]

Posted in Books, pictures, Statistics, University life with tags , , , , , , , on October 26, 2021 by xi'an

## handbook of mixture analysis [review]

Posted in Books, R, Statistics with tags , , , , , , , , , on March 19, 2021 by xi'an

“In my opinion, the editors have done an excellent job when selecting the contents of the handbook and putting the different chapters together. For instance, this can be appreciated by the fact that, despite the large number of authors and contributions, all chapters have kept the same notation. Furthermore, in addition to a sound description of the underlying theory and methods, several chapters include information about how to fit the presented models using the R programming language. However, I missed pointers to repositories to download the code and datasets for some of the examples used in the book. To sum up, this is an excellent reference book on mixture models.” Virgilio Gómez-Rubio, JRSS A, 2021

## an elegant book [review]

Posted in Books, Statistics, University life with tags , , , , , , , on December 28, 2020 by xi'an

“Handbook of Mixture Analysis is an elegant book on the mixture models. It covers not only statistical foundations but also extensions and applications of mixture models. The book consists of 19 chapters (each chapter is an independent paper), and collectively, these chapters weave into an elegant web of mixture models” Yen-Chi Chen (U. Washington)