Archive for Andrew Gelman

[The Art of] Regression and other stories

Posted in Books, R, Statistics, University life on July 23, 2020 by xi'an

CoI: Andrew sent me this new book [scheduled for 23 July on amazon] of his with Jennifer Hill and Aki Vehtari. Which I read in my garden over a few sunny morns. And as Andrew and Aki are good friends of mine, this review is definitely subjective and biased! Hence to be taken with a spoonful of salt.

The “other stories” in the title is a very nice touch. And a clever idea. As the construction of regression models comes as a story to tell, from gathering and checking the data, to choosing the model specifications, to analysing the output and setting the safety lines on its interpretation and usages. I added “The Art of” in my own title as the exercise sounds very much like an art and very little like a technical or even less mathematical practice. Even though the call to the resident stan_glm R function is ubiquitous.

The style itself is very story-like, very far from a mathematical statistics book as, e.g., C.R. Rao’s Linear Statistical Inference and Its Applications. Or his earlier Linear Models which I got while drafted in the Navy. While this makes the “Stories” part most relevant, I also wonder how I could teach from this book to my own undergrad students without first acquiring (myself) the massive expertise represented by the opinions and advice on what is correct and what is not in constructing and analysing linear and generalised linear models. In the sense that I would find justifying or explaining opinionated sentences an amathematical challenge. On the other hand, it would make for great remote course material, leading the students through the many chapters and letting them experiment with the code provided therein, creating new datasets and checking modelling assumptions. The debate between Bayesian and likelihood solutions is quite muted, with a recommendation for weakly informative priors superseded by the call for exploring the impact of one’s assumptions. (Although the horseshoe prior makes an appearance, p.209!) The chapter on math and probability is somewhat superfluous as I hardly fathom a reader entering this book without a certain amount of math and stats background. (While the book warns about over-trusting bootstrap outcomes, I find the description in the Simulation chapter a wee bit too vague.) The final chapters about causal inference are quite impressive in their coverage but clearly require a significant amount of investment from the reader to truly ingest these 110 pages.

“One thing that can be confusing in statistics is that similar analyses can be performed in different ways.” (p.121)

Unsurprisingly, the authors warn the reader about simplistic and unquestioning usages of linear models and software, with a particularly strong warning about significance. (Remember Abandon Statistical Significance?!) And keep (rightly) arguing about the importance of fake data comparisons (although this can be overly confident at times). Great Chapter 11 on assumptions, diagnostics and model evaluation. And terrific Appendix B on 10 pieces of advice for improving one’s regression model. Although there are two or three pages on the topic, at the very end, I would have also appreciated a more balanced and constructive coverage of machine learning as it remains a form of regression, which can be evaluated by simulation of fake data and assessed by cross-validation, hence quite within the range of the book.
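To make the fake data idea concrete, here is a minimal sketch of my own, not taken from the book: the book works in R with stan_glm, while this uses plain least squares in Python with only the standard library. The function names, the "true" coefficients (1.0 and 2.0), the noise level and the sample size are all assumptions chosen for illustration. The point is simply to simulate data from a known model, refit, check that the known parameters are recovered, and assess predictions by cross-validation.

```python
import random

def simulate_fake_data(n, intercept, slope, sigma, rng):
    """Draw n points from y = intercept + slope * x + Normal(0, sigma) noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys

def fit_least_squares(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def cross_validated_rmse(xs, ys, k=5):
    """k-fold cross-validation: root mean squared error on held-out points."""
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        a, b = fit_least_squares(train_x, train_y)
        squared_errors += [(ys[i] - (a + b * xs[i])) ** 2 for i in test_idx]
    return (sum(squared_errors) / n) ** 0.5

# Hypothetical "truth" used to generate the fake data (illustrative values).
rng = random.Random(2020)
xs, ys = simulate_fake_data(n=500, intercept=1.0, slope=2.0, sigma=0.5, rng=rng)

a_hat, b_hat = fit_least_squares(xs, ys)
rmse = cross_validated_rmse(xs, ys)
print(f"intercept {a_hat:.2f}, slope {b_hat:.2f}, CV RMSE {rmse:.2f}")
```

If the fit is sound, the estimates should sit close to the values used in the simulation and the cross-validated error close to the simulated noise level. A fit that fails this check on data where the truth is known is unlikely to be trustworthy on real data.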

The document reads quite well, even pleasantly once one is over the shock at the limited amount of math formulas!, my only grumble being a terrible handwritten graph for building copters (Figure 1.9) and the numerous and sometimes gigantic square root symbols throughout the book. At a more meaningful level, it may feel somewhat US-centric, at least given the large fraction of examples dedicated to US elections. (Even though restating the precise predictions made by decent models on the eve of the 2016 election is worthwhile.) The Oscar for the best section title goes to “Cockroaches and the zero-inflated negative binomial model” (p.248)! But overall this is a very modern, stats-centred, engaging and careful book on the most common tool of statistical modelling! More stories to come maybe?!

Notre-Dame-de-Paris analysed by Andrew [not a book review]

Posted in Books, pictures, Travel on July 17, 2020 by xi'an

As reported in Le Monde, Alexander van Geen, Yuling Yao, Tyler Ellis, and Andrew Gelman wrote a paper analysing the impact of the destruction of Notre-Dame last year in terms of lead concentration in the ground. As 460 tons of lead from the roof melted overnight. Based on 100 samples of surface soil collected by one author (not Andrew!) from tree pits, parks, and other sites in all directions within 1 km of the cathedral. Here is a plain language summary of the findings.

“This study attempts to estimate the extent to which the population of Paris was exposed to lead as a result of the Notre‐Dame cathedral fire of April 15, 2019. The concern stems from the large quantity of lead that covered the cathedral, some of which was injected into the air by the fire for several hours. In order to evaluate how much lead rising from the fire was redeposited nearby, surface soil samples were collected in all directions within a 1 km radius of the cathedral. Elevated levels of lead observed downwind of the cathedral indicate that surface soil preserved the mark of lead fallout from the fire. Although the estimated amount of lead redeposited within 1 km corresponds to only a small fraction of the total covering the cathedral, it could have posed a health hazard to children located downwind for a limited amount of time. Environmental testing on a larger scale immediately after the fire could have provided a more timely assessment of the scale of the problem and resulted in more pointed advice to the surrounding population on how to limit exposure to the fallout of lead.”

The statistical modelling is one of a spatial pattern of the lead distribution, using a mean-zero Gaussian process prior. And of a discretisation of the neighbourhood of the cathedral into a uniform grid of 30×30 locations. Without any further input, the model properly identifies the direction of the wind on that fateful evening. And logically concludes that the exposure was higher than what was measured weeks after the fire. (Minor quibbles: a bias in self-declared tests toward “a more educated, wealthier segment of the population” is unlikely in the immediate neighbourhood of Notre-Dame where the average flat sells at 16,000 euros per m², and the LCPP (Laboratoire Central de la Préfecture de Police) is not affiliated with the City of Paris but with the Ministry of the Interior.)
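As a rough illustration of what a mean-zero Gaussian process prior on a 30×30 grid amounts to, here is a toy sketch that is in no way the authors' model or data. The squared-exponential kernel, its length scale, the noise variance, and the six sampling sites with their (centred) values are all made up for the example. It computes the posterior mean surface over a 2 km by 2 km square and locates its peak.

```python
import math

def sq_exp_kernel(p, q, length_scale=0.5, variance=1.0):
    """Squared-exponential covariance between two 2-D points (in km).
    The kernel and its parameters are assumptions for this sketch."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return variance * math.exp(-0.5 * d2 / length_scale ** 2)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

# Made-up sampling sites (km from the centre) and centred measurements:
# the larger values sit on the western side, mimicking a downwind plume.
sites = [(-0.8, 0.1), (-0.4, -0.3), (0.0, 0.0), (0.3, 0.6), (0.6, -0.5), (-0.6, 0.7)]
values = [1.5, 0.9, 0.2, -0.6, -0.8, 0.4]
noise_var = 0.1  # hypothetical measurement-noise variance

# Posterior mean weights: (K + noise_var * I)^{-1} y
gram = [[sq_exp_kernel(p, q) + (noise_var if i == j else 0.0)
         for j, q in enumerate(sites)] for i, p in enumerate(sites)]
weights = solve(gram, values)

# Centres of a uniform 30x30 grid covering [-1, 1] km in both directions.
n_side = 30
centres = [-1.0 + 2.0 * (i + 0.5) / n_side for i in range(n_side)]
posterior_mean = [[sum(w * sq_exp_kernel((x, y), s) for w, s in zip(weights, sites))
                   for x in centres] for y in centres]

# Grid cell where the posterior mean surface is highest.
peak_value, peak_x, peak_y = max(
    (posterior_mean[j][i], centres[i], centres[j])
    for j in range(n_side) for i in range(n_side))
print(f"peak of the posterior mean near x = {peak_x:.2f} km, y = {peak_y:.2f} km")
```

With no wind information supplied, the fitted surface still peaks on the side where the toy measurements are highest, which is the sense in which such a model can recover a plume direction from the data alone.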

Expectation Propagation as a Way of Life on-line

Posted in pictures, Statistics, University life on March 18, 2020 by xi'an

After a rather extended shelf-life, our paper “Expectation propagation as a way of life: a framework for Bayesian inference on partitioned data”, which was started when Andrew visited Paris in… 2014!, and to which I only marginally contributed, has now appeared in JMLR! Which happens to be my very first paper in this journal.

7 years later…

Posted in Statistics on February 20, 2020 by xi'an

perspectives on Deborah Mayo’s Statistics Wars

Posted in Statistics on October 23, 2019 by xi'an

A few months ago, Andrew Gelman collated and commented on the reviews of Deborah Mayo’s book by himself, Brian Haig, Christian Hennig, Art B. Owen, Robert Cousins, Stan Young, Corey Yanofsky, E.J. Wagenmakers, Ron Kenett, Daniel Lakeland, and myself. The collection did not make it through the review process of the Harvard Data Science Review! It is however available on-line for perusal…