## [The Art of] Regression and other stories

Posted in Books, R, Statistics, University life on July 23, 2020 by xi'an

CoI: Andrew sent me this new book [scheduled for 23 July on amazon] of his with Jennifer Hill and Aki Vehtari. Which I read in my garden over a few sunny morns. And as Andrew and Aki are good friends of mine, this review is definitely subjective and biased! Hence to be taken with a spoonful of salt.

The “other stories” in the title is a very nice touch. And a clever idea. As the construction of regression models comes as a story to tell, from gathering and checking the data, to choosing the model specifications, to analysing the output and setting the safety lines on its interpretation and usages. I added “The Art of” in my own title as the exercise sounds very much like an art and very little like a technical or even less mathematical practice. Even though the call to the resident stan_glm R function is ubiquitous.

The style itself is very story-like, very far from a mathematical statistics book as, e.g., C.R. Rao’s Linear Statistical Inference and Its Applications. Or his earlier Linear Models which I got while drafted in the Navy. While this makes the “Stories” part most relevant, I also wonder how I could teach from this book to my own undergrad students without acquiring first (myself) the massive expertise represented by the opinions and advice on what is correct and what is not in constructing and analysing linear and generalised linear models. In the sense that I would find justifying or explaining opinionated sentences an amathematical challenge. On the other hand, it would make for a great remote course material, leading the students through the many chapters and letting them experiment with the code provided therein, creating new datasets and checking modelling assumptions. The debate between Bayesian and likelihood solutions is quite muted, with a recommendation for weakly informative priors superseded by the call for exploring the impact of one’s assumption. (Although the horseshoe prior makes an appearance, p.209!) The chapter on math and probability is somewhat superfluous as I hardly fathom a reader entering this book without a certain amount of math and stats background. (While the book warns about over-trusting bootstrap outcomes, I find the description in the Simulation chapter a wee bit too vague.) The final chapters about causal inference are quite impressive in their coverage but clearly require a significant amount of investment from the reader to truly ingest these 110 pages.

“One thing that can be confusing in statistics is that similar analyses can be performed in different ways.” (p.121)

Unsurprisingly, the authors warn the reader about simplistic and unquestioning usages of linear models and software, with a particularly strong warning about significance. (Remember Abandon Statistical Significance?!) And keep (rightly) arguing about the importance of fake data comparisons (although this can be overly confident at times). Great Chapter 11 on assumptions, diagnostics and model evaluation. And terrific Appendix B on 10 pieces of advice for improving one’s regression model. Although there are two or three pages on the topic, at the very end, I would have also appreciated a more balanced and constructive coverage of machine learning as it remains a form of regression, which can be evaluated by simulation of fake data and assessed by cross-validation, hence quite within the range of the book.
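
A minimal sketch of what such a fake-data check can look like, written in Python rather than the book's R and with plain least squares standing in for whatever fitting routine one actually uses. The data are simulated from a known line, the slope is re-estimated, and the fraction of ±2 standard error intervals covering the true slope is recorded. All settings (true slope 2, noise standard deviation 3, 50 points, 1000 replications) and all function names are arbitrary choices made for illustration, not taken from the book.

```python
import random
import statistics


def simulate(a, b, sigma, n, rng):
    """Draw fake data from y = a + b*x + Gaussian noise."""
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [a + b * xi + rng.gauss(0, sigma) for xi in x]
    return x, y


def fit_slope(x, y):
    """Return the least-squares slope and its standard error."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope, se


def coverage(true_b=2.0, n_sims=1000, seed=1):
    """Fraction of simulations where slope +/- 2 se covers the true slope."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x, y = simulate(a=1.0, b=true_b, sigma=3.0, n=50, rng=rng)
        slope, se = fit_slope(x, y)
        hits += abs(slope - true_b) < 2 * se
    return hits / n_sims


if __name__ == "__main__":
    print(f"coverage of +/- 2 se intervals: {coverage():.3f}")
```

If the fitting procedure behaves as advertised, the coverage should sit near 95%. A value far from that would point to a problem with the model or the code before any real data are involved.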

The document reads quite well, even pleasantly once one is over the shock at the limited amount of math formulas, my only grumble being a terrible handwritten graph for building copters (Figure 1.9) and the numerous and sometimes gigantic square root symbols throughout the book. At a more meaningful level, it may feel somewhat US-centric, at least given the large fraction of examples dedicated to US elections. (Even though restating the precise predictions made by decent models on the eve of the 2016 election is worthwhile.) The Oscar for the best section title goes to “Cockroaches and the zero-inflated negative binomial model” (p.248)! But overall this is a very modern, stats centred, engaging and careful book on the most common tool of statistical modelling! More stories to come maybe?!

## science under attack [it only gets worse #1074]

Posted in Kids, pictures, Travel, University life on January 6, 2020 by xi'an

A chilling overview by the New York Times on the permanent and concerted attacks by the Trump administration on science and the scientific duties of the U.S. Government. [This post was written a week ago, before a much scarier and literal as well as extra-judicial attack took place.]

“Political appointees have shut down government studies, reduced the influence of scientists over regulatory decisions and in some cases pressured researchers not to speak publicly. The administration has particularly challenged scientific findings related to the environment and public health opposed by industries such as oil drilling and coal mining. It has also impeded research around human-caused climate change, which President Trump has dismissed despite a global scientific consensus.”

“The administration’s efforts to cut certain research projects also reflect a longstanding conservative position that some scientific work can be performed cost-effectively by the private sector, and taxpayers shouldn’t be asked to foot the bill.”

“…some of the Trump administration’s moves, like a policy to restrict certain academics from the E.P.A.’s Science Advisory Board or the proposal to limit the types of research that can be considered by environmental regulators, “mark a sharp departure with the past.” Rather than isolated battles between political officials and career experts, these moves are an attempt to legally constrain how federal agencies use science in the first place.”

“In addition to shutting down some programs, there have been notable instances where the administration has challenged established scientific research. Early on, as it started rolling back regulations on industry, administration officials began questioning research findings underpinning those regulations (…) Many top government positions, including at the E.P.A. and the Interior Department, are now occupied by former lobbyists connected to the industries that those agencies oversee.”

## a free press needs you [reposted]

Posted in Books, Kids on August 16, 2018 by xi'an

“Criticizing the news media — for underplaying or overplaying stories, for getting something wrong — is entirely right. News reporters and editors are human, and make mistakes. Correcting them is core to our job. But insisting that truths you don’t like are “fake news” is dangerous to the lifeblood of democracy. And calling journalists the “enemy of the people” is dangerous, period.”

## and it only gets worse…

Posted in Kids, pictures on December 2, 2017 by xi'an

“You know, the saddest thing is that because I’m the president of the United States, I am not supposed to be involved with the Justice Department,” Mr. Trump said in a radio interview on Thursday on the “Larry O’Connor Show.” “I am not supposed to be involved with the F.B.I. I’m not supposed to be doing the kind of things that I would love to be doing. And I’m very frustrated by it.” NYT, Nov 03, 2017

“Two former US intelligence chiefs have said Donald Trump poses “a peril” to the US because he is vulnerable to being “played” by Russia, after the president said on Saturday he believed Vladimir Putin’s denials of Russian interference in the 2016 election.” The Guardian, Nov 12, 2017

“As a result [of the 44% of vacant seats in the appeal courts], Mr. Trump is poised to bring the conservative legal movement, which took shape in the 1980s in reaction to decades of liberal rulings on issues like the rights of criminal suspects and of women who want abortions, to a new peak of influence over American law and society.” NYT, Nov 11, 2017

“Hunting interests have scored a major victory with the Trump administration’s decision to allow Americans to bring home body parts of elephants shot for sport in Africa. Another totemic species now looks set to follow suit – lions.” The Guardian, Nov 16, 2017

“Like everything else Trump touches, he hijacks it with his chronic dishonesty and childishness,” said Mark Salter, a longtime adviser to Senator John McCain, Republican of Arizona. “The intense, angry and largely ignorant tribalism afflicting our politics predates Trump’s arrival on the scene. But he has infused it with a psychopath’s inability to accept that social norms apply to him.” NYT, November 18, 2017

“We represent a much larger number of concerned mental health professionals who have come forward to warn against the president’s psychological instability and the dangers it poses. We now number in the thousands.” NYT, November 31, 2017

## gerrymandering detection by MCMC

Posted in Books, Statistics on June 16, 2017 by xi'an

In the latest issue of Nature I read (June 8), there is a rather long feature article on mathematical (and statistical) ways of measuring gerrymandering, that is, the manipulation of the boundaries of a voting district toward improving the chances of a certain party. (The name comes from Elbridge Gerry (1812) and the salamander shape of the district he created.) The difficulty covered by the article is about detecting gerrymandering, which leads to the challenging and almost philosophical question of defining a “fair” partition of a region into voting districts, when those are not geographically induced, since no partition breaks the principles of “one person, one vote” and of majority rule. Having a candidate or party win at the global level and lose at every local level seems to go against this majority rule, but with electoral systems like in the US, this frequently happens (with dire consequences in the latest elections). Just another illustration of Simpson’s paradox, essentially. And a damning drawback of multi-tiered electoral systems.
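
A small arithmetic sketch, with entirely made-up vote counts, of how district boundaries alone can separate the overall vote from the seat count. Here party A is packed into two districts and spread thin over the other three, so it holds a clear majority of votes but a minority of seats. The numbers and variable names are hypothetical and do not come from the Nature article.

```python
# Hypothetical (votes for A, votes for B) in five equal-sized districts.
districts = [(90, 10), (90, 10), (45, 55), (45, 55), (45, 55)]

# Overall vote totals across the whole region.
total_a = sum(a for a, _ in districts)
total_b = sum(b for _, b in districts)

# Seats: one per district, awarded to whoever has more votes there.
seats_a = sum(a > b for a, b in districts)
seats_b = sum(b > a for a, b in districts)

print(f"votes: A={total_a}, B={total_b}")
print(f"seats: A={seats_a}, B={seats_b}")
```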

“In order to change the district boundaries, we use a Markov Chain Monte Carlo algorithm to produce about 24,000 random but reasonable redistrictings.”

In the arXiv paper that led to this Nature article (along with other studies), Bangia et al. essentially construct a tail probability to assess how extreme the current district partition is against a theoretical distribution of such partitions. Finding that the actual redistrictings of 2012 and 2016 in North Carolina are “extremely atypical”. (The generation of random partitions obeyed four rules, namely equal population, geographic compactness and connectedness, proximity to county boundaries, and a majority of African-American voters in at least two districts, the last being a requirement in North Carolina. A score function was built by linear combination of four corresponding scores, mostly χ²-like, and turned into a density, simulated annealing style. The determination of the final temperature β=1 (p.18) [or equivalently of the weights (p.20)] remains unclear to me. As does the use of more than 10⁵ simulated annealing iterations to produce a single partition (p.18)…)
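
To make the mechanics concrete, here is a toy sketch of the general idea and emphatically not the authors' algorithm. It uses a hypothetical line of 12 precincts cut into 3 contiguous districts, a single χ²-like population score instead of the paper's four-term combination, a plain Metropolis chain targeting exp(−β·score) rather than their annealing schedule, and a tail probability on the number of seats won by party A. The data, the move (shifting one boundary by one precinct) and every function name are assumptions made for illustration.

```python
import math
import random

# Hypothetical toy data: (population, votes for party A) per precinct.
PRECINCTS = [(10, 7), (10, 6), (10, 6), (10, 4), (10, 8), (10, 2),
             (10, 3), (10, 7), (10, 4), (10, 6), (10, 3), (10, 4)]
N_DISTRICTS = 3


def districts(cuts):
    """Turn sorted cut points into contiguous slices of PRECINCTS."""
    edges = [0, *cuts, len(PRECINCTS)]
    return [PRECINCTS[a:b] for a, b in zip(edges, edges[1:])]


def score(cuts):
    """Chi-square-like penalty for unequal district populations."""
    ideal = sum(p for p, _ in PRECINCTS) / N_DISTRICTS
    return sum((sum(p for p, _ in d) - ideal) ** 2 / ideal
               for d in districts(cuts))


def seats(cuts):
    """Number of districts where party A holds a strict majority."""
    return sum(2 * sum(v for _, v in d) > sum(p for p, _ in d)
               for d in districts(cuts))


def sample_plans(n_steps, beta=1.0, seed=0):
    """Metropolis chain on cut points with target exp(-beta * score)."""
    rng = random.Random(seed)
    cuts = [4, 8]  # start from the equal-population plan
    plans = []
    for _ in range(n_steps):
        proposal = list(cuts)
        i = rng.randrange(len(cuts))
        proposal[i] += rng.choice((-1, 1))  # symmetric boundary shift
        edges = [0, *proposal, len(PRECINCTS)]
        valid = all(a < b for a, b in zip(edges, edges[1:]))  # no empty district
        if valid:
            delta = score(proposal) - score(cuts)
            if delta <= 0 or rng.random() < math.exp(-beta * delta):
                cuts = proposal
        plans.append(tuple(cuts))
    return plans


def tail_probability(observed_cuts, plans):
    """Share of sampled plans giving A at most as many seats as observed."""
    observed = seats(observed_cuts)
    return sum(seats(c) <= observed for c in plans) / len(plans)


if __name__ == "__main__":
    plans = sample_plans(20000)
    print("tail probability for plan (4, 8):", tail_probability((4, 8), plans))
```

A small tail probability would flag the observed plan as atypical among the sampled ones. Which tail to look at, and which statistic to compare, are modelling choices that this sketch fixes arbitrarily.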

From a broader perspective, agreeing on a method to produce random district allocations could be the way to go towards solving the judicial dilemma in setting new voting maps, as is currently under discussion in the US.