Archive for The New York Times

understanding elections through statistics [book review]

Posted in Books, Kids, R, Statistics, Travel on October 12, 2020 by xi'an

A book to read most urgently if hoping to make an informed decision by 03 November! Written by a political scientist cum statistician, Ole Forsberg. (If you were thinking of another political scientist cum statistician, he wrote Red State, Blue State a while ago! And is currently forecasting the outcome of the November election for The Economist.)

“I believe [omitting educational level] was the main reason the [Brexit] polls were wrong.”

The first part of the book is about the statistical analysis of opinion polls (taking their outcome as given, rather than designing them in the first place), starting with the Scottish independence referendum of 2014. The first chapter covers the cartoon case of simple sampling from a population, with or without replacement, Bayes and non-Bayes. In somewhat too much detail imho, given that this is an unrealistic description of poll outcomes. The second chapter expands to stratified sampling (with a confusing title [Polling 399] and entry, since it discusses repeated polls that are not processed in said chapter). It mentions the famous New York Times experiment where five groups of pollsters analysed the same data, making different decisions in adjusting the sample and identifying likely voters, and coming out with a range of five points in the percentage. It gets a wee bit more advanced when designing priors for the population proportions, but still studies a weighted average of the voting intentions for each category. Chapter 3 reaches the challenging task of combining polls, with the 2017 (South) Korean presidential election as an illustration, involving five polls. It includes a solution to handling older polls by proposing a simple linear regression against time. Chapter 4 sums up the challenges of real-life polling by examining the disastrous 2016 Brexit referendum in the UK, exposing for instance the complicated biases resulting from polling by phone or on-line. The part that weights polling institutes according to quality does not provide any quantitative detail. (And there is also a weird averaging between the levels of “support for Brexit” and “maybe-support for Brexit”, see Fig. 4.5!)
Concluding, as quoted above, that missing the educational stratification was the cause for missing the shock wave of referendum day is a possible explanation, but the massive difference in turnout between the age groups, itself possibly induced by the reassuring figures of the published polls and predictions, certainly played a role in missing the (terrible) outcome.
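To make the stratification argument concrete, here is a minimal sketch of a weighted average of voting intentions over categories. All figures are made up for illustration (they are neither from the book nor from any actual Brexit poll), and the function name is hypothetical. It only shows how a sample that over-represents one educational level can shift the headline figure.

```python
def weighted_support(support_by_stratum, share_by_stratum):
    """Weighted average of per-stratum support, with weights normalised to one."""
    total_share = sum(share_by_stratum.values())
    return sum(
        support_by_stratum[s] * share_by_stratum[s] for s in share_by_stratum
    ) / total_share


# Hypothetical numbers, for illustration only.
support = {"no degree": 0.60, "degree": 0.40}            # support within each stratum
sample_share = {"no degree": 0.30, "degree": 0.70}       # composition of the poll sample
population_share = {"no degree": 0.65, "degree": 0.35}   # composition of the electorate

raw = weighted_support(support, sample_share)            # ignores the educational imbalance
adjusted = weighted_support(support, population_share)   # reweights to the electorate

print(f"raw estimate:      {raw:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

With these invented inputs the unadjusted estimate sits below 50% while the reweighted one sits above it, which is the kind of reversal that omitting a stratification variable can hide.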

“The fabricated results conformed to Benford’s law on first digits, but failed to obey Benford’s law on second digits.” Wikipedia

The second part of this 200-page book is about election analysis, towards testing for fraud. Hence involving the ubiquitous Benford's law. Although applied to the leading digit, which I do not think should necessarily follow Benford's law, due to both the varying sizes and the non-uniform political inclinations of the voting districts (of which there are 39 for the 2009 Afghan presidential election illustration, although the book sticks to 34 (p.106)). My impression was that lesser digits should be tested instead. Chapter 5 actually supports the use of the generalised Benford distribution, which accounts for differences in turnout between the electoral districts. But it cannot come up with a real-life election where the Benford test points out a discrepancy (and hence a potential fraud). Concluding with the author’s verdict [repeated from his PhD thesis] that these Benford tests “are specious at best”, which makes me wonder why 20 pages are spent on the topic. The following chapter thus considers other methods, checking for differential [i.e., not-at-random] invalidation by linear and generalised linear regression on the supporting rate in the district. Once again concluding that there is no evidence of such fraud when analysing the 2010 Côte d’Ivoire elections (that led to civil war). With an extension in Chapter 7 to account for spatial correlation. The book concludes with an analysis of the Sri Lankan presidential elections between 1994 and 2019, with conclusions of significant differential invalidation in almost every election (even those not including Tamil provinces from the North).
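For reference, the first-digit and second-digit Benford distributions mentioned above can be computed directly. The sketch below is in Python rather than the R used in the book, it is not the author's code, and the function names are mine. The chi-square statistic is the plain Pearson one, not the generalised Benford test discussed in the book.

```python
import math


def benford_first_digit(d):
    """P(first significant digit = d), for d in 1..9."""
    return math.log10(1 + 1 / d)


def benford_second_digit(d):
    """P(second significant digit = d), for d in 0..9, summing over first digits."""
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))


def pearson_chi_square(observed_counts, probabilities):
    """Pearson statistic comparing observed digit counts with expected ones."""
    n = sum(observed_counts)
    return sum(
        (obs - n * p) ** 2 / (n * p)
        for obs, p in zip(observed_counts, probabilities)
    )


first = [benford_first_digit(d) for d in range(1, 10)]
second = [benford_second_digit(d) for d in range(10)]

print("first digit :", [round(p, 3) for p in first])
print("second digit:", [round(p, 3) for p in second])
```

The second-digit distribution is much flatter than the first-digit one, which is one reason why testing lesser digits is less sensitive to the sizes of the voting districts.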

R code is provided and discussed within the text. Some simple mathematical derivations are found, albeit with a huge dose of warnings (“math-heavy”, “harsh beauty”) and excuses (“feel free to skim”, “the math is entirely optional”). Often, one wonders at the relevance of said derivations for the intended audience and the overall purpose of the book. Nonetheless, it provides an interesting entry on (relatively simple) models applied to election data and could certainly be used as an original textbook on modelling aggregated count data, in particular as it should spark the interest of (some) students.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

Bayes @ NYT

Posted in Books, Kids, Statistics, University life on August 8, 2020 by xi'an

A tribune in the NYT of yesterday on the importance of being Bayesian when an epidemiologist. A tribune that was forwarded to me by a few friends (and which I missed in my addictive monitoring of the journal!). It is written by a Canadian journalist writing about mathematics (and obviously statistics). And it brings to the general public the main motivation for adopting a Bayesian approach, namely its coherent handling of uncertainty and its ability to update in the face of new information. (Although it might be noted that other flavours of statistical analysis are also able to update their conclusions when given more data.) The COVID situation is a perfect case study in Bayesianism, in that there are so many levels of uncertainty and imprecision, from the models themselves, to the data, to the outcome of the tests, etc. The article is journalisty, of course, but it quotes from a range of statisticians and epidemiologists, including Susan Holmes, who, I learned, was quarantined for 105 days in rural Portugal!, developing a hierarchical Bayes modelling of the prevalent SEIR model, and David Spiegelhalter, discussing Cromwell’s Law (or better, humility law, for avoiding the reference to a fanatic and tyrannical Puritan who put Ireland to fire and the sword!, and had in fact very little humility for himself). Reading the comments is both hilarious (it does not take long to reach the point when Trump is mentioned, and Taleb’s stance on models and tails makes an appearance) and revealing, as many readers do not understand the meaning of Bayes’ inversion between causes and effects, or even the meaning of Jeffreys’ bar, |, as conditioning.


Posted in pictures, Running, Travel on May 5, 2020 by xi'an

on an absurd climbing competition

Posted in Kids, Mountains, pictures, Running, Travel on April 1, 2020 by xi'an

The New York Times has a very interesting piece on why Adam Ondra, arguably the best sport climber in the world, who climbed the very first 9c route in 2017, with a supernatural move involving hanging head down, actually has little hope of winning the Olympics. Assuming there will be Olympics this year. It is essentially because there is only one single medal for the sport, merging the radically different skills of bouldering, lead climbing and the absurd addition of speed climbing, which involves a single route, always the same, not particularly hard (6b) but to be climbed as fast as possible. To be a top contender in two categories is already pretty rare, with Ondra an exception. To master all three… Only combined athletic events like heptathlon or pentathlon compare, but they come on top of existing competitions for every single one of the seven or five events they are made of. Ondra came second or first in bouldering and lead, but closer to last for speed climbing. At least he made it through the qualifications.

the exponential power of now

Posted in Books, Statistics, University life on March 22, 2020 by xi'an

The New York Times had an interview on 13 March with Britta Jewell (MRC, Imperial College London) and Nick Jewell (London School of Hygiene and Tropical Medicine & UC Berkeley), both epidemiologists. (Nick is also an AE for Biometrika.) Where they explain quite convincingly the devastating power of exponential growth and the resulting need for immediate reaction. An urgency that Western governments failed to heed, unsurprisingly including the US federal government. Maybe they should have been told afresh about the legend of paal paysam, where the king who lost to Krishna was asked to double rice grains on the successive squares of a chess board. (Although this is presumably too foreign a thought experiment for The agent orange. He presumably prefers the unbelievable ideological rantings of John Ioannidis. Who apparently does not mind sacrificing “people with limited life expectancies” for the sake of the economy.) Incidentally, I find the title “The exponential power of now” fabulous!
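For those who would like to check the chessboard arithmetic, here is a minimal sketch, assuming one grain on the first square and a doubling on each of the 64 squares (the function names are mine).

```python
def grains_on_square(k):
    """Number of grains on square k (1-indexed), doubling from one grain."""
    return 2 ** (k - 1)


def total_grains(n_squares=64):
    """Total number of grains over the first n_squares squares."""
    return sum(grains_on_square(k) for k in range(1, n_squares + 1))


last = grains_on_square(64)
total = total_grains()

print(f"grains on the last square: {last:,}")
print(f"grains on the whole board: {total:,}")
```

The last square alone holds more grains than the other 63 squares together, which is the whole point about acting now rather than one doubling later.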