## so long, U.K., and thanks for all the fish

Posted in Statistics with tags , , , , , , , , , , on December 31, 2020 by xi'an

## understanding elections through statistics [book review]

Posted in Books, Kids, R, Statistics, Travel with tags , , , , , , , , , , , , , , , , , , , , , , , , on October 12, 2020 by xi'an

A book to read most urgently if hoping to take an informed decision by 03 November! Written by a political scientist cum statistician, Ole Forsberg. (If you were thinking of another political scientist cum statistician, he wrote red state blue state a while ago! And is currently forecasting the outcome of the November election for The Economist.)

“I believe [omitting educational level] was the main reason the [Brexit] polls were wrong.”

The first part of the book is about the statistical analysis of opinion polls (assuming their outcome is given, rather than designing them in the first place). And starting with the Scottish independence referendum of 2014. The first chapter covering the cartoon case of simple sampling from a population, with or without replacement, Bayes and non-Bayes. In somewhat too much detail imho given that this is an unrealistic description of poll outcomes. The second chapter expands to stratified sampling (with confusing title [Polling 399] and entry, since it discusses repeated polls that are not processed in said chapter). Mentioning the famous New York Times experiment where five groups of pollsters analysed the same data, making different decisions in adjusting the sample and identifying likely voters, and coming out with a range of five points in the percentage. Starting to get a wee bit more advanced when designing priors for the population proportions. But still studying a weighted average of the voting intentions for each category. Chapter three reaches the challenging task of combining polls, with a 2017 (South) Korea presidential election as an illustration, involving five polls. It includes a solution to handling older polls by proposing a simple linear regression against time. Chapter 4 sums up the challenges of real-life polling by examining the disastrous 2016 Brexit referendum in the UK. Exposing for instance the complicated biases resulting from polling by phone or on-line. The part that weights polling institutes according to quality does not provide any quantitative detail. (And also a weird averaging between the levels of “support for Brexit” and “maybe-support for Brexit”, see Fig. 4.5!) Concluding as quoted above that missing the educational stratification was the cause for missing the shock wave of referendum day is a possible explanation, but the massive difference in turnover between the age groups, itself possibly induced by the reassuring figures of the published polls and predictions, certainly played a role in missing the (terrible) outcome.

“The fabricated results conformed to Benford’s law on first digits, but failed to obey Benford’s law on second digits.” Wikipedia

The second part of this 200 page book is about election analysis, towards testing for fraud. Hence involving the ubiquitous Benford law. Although applied to the leading digit which I do not think should necessarily follow Benford law due to both the varying sizes and the non-uniform political inclinations of the voting districts (of which there are 39 for the 2009 presidential Afghan election illustration, although the book sticks at 34 (p.106)). My impression was that instead lesser digits should be tested. Chapter 4 actually supports the use of the generalised Benford distribution that accounts for differences in turnouts between the electoral districts. But it cannot come up with a real-life election where the B test points out a discrepancy (and hence a potential fraud). Concluding with the author’s doubt [repeated from his PhD thesis] that these Benford tests “are specious at best”, which makes me wonder why spending 20 pages on the topic. The following chapter thus considers other methods, checking for differential [i.e., not-at-random] invalidation by linear and generalised linear regression on the supporting rate in the district. Once again concluding at no evidence of such fraud when analysing the 2010 Côte d’Ivoire elections (that led to civil war). With an extension in Chapter 7 to an account for spatial correlation. The book concludes with an analysis of the Sri Lankan presidential elections between 1994 and 2019, with conclusions of significant differential invalidation in almost every election (even those not including Tamil provinces from the North).

R code is provided and discussed within the text. Some simple mathematical derivations are found, albeit with a huge dose of warnings (“math-heavy”, “harsh beauty”) and excuses (“feel free to skim”, “the math is entirely optional”). Often, one wonders at the relevance of said derivations for the intended audience and the overall purpose of the book. Nonetheless, it provides an interesting entry on (relatively simple) models applied to election data and could certainly be used as an original textbook on modelling aggregated count data, in particular as it should spark the interest of (some) students.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

## additional deaths in England & Wales

Posted in Statistics with tags , , , , , , , , , , on August 5, 2020 by xi'an

Source: United Kingdom Office for National Statistics

David Spiegelhalter wrote another piece for The Guardian about the number of COVID-related deaths in Britain, remarking that with the objective statistic of excess death, the kingdom is doing worse than any other country, including Belgium which is reported as the worst performer in the fight again the pandemic based on its reported COVID death numbers. David also shows the proper degree of caution in providing reasons for this terrible record rather than starting the blame game. One factor differentiating England from other countries like Italy being the spread of its COVID clusters, partly due to the higher mobility of the population, in particular its travelling for vacations. (The comparison also reveals a stable higher level of overall mortality in the UK when compared with south-west EU countries, except Portugal. It surprisingly misses Germany, which is unlikely to be a country with missing statistics!)

## [Nature on] simulations driving the world’s response to COVID-19

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on April 30, 2020 by xi'an

Nature of 02 April 2020 has a special section on simulation methods used to assess and predict the pandemic evolution. Calling for caution as the models used therein, like the standard ODE S(E)IR models, which rely on assumptions on the spread of the data and very rarely on data, especially in the early stages of the pandemic. One epidemiologist is quote stating “We’re building simplified representations of reality” but this is not dire enough, as “simplified” evokes “less precise” rather than “possibly grossly misleading”. (The graph above is unrelated to the Nature cover and appears to me as particularly appalling in mixing different types of data, time-scale, population at risk, discontinuous updates, and essentially returning no information whatsoever.)

“[the model] requires information that can be only loosely estimated at the start of an epidemic, such as the proportion of infected people who die, and the basic reproduction number (…) rough estimates by epidemiologists who tried to piece together the virus’s basic properties from incomplete information in different countries during the pandemic’s early stages. Some parameters, meanwhile, must be entirely assumed.”

The report mentions that the team at Imperial College, which predictions impacted the UK Government decisions, also used an agent-based model, with more variability or stochasticity in individual actions, which require even more assumptions or much more refined, representative, and trustworthy data.

“Unfortunately, during a pandemic it is hard to get data — such as on infection rates — against which to judge a model’s projections.”

Unfortunately, the paper was written in the early days of the rise of cases in the UK, which means predictions were not much opposed to actual numbers of deaths and hospitalisations. The following quote shows how far off they can fall from reality:

“the British response, Ferguson said on 25 March, makes him “reasonably confident” that total deaths in the United Kingdom will be held below 20,000.”

since the total number as of April 29 is above 21,000 24,000 29,750 and showing no sign of quickly slowing down… A quite useful general public article, nonetheless.

## coronavirus also hits reproduction rights!

Posted in Kids with tags , , , , , , , , , , , on March 27, 2020 by xi'an

“The [UK] Department of Health says reported changes to the abortion law, that would allow women to take both pills at home during the coronavirus outbreak, are not going ahead.” Independent, 23 March

“Texas and Ohio have included abortions among the nonessential surgeries and medical procedures that they are requiring to be delayed, saying they are trying to preserve precious protective equipment for health care workers and to make space for a potential flood of coronavirus patients.” The New York Times, 23 March

“Le ministre de la Santé, Olivier Véran, et la secrétaire d’Etat chargée de l’Égalité femmes-hommes, Marlène Schiappa, ont tenté lundi de rassurer : les IVG « sont considérées comme des interventions urgentes », et leur « continuité doit être assurée ». Le gouvernement veillera à ce que « le droit des femmes à disposer de leur corps ne soit pas remis en cause », ont-ils assuré.” Le Parisien, 23 March

“Lawmakers voted on Wednesday to liberalize New Zealand’s abortion law and allow unrestricted access during the first half of pregnancy, ending the country’s status as one of the few wealthy nations to limit the grounds for abortion during that period.” The New York Times, 18 March