Archive for The Economist

understanding elections through statistics [book review]

Posted in Books, Kids, R, Statistics, Travel on October 12, 2020 by xi'an

A book to read most urgently if hoping to make an informed decision by 3 November! Written by a political scientist cum statistician, Ole Forsberg. (If you were thinking of another political scientist cum statistician, he wrote Red State Blue State a while ago! And he is currently forecasting the outcome of the November election for The Economist.)

“I believe [omitting educational level] was the main reason the [Brexit] polls were wrong.”

The first part of the book is about the statistical analysis of opinion polls (assuming their outcome is given, rather than designing them in the first place), starting with the Scottish independence referendum of 2014. The first chapter covers the cartoon case of simple sampling from a population, with or without replacement, Bayes and non-Bayes, in somewhat too much detail imho given that this is an unrealistic description of poll outcomes. The second chapter expands to stratified sampling (with a confusing title [Polling 399] and entry, since it discusses repeated polls that are not processed in said chapter). It mentions the famous New York Times experiment where five groups of pollsters analysed the same data, made different decisions in adjusting the sample and identifying likely voters, and came out with a range of five points in the percentage. Things get a wee bit more advanced when designing priors for the population proportions, but the chapter still studies a weighted average of the voting intentions for each category. Chapter 3 reaches the challenging task of combining polls, with the 2017 South Korean presidential election as an illustration, involving five polls. It handles older polls by proposing a simple linear regression against time. Chapter 4 sums up the challenges of real-life polling by examining the disastrous 2016 Brexit referendum in the UK, exposing for instance the complicated biases resulting from polling by phone or online. The part that weights polling institutes according to quality does not provide any quantitative detail. (And there is also a weird averaging between the levels of “support for Brexit” and “maybe-support for Brexit”, see Fig. 4.5!)
Concluding, as quoted above, that missing the educational stratification caused the polls to miss the shock wave of referendum day is a possible explanation, but the massive difference in turnout between age groups, itself possibly induced by the reassuring figures of the published polls and predictions, certainly played a role in missing the (terrible) outcome.
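
As a rough illustration of the Chapter 3 idea of handling older polls through a linear regression against time, here is a minimal Python sketch. It is not the book's (R) code: the function name `combine_polls`, the choice of weighting each poll by its sample size, and the poll figures in the usage lines are all hypothetical assumptions made for illustration only.

```python
def combine_polls(days, shares, sizes):
    """Weighted least-squares fit of poll share against time.

    Hypothetical helper (not from the book). `days` are measured relative to
    election day (e.g. -30 for a poll taken 30 days before), `shares` are the
    observed proportions for one candidate, and `sizes` are the poll sample
    sizes, used here as regression weights (an assumption: roughly inverse
    variance when the shares are similar). Returns (intercept, slope); the
    intercept is the fitted share on election day (day 0).
    """
    if not (len(days) == len(shares) == len(sizes)) or len(days) < 2:
        raise ValueError("need at least two polls with matching inputs")
    total = sum(sizes)
    # Weighted means of time and share
    day_bar = sum(w * x for w, x in zip(sizes, days)) / total
    share_bar = sum(w * y for w, y in zip(sizes, shares)) / total
    # Weighted cross-product and sum of squares around those means
    sxy = sum(w * (x - day_bar) * (y - share_bar)
              for w, x, y in zip(sizes, days, shares))
    sxx = sum(w * (x - day_bar) ** 2 for w, x in zip(sizes, days))
    slope = sxy / sxx
    intercept = share_bar - slope * day_bar
    return intercept, slope


# Invented example: four polls of 1,000 respondents drifting upwards
intercept, slope = combine_polls([-30, -20, -10, -5],
                                 [0.40, 0.42, 0.44, 0.45],
                                 [1000, 1000, 1000, 1000])
print(round(intercept, 3), round(slope, 4))  # prints: 0.46 0.002
```

Extrapolating the fitted line to day 0 is what discounts the older polls; with only a handful of polls, the slope itself is very uncertain, so the extrapolated share should be read with care.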

“The fabricated results conformed to Benford’s law on first digits, but failed to obey Benford’s law on second digits.” (Wikipedia)

The second part of this 200-page book is about election analysis, with a view to testing for fraud, hence involving the ubiquitous Benford law. The law is applied to the leading digit, although I do not think the leading digit should necessarily follow Benford’s law, due to both the varying sizes and the non-uniform political inclinations of the voting districts (of which there are 39 for the 2009 Afghan presidential election illustration, although the book sticks to 34 (p.106)). My impression was that less significant digits should be tested instead. The chapter actually supports the use of the generalised Benford distribution that accounts for differences in turnout between the electoral districts. But it cannot come up with a real-life election where the B test points out a discrepancy (and hence a potential fraud). It concludes with the author’s doubt [repeated from his PhD thesis] that these Benford tests “are specious at best”, which makes me wonder why 20 pages are spent on the topic. The following chapter thus considers other methods, checking for differential [i.e., not-at-random] invalidation by linear and generalised linear regression on the support rate in the district. Once again, it concludes there is no evidence of such fraud when analysing the 2010 Côte d’Ivoire elections (which led to civil war). Chapter 7 extends this analysis to account for spatial correlation. The book concludes with an analysis of the Sri Lankan presidential elections between 1994 and 2019, finding significant differential invalidation in almost every election (even those not including Tamil provinces from the North).
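
To make the first-digit versus second-digit distinction concrete, here is a minimal Python sketch of the plain Benford probabilities, P(D1 = d) = log10(1 + 1/d) for d = 1..9 and P(D2 = d) = sum over k = 1..9 of log10(1 + 1/(10k + d)) for d = 0..9, together with a Pearson chi-square statistic on district-level vote counts. This is not the book's code and it does not implement the generalised Benford distribution discussed above; the function names are hypothetical.

```python
import math
from collections import Counter


def benford_first(d):
    """Benford probability that the first significant digit equals d (1..9)."""
    return math.log10(1 + 1 / d)


def benford_second(d):
    """Benford probability that the second digit equals d (0..9),
    obtained by summing over the possible first digits k = 1..9."""
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))


def digits_at(counts, position):
    """Digit at `position` (1 = leading) of each positive count;
    zero counts and counts with too few digits are skipped."""
    out = []
    for c in counts:
        c = int(c)
        s = str(c)
        if c > 0 and len(s) >= position:
            out.append(int(s[position - 1]))
    return out


def benford_chi_square(counts, position=1):
    """Pearson chi-square statistic comparing observed digits with Benford.

    Compare with a chi-square quantile on 8 (first digit) or 9 (second digit)
    degrees of freedom, about 15.51 or 16.92 at the 5% level.
    """
    if position == 1:
        support, prob = range(1, 10), benford_first
    elif position == 2:
        support, prob = range(10), benford_second
    else:
        raise ValueError("only first and second digits are handled here")
    digits = digits_at(counts, position)
    n = len(digits)
    if n == 0:
        raise ValueError("no usable counts")
    observed = Counter(digits)
    return sum((observed[d] - n * prob(d)) ** 2 / (n * prob(d))
               for d in support)
```

With a few dozen districts, the expected counts for the rarer first digits fall well below five, so the chi-square approximation is itself shaky at that scale.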

R code is provided and discussed within the text. The book includes some simple mathematical derivations, albeit with a huge dose of warnings (“math-heavy”, “harsh beauty”) and excuses (“feel free to skim”, “the math is entirely optional”). Often, one wonders at the relevance of said derivations for the intended audience and the overall purpose of the book. Nonetheless, it provides an interesting entry point to (relatively simple) models applied to election data and could certainly be used as an original textbook on modelling aggregated count data, in particular as it should spark the interest of (some) students.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

Statistics and Computing special MCMSk’issue [call for papers]

Posted in Books, Mountains, R, Statistics, University life on February 7, 2014 by xi'an

Following the exciting and innovative talks, posters and discussions at MCMski IV, the editor of Statistics and Computing, Mark Girolami (who also happens to be the new president-elect of the BayesComp section of ISBA, which is taking over the management of future MCMski meetings), kindly proposed to publish a special issue of the journal open to all participants in the meeting. Not only to speakers, mind, but to all participants.

So if you are interested in submitting a paper to this special issue of a computational statistics journal that is very close to our MCMski themes, I encourage you to do so. (Especially if you missed the COLT 2014 deadline!) The deadline for submissions is set for March 15 (a wee bit tight, but we would dearly like to publish the issue in 2014, namely the same year as the meeting). Submissions are to be made through the Statistics and Computing portal, with a mention that they are intended for the special issue.

An editorial committee chaired by Antonietta Mira and composed of Christophe Andrieu, Brad Carlin, Nicolas Chopin, Jukka Corander, Colin Fox, Nial Friel, Chris Holmes, Gareth Jones, Peter Müller, Geoff Nicholls, Gareth Roberts, Håvård Rue, Robin Ryder, and myself will examine the submissions and get back to the authors within a few weeks. In a spirit similar to the JRSS Read Paper procedure, submissions will first be examined collectively, before being sent to referees. We plan to publish the reviews as well, in order to include a global set of comments on the accepted papers. We intend to do it in The Economist style, i.e., as a set of edited anonymous comments. Usual instructions for Statistics and Computing apply, with the additional requirements that the paper should be around 10 pages and include at least one author who took part in MCMski IV.

statistical significance as explained by The Economist

Posted in Books, Statistics, University life on November 7, 2013 by xi'an

There is a long article in The Economist this week (also making the front cover), which discusses how and why many published research papers have unreproducible and most often “wrong” results. Nothing immensely new there, esp. if you read Andrew’s blog on a regular basis, but the (anonymous) writer(s) take(s) pains to explain how this relates to statistics, and in particular to statistical testing of hypotheses. The above is an illustration from this introduction to statistical tests (and their interpretation).

“First, the statistics, which if perhaps off-putting are quite crucial.”

It is not the first time I spot a statistics-backed article in this journal, so I assume it has either journalists with a statistics background or links with (UK?) statisticians. The description of why statistical tests can err is fairly (Type I – Type II) classical. Incidentally, it reports a finding of Ioannidis that, when reporting a positive at level 0.05, the expectation of a false positive rate of one out of 20 is “highly optimistic”. This evaluation is to be contrasted with, e.g., Berger and Sellke (1987), who reported too-early rejections in a large number of cases. More interestingly, the paper stresses that this classical approach ignores “the unlikeliness of the hypothesis being tested”, which I interpret as the prior probability of the hypothesis under test.
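
The role of “the unlikeliness of the hypothesis being tested” can be made explicit with Bayes’ theorem: among results declared significant, the share of false positives is (1 − π)α / ((1 − π)α + π(1 − β)), where π is the prior probability that the tested effect is real, α the level, and 1 − β the power. Here is a minimal Python sketch; the function name is hypothetical, and the prior probabilities and the power of 0.8 in the usage lines are illustrative assumptions, not figures taken from the article.

```python
def false_positive_share(prior_true, alpha=0.05, power=0.8):
    """Fraction of significant results that are false positives.

    prior_true: prior probability that the tested effect is real
    alpha: level of the test (probability of a Type I error)
    power: probability of rejecting when the effect is real (1 - Type II error)
    """
    if not 0 <= prior_true <= 1:
        raise ValueError("prior_true must be a probability")
    false_pos = (1 - prior_true) * alpha   # true nulls wrongly rejected
    true_pos = prior_true * power          # real effects correctly detected
    return false_pos / (false_pos + true_pos)


# Hypothetical settings: one tested hypothesis in ten is true, then even odds
print(round(false_positive_share(0.1), 2))   # prints 0.36
print(round(false_positive_share(0.5), 3))   # prints 0.059
```

Even with even odds on the hypothesis, the share of false positives exceeds one in 20 under these assumptions, and it reaches about 36% when only one tested hypothesis in ten is true.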

“Statisticians have ways to deal with such problems. But most scientists are not statisticians.”

The paper also reports on the lack of power in most studies, a report that I find a bit bizarre and even meaningless in its claim to compute an overall power across studies, researchers, and even fields. Even in a single study, the alternative to “no effect” is composite, hence has a power that depends on the unknown value of the parameter. Seeking a single value for the power requires some prior distribution on the alternative.
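
The dependence of power on the unknown parameter is easy to exhibit in the simplest case. The sketch below is a hypothetical illustration under assumed settings: a one-sided z-test of θ = 0 against θ > 0 at level 0.05, with known σ and n observations. It computes the power as a function of θ, then a single “average power” obtained by putting a normal prior N(μ, τ²) on θ, the prior being exactly the extra ingredient needed to reduce power to one number. Under that prior, the sample mean is marginally N(μ, τ² + σ²/n), which gives the average in closed form.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.95)  # one-sided 5% critical value, about 1.645


def power(theta, n, sigma=1.0):
    """Power of the one-sided z-test at a fixed alternative value theta."""
    return 1 - STD_NORMAL.cdf(Z_CRIT - theta * sqrt(n) / sigma)


def average_power(mu, tau, n, sigma=1.0):
    """Rejection probability averaged over a N(mu, tau^2) prior on theta.

    The test rejects when the sample mean exceeds Z_CRIT * sigma / sqrt(n);
    under the prior, the sample mean is marginally N(mu, tau^2 + sigma^2 / n).
    """
    threshold = Z_CRIT * sigma / sqrt(n)
    marginal = NormalDist(mu, sqrt(tau ** 2 + sigma ** 2 / n))
    return 1 - marginal.cdf(threshold)


# Same test, same sample size, very different powers:
for theta in (0.1, 0.3, 0.5):
    print(theta, round(power(theta, n=50), 2))
print("averaged:", round(average_power(mu=0.3, tau=0.2, n=50), 2))
```

With n = 50, the power climbs from below 0.2 to above 0.95 as θ moves from 0.1 to 0.5, and the “averaged” value changes with any choice of μ and τ, which is the point made above.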

“Peer review’s multiple failings would matter less if science’s self-correction mechanism—replication—was in working order.”

The next part of the paper covers the failings of peer review, which I discussed in the ISBA Bulletin, but it seems to me too easy to blame the referee for failing to spot statistical or experimental errors when lacking access to the data or to the full experimental methodology, and when under pressure to return (for free) a report within a short time window. The best that can be expected is that a referee detects the implausibility of a claim or an obvious methodological or statistical mistake. These are not math papers! And, as pointed out repeatedly, not all referees are statistically numerate…

“Budding scientists must be taught technical skills, including statistics.”

The last part discusses possible solutions to achieve reproducibility and hence higher confidence in experimental results. Paying for independent replication is the proposed solution, but it can obviously only apply to a small fraction of all published results. And having control bodies randomly testing labs and teams following a major publication seems rather unrealistic, if only for filling the teams of such bodies with able controllers… An interesting if pessimistic debate, in fine. And fit for the International Year of Statistics.


Posted in Books, Kids, Travel, University life on April 9, 2012 by xi'an

Today, I was reading in the science supplement of Le Monde about a new magnitude in sequencing cancerous tumors (wrong link, I know…). This made me wonder whether the sequence of (hundreds of) mutations leading from a normal cell to a cancerous one could be reconstituted the way a genealogy is. (This reminds me of another exciting genetic article I read on the Eurostar back from London on Thursday, in the Economist, about the colonization of Madagascar by 30 women from the Malay archipelago: “The island was one of the last places on Earth to be settled, receiving its earliest migrants in the middle of the first millennium AD…”)

As a double coincidence, I was reading La Recherche yesterday in the métro to Dauphine, whose central theme this month is heredity beyond genetics. (Double because this also connected with the meeting in London.) The keyword is epigenetics, namely the activation or inactivation of a gene and the hereditary transmission of this character without a genetic mutation. This is quite interesting as it implies the heritability of some acquired traits, i.e., it forces one to reconsider the nature versus nurture debate. (This sentence is another input due to Galton!) It also implies that species differentiation driven by environmental changes could proceed at a much faster rate than purely genetic change allows, which may sound promising in the light of the fast climate changes we are currently facing. However, what I do not understand is why the journal included a paper on the consequences of epigenetics for the Darwinian theory of evolution and… intelligent design. Indeed, I do not see why the inclusion of different vectors in the hereditary process would contradict Darwin’s notion of natural selection. Or even why considering a scientific modification or replacement of the current Darwinian theory of evolution would be an issue. Charles Darwin wrote his book in 1859, prior to the start of genetics, and the immense advances made since then led to modifications and adjustments of his original views, without involving any irrational belief in the process.

High speed trains in Britain

Posted in Kids, Travel on September 18, 2011 by xi'an

I read an article in the Economist about (and against) high-speed trains in Britain. It is eloquently entitled “the great train robbery” and, in the tradition of the Economist, opposes this type of government intervention. In the current case, the issue is rather poorly argued! For instance, “the trend in France has been for headquarters to move up the line to Paris and for fewer overnight stays elsewhere”: I am afraid this trend started around Louis XIV’s time, and the French TGV did not aggravate what is a strong Jacobin characteristic of French politics and sociology, namely the predominant role of Paris. On the other hand, the fast train connections to Marseille, Lille, or Bordeaux mean day trips are possible by train rather than plane. The article does not mention the Channel tunnel project, a state-funded venture if any, which made plane travel between Paris and London a thing of the past and twinned both cities by a two-hour trip, so much so that going shopping from one city to the other sounds completely natural (to my kids if not to me!). Similarly, “China’s safety failures have shown the perils of skimping in any way” does not apply everywhere (while I agree that the haste China showed in building such an immense high-speed network is not unrelated to the recent crash). Moreover, the idea that “upgrading existing, slower networks often makes more sense” is fine as long as companies are willing to invest in the long term. But the history of British railways shows the opposite, namely that companies look for short-term profits and balk at those long-term investments. Only states can bring about changes at this scale, so of course “ordinary taxpayers end up paying”. But they would pay in other ways for extending road networks or existing airports, or for maintaining isolated commercial hubs. While the Economist admits that “Victorian railways ushered in a golden age of prosperity”, I wonder how it could have supported railway construction in the 1800s!