## Archive for political science

## Le Monde lacks data scientists!

Posted in Books, Statistics with tags Andrew Gelman, bad graph, data-analyst, French elections, Le Monde, political science, political statistics, YouGov on July 11, 2017 by xi'an

In an article in Le Monde today, a journalist is quite critical of statistical analyses that regress voting behaviours on socio-economic patterns, warning that correlation is not causation, and so on and so forth… But the analysis of the votes as presented in the article is itself quite appalling! Just judging from the above graph: the vertical and horizontal axes are somewhat inverted (as predicting the proportion of over-65s in the population from their votes does not seem that relevant); there is an incomprehensible drop in the over-65 proportion within a district between the votes for the fascist party and the other ones, both being indicators of an inversion of the axes!; the curves are apparently derived from four points [see the correction at the end, explaining that the whole data collection was used to draw the curves]; the variability in the curves is not set against the overall variability in the population; more advanced tools than mere correlation are not broached upon; and so on… They should have asked Andrew. Or YouGov!

## gerrymandering detection by MCMC

Posted in Books, Statistics with tags gerrymandering, Mumbai airport, Nature, North Carolina, political science, simulated annealing, temperature, US elections 2016 on June 16, 2017 by xi'an

**I**n the latest issue of Nature I read (June 8), there is a rather long feature article on mathematical (and statistical) ways of measuring gerrymandering, that is, the manipulation of the boundaries of a voting district towards improving the chances of a certain party. (The name comes from Elbridge Gerry (1812) and the salamander shape of the district he created.) The difficulty covered by the article is detecting gerrymandering, which leads to the challenging and almost philosophical question of defining a "fair" partition of a region into voting districts when those are not geographically induced, since each such partition respects the principles of "one person, one vote" and of majority rule. Having a candidate or party win at the global level and lose at every local level seems to go against this majority rule, but with electoral systems like the one in the US, this frequently happens (with dire consequences in the latest elections). Just another illustration of Simpson's paradox, essentially. And a damning drawback of multi-tiered electoral systems.
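The global-win/local-loss reversal is easy to reproduce with made-up numbers. The following toy tally (all figures hypothetical, not taken from any actual election) shows a party carrying the overall popular vote while losing two districts out of three — one large win is outweighed by two narrow losses:

```python
# Hypothetical three-district tally: party A wins the popular vote
# overall yet loses a majority of districts (Simpson-style reversal).
districts = {
    "D1": {"A": 90, "B": 10},   # A wins big
    "D2": {"A": 45, "B": 55},   # A loses narrowly
    "D3": {"A": 45, "B": 55},   # A loses narrowly
}

total_A = sum(d["A"] for d in districts.values())
total_B = sum(d["B"] for d in districts.values())
seats_A = sum(d["A"] > d["B"] for d in districts.values())
seats_B = sum(d["B"] > d["A"] for d in districts.values())

print(f"popular vote: A={total_A}, B={total_B}")  # A=180, B=120
print(f"seats: A={seats_A}, B={seats_B}")         # A=1, B=2
```

Gerrymandering exploits exactly this arithmetic: packing the opponent's voters into a few districts they win overwhelmingly, and spreading the rest thinly across districts they narrowly lose.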

“In order to change the district boundaries, we use a Markov Chain Monte Carlo algorithm to produce about 24,000 random but reasonable redistrictings.”

In the arXiv paper that led to this Nature article (along with other studies), Bagiat et al. essentially construct a tail probability to assess how extreme the current district partition is against a theoretical distribution of such partitions, finding that the actual redistrictings of 2012 and 2016 in North Carolina are “extremely atypical”. (The generation of random partitions obeyed four rules, namely equal population, geographic compactness and connectedness, proximity to county boundaries, and a majority of Afro-American voters in at least two districts, the last being a requirement in North Carolina.) A score function was built as a linear combination of four corresponding scores, mostly χ²-like, and turned into a density, simulated-annealing style. The determination of the final temperature β=1 (p.18) [or equivalently of the weights (p.20)] remains unclear to me. As does the use of more than 10⁵ simulated annealing iterations to produce *a single partition* (p.18)…
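The mechanics can be sketched in a few lines. The toy below is my own drastic simplification, not the paper's implementation: it keeps only one of the four criteria (equal population among districts), uses a made-up setting of 24 unit-population precincts in 3 districts, and runs a Metropolis sampler on the density exp(−β·score) with a linear ramp of the inverse temperature β up to 1 — the schedule whose endpoint choice puzzles me above:

```python
import math
import random

random.seed(0)

# Toy setting (hypothetical, not the paper's data): 24 precincts of unit
# population to be split into 3 districts.  The score retains only one of
# the paper's four criteria (equal population); the actual method weights
# four chi-square-like scores, which is why the weights/temperature matter.
N_PRECINCTS, N_DISTRICTS = 24, 3
TARGET = N_PRECINCTS / N_DISTRICTS

def score(assign):
    """Chi-square-like population-balance score: zero iff districts are equal."""
    counts = [assign.count(d) for d in range(N_DISTRICTS)]
    return sum((c - TARGET) ** 2 / TARGET for c in counts)

def anneal(iters=5000, beta_max=1.0):
    """Metropolis sampler on exp(-beta*score) with a linear beta ramp."""
    assign = [random.randrange(N_DISTRICTS) for _ in range(N_PRECINCTS)]
    cur = score(assign)
    best, best_assign = cur, list(assign)
    for it in range(iters):
        beta = beta_max * (it + 1) / iters        # annealing schedule 0 -> beta_max
        i = random.randrange(N_PRECINCTS)
        old_d = assign[i]
        assign[i] = random.randrange(N_DISTRICTS)  # propose moving one precinct
        new = score(assign)
        # Metropolis acceptance: always accept downhill, uphill w.p. exp(-beta*Δ)
        if new <= cur or random.random() < math.exp(-beta * (new - cur)):
            cur = new
            if cur < best:
                best, best_assign = cur, list(assign)
        else:
            assign[i] = old_d                      # reject: restore assignment
    return best_assign, best

best_assign, best = anneal()
print(round(best, 3))  # low score: districts close to equal population
```

Generating an ensemble of such "reasonable" partitions and locating the enacted map in the resulting distribution of, say, seats won is what yields the tail probability of the paper.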

From a broader perspective, agreeing on a method to produce random district allocations could be the way to go towards solving the judicial dilemma of setting new voting maps, as is currently under discussion in the US.