## Le Monde lacks data scientists!

In an article in Le Monde today, a journalist is quite critical of statistical analyses that regress voting behaviour on socio-economic patterns, warning that correlation is not causation and so on and so forth… But the analysis of the votes presented in the article is itself quite appalling! Just judging from the above graph: the vertical and horizontal axes appear to be inverted, since predicting the proportion of over-65s in the population from their votes does not seem that relevant, and there is an incomprehensible drop in the over-65 proportion within a district between the votes for the fascist party and the other ones, both being indicators of an inversion of the axes! The curves are apparently derived from four points [a correction at the end explains that the whole data collection was used to draw the curves], the variability in the curves is not set against the overall variability in the population, more advanced tools than mere correlation are not broached, and so on… They should have asked Andrew. Or YouGov!

July 11, 2017 at 11:15 am

IMNSHO, “Le Monde” lacks much more than data scientists…

And, BTW, one may find most journalists (not only in “Le Monde”) somewhat wanting in mathematical and scientific training and in (scientific) critical thinking.

But in the present case, the difficulty is that the graph's authors are trying to picture in two dimensions something that really needs (at least) three: collapsing the three-way relationship ((district → % of older voters), (district → % of FN/LRM/FI votes)) into (% of older voters → % of FN/LRM/FI votes) does allow them to obtain a (mediocre, though not really bad) 2D picture, but silently introduces a *lot* of *unstated* postulates (among them, and most questionable, the absence of any influence of other district-specific characteristics of the population, conditional on the district-specific age distribution).
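To make that last postulate concrete, here is a minimal simulation sketch. Everything in it is invented for illustration (the effect sizes, the standardized variables, and the unnamed `other` district characteristic); none of it comes from the Le Monde data. It only shows how a 2D slope of vote share on age can differ from the slope obtained once a correlated district characteristic is accounted for.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    n = len(x)
    b = ols_slope(x, y)
    a = sum(y) / n - b * sum(x) / n
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

rng = random.Random(0)
n_districts = 2000

# Hypothetical effect sizes on standardized scales (assumptions, not estimates).
TRUE_AGE_EFFECT = 0.2
TRUE_OTHER_EFFECT = 0.5

# `other` stands for any district characteristic correlated with age structure.
other = [rng.gauss(0, 1) for _ in range(n_districts)]
older = [0.8 * z + rng.gauss(0, 1) for z in other]
vote = [TRUE_AGE_EFFECT * x + TRUE_OTHER_EFFECT * z + rng.gauss(0, 0.5)
        for x, z in zip(older, other)]

# What a 2D plot of vote against age implicitly summarizes.
naive_slope = ols_slope(older, vote)

# Slope of vote on age after partialling `other` out of both variables
# (the Frisch-Waugh-Lovell construction of a multiple-regression coefficient).
adjusted_slope = ols_slope(residuals(other, older), residuals(other, vote))

print(f"naive slope:    {naive_slope:.3f}")
print(f"adjusted slope: {adjusted_slope:.3f}")
```

Under these made-up settings the naive slope absorbs part of the effect of `other`, so it overstates the age effect built into the simulation; with different correlations it could just as well understate it or flip its sign.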

It also lacks any indication (error bars? shading?) of the *variability* (or *uncertainty*) of the plotted data.
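One possible way to produce such an indication, sketched here on invented data, is a percentile bootstrap over districts: refit the line on resampled districts and report the spread of the slopes. The sample size, the slope of 0.4 and the noise level are assumptions made up for the example, and the bootstrap is only one of several reasonable choices.

```python
import random

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def bootstrap_slopes(x, y, n_boot, rng):
    """Sorted slopes refitted on n_boot resamples of the (x, y) pairs."""
    indices = range(len(x))
    slopes = []
    for _ in range(n_boot):
        sample = rng.choices(indices, k=len(x))  # resample districts with replacement
        xs = [x[i] for i in sample]
        ys = [y[i] for i in sample]
        slopes.append(ols_fit(xs, ys)[1])
    return sorted(slopes)

rng = random.Random(1)
n_districts = 300

# Invented district-level data: % of over-65s and % of votes for one party.
older = [rng.uniform(10, 35) for _ in range(n_districts)]
vote = [15 + 0.4 * x + rng.gauss(0, 5) for x in older]

_, slope_hat = ols_fit(older, vote)
slopes = bootstrap_slopes(older, vote, n_boot=1000, rng=rng)

# 95% percentile interval for the slope.
lower = slopes[int(0.025 * len(slopes))]
upper = slopes[int(0.975 * len(slopes)) - 1]

print(f"slope = {slope_hat:.3f}, 95% bootstrap interval = [{lower:.3f}, {upper:.3f}]")
```

The same resampled fits could be evaluated on a grid of age values to shade a band around the curve, which is the kind of display the graph is missing.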

In other words, this graph:

* gives a visual representation of an interesting relationship, but

* conveys an oversimplified representation of an oversimplified model.

This reminds me of what is said when comparing medical imaging (such as radiography) to swimsuits: “What they show is interesting, what they hide is essential”.

To decide if this hiding (oversimplification) is a byproduct of ignorance or malice is another question…