“We estimated that about 40,000 human beings have died trying to enter the European Union, during about 5500 tragic attempts, in the period between January 1993 and March 2019.”
This is a paper by Alessio Farcomeni, published last year in the Annals of Applied Statistics [which is delivered to me by regular mail, but which I missed], about estimating the number of deaths on migration routes. I have been interested in this question for moral and statistical reasons since the beginning of the Syrian civil war and the resulting massive increase in the number of people attempting to cross the Mediterranean Sea to reach Europe. Unfortunately, despite several attempts to contact governmental agencies (like the French Minister of the Interior), NGOs (like Amnesty), friends, academics (including a Dutch group working on that very topic but with no data or data scientist), and connections (with the Italian Navy, the Tunisian government and Frontex), I could not access more than newspaper-level data or highly local data that did not allow for a general picture.
“The law of large numbers guarantees consistency as long as the population size estimator is consistent and the model is well-specified.”
As was my intention, the paper uses capture-recapture. The data come from UNITED for Intercultural Action, which recorded 4333 attempts to enter Europe with at least one death, between January 1993 and March 2019, as reported by one or several sources, like newspaper articles. These sources are most often not independent, but rather copy one another, which is obviously problematic for extrapolating to the unreported cases and for resorting to capture-recapture estimation. And the number of deaths per event is itself most likely inexact, since casualties are not always recovered or identified as such. Since migration routes, migrant flows, and smuggler policies keep changing massively, the homogeneity of the observations over nearly 30 years is low to nonexistent, which makes invoking consistency rather inappropriate. This is also the reason why I find the approach followed by the paper too strongly model-based, as for instance when relying on a Horvitz-Thompson estimator or using a GLM to link the number of deaths in one crossing with the number of sources reporting the tragedy. This fundamental difficulty in modelling or inferring from such unreliable and untrustworthy data sources, together with the absence of record linkage with other datasets like the entries in the border countries (e.g., Turkey) or the number of crossings prevented by the local coastguards, alas makes the final estimate of 40,000 deaths at sea close to impossible to calibrate from a model-free perspective. The actual figure is not only higher, but maybe considerably so. Unfortunately, the datasets that would allow linkage and recapture are unavailable or nonexistent in departure countries, while arrival countries mostly abstain from storing data about the migrant flows and histories…
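To make the capture-recapture logic discussed above concrete, here is a minimal sketch of a Horvitz-Thompson estimate of the total number of events when only events reported by at least one source are recorded. It is not the model used in the paper: it assumes, for illustration only, that the number of sources per event follows a single homogeneous Poisson distribution (exactly the kind of assumption questioned above). The function names and the counts are made up.

```python
import math


def fit_zero_truncated_poisson(counts):
    """MLE of the Poisson rate when events with zero sources are never seen.

    Solves lam / (1 - exp(-lam)) = mean(counts) by bisection.
    Assumption (illustrative only): a common rate for all events.
    """
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        raise ValueError("mean source count must exceed 1 for a finite estimate")
    # The left-hand side is increasing in lam and larger than lam,
    # so the root lies strictly between 0 and the observed mean.
    lo, hi = 1e-9, mean
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / (1 - math.exp(-mid)) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def horvitz_thompson_total(counts):
    """Estimated total number of events, reported or not."""
    lam = fit_zero_truncated_poisson(counts)
    p_seen = 1 - math.exp(-lam)  # probability an event has at least one source
    return len(counts) / p_seen  # each recorded event weighted by 1 / p_seen


# Made-up data: number of sources reporting each of 100 recorded events.
counts = [1] * 60 + [2] * 30 + [3] * 10
print(horvitz_thompson_total(counts))

# Crude stand-in for sources copying one another: every report is duplicated.
copied = [2 * k for k in counts]
print(horvitz_thompson_total(copied))
```

Under this toy model, duplicated reports inflate the fitted rate and hence the probability of being seen, so the estimated total shrinks towards the number of recorded events. This is one way to read the concern that dependent sources lead to an underestimate.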