Archive for societal statistics

Hippocratic oath for maths?

Posted in Statistics on August 23, 2019 by xi'an

On a free day in Nachi-Katsuura, I came across this call for a professional oath for mathematicians (and computer engineers and scientists in related fields), by UCL mathematician Hannah Fry. The theme is the same as with Weapons of math destruction, namely that algorithms have a potentially huge impact on everyone's life and that those who design these algorithms should be accountable for it. And aware of the consequences when they are used by non-specialists. As illustrated by preventive justice software. And child abuse prediction software. Some form of ethics course should indeed appear in data science programs, if only to point out the limitations of automated decision making. However, I remain skeptical of the idea, as (a) taking an oath does not make it impossible to break that oath, especially when one is blissfully unaware of breaking it; (b) acting as ethically as possible should be part of everyone's job, whether one is designing deep learning algorithms or making soba noodles; and (c) the Hippocratic oath is mostly a moral statement that varies from place to place and from one epoch to the next (as, e.g., with respect to abortion, which was prohibited in Hippocrates' version) and does not prevent some doctors from engaging in unsavory activities. Or from being influenced by drug companies. And such an oath would not force companies to open-source their code, which in my opinion is a better route towards the assessment of such algorithms. Nor does the article mention the Montréal Déclaration for a responsible AI, which goes further than a generic and most likely ineffective oath.

impossible estimation

Posted in Books, Statistics on January 29, 2018 by xi'an

Outside its Sciences & Médecine section, which I most often read, Le Monde published last weekend a tribune by the anthropologist Michel Naepels [who kindly replied to my email on his column] on the impossibility of evaluating the number of deaths in Congo due to the political instability (a weak and undemocratic state fighting armed rebel groups), for lack of a reliable sample. There is a huge gap between two estimates of this number, from 200,000 to 5.4 million excess deaths. For the latter, the IRC states that "only 0.4 percent of all deaths across DR Congo were attributed directly to violence". Still, diverging estimates do not mean numbers are impossible to produce, just that more elaborate methods, like those developed by Rebecca Steorts for Syrian deaths, must be investigated. Which requires more means than those available to the local States (assuming they are interested in the answer) or to NGOs. This also raises the question of whether "excess deaths" has an absolute meaning, since it refers to a hypothetical state of the world that has not taken place.
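As an aside on how such death tolls can be estimated at all without an exhaustive registry, here is a minimal, purely illustrative sketch (in Python) of the simplest multiple-systems estimator, the two-list Lincoln-Petersen capture-recapture estimate. All counts below are hypothetical, and this is in no way Naepels' or the IRC's actual methodology; building and de-duplicating such lists at scale is where record-linkage methods like Steorts' come in.

```python
# Purely illustrative two-list capture-recapture (Lincoln-Petersen) estimate of a
# total death count from two incomplete, overlapping lists. The counts are made up
# for the example and are unrelated to the DRC figures quoted above.
n_A = 1200   # deaths recorded on list A (say, a household survey), hypothetical
n_B = 900    # deaths recorded on list B (say, clinic records), hypothetical
n_AB = 300   # deaths appearing on both lists, hypothetical

# Under strong assumptions (closed population, independent lists, homogeneous
# capture probabilities), the total is estimated by N_hat = n_A * n_B / n_AB.
N_hat = n_A * n_B / n_AB
print(f"Lincoln-Petersen estimate: {N_hat:.0f}")   # 3600 for these made-up counts

# Chapman's correction, less biased when the overlap is small:
N_chapman = (n_A + 1) * (n_B + 1) / (n_AB + 1) - 1
print(f"Chapman estimate: {N_chapman:.0f}")        # about 3594 here
```

The point of the sketch is only that the estimate hinges entirely on the assumptions behind the overlap term, which is precisely what is contested when lists are unreliable or non-independent.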

On the same page, another article, written by geographers, cast doubt on predictive policing software (not used in France), though not as pointedly as the Significance article by Kristian Lum and William Isaac last year.

algorithm for predicting when kids are in danger [guest post]

Posted in Books, Kids, Statistics on January 23, 2018 by xi'an

[Last week, I read this article in The New York Times about child abuse prediction software and approached Kristian Lum, of HRDAG, for her opinion on the approach, possibly for a guest post which she kindly and quickly provided!]

A week or so ago, an article about the use of statistical models to predict child abuse was published in the New York Times. The article recounts a heart-breaking story of two young boys who died in a fire due to parental neglect. Despite the fact that social services had received "numerous calls" to report the family, human screeners had not regarded the reports as meeting the criteria to warrant a full investigation. Offered as a solution to imperfect and potentially biased human screeners is the use of computer models that compile data from a variety of sources (jails, alcohol and drug treatment centers, etc.) to output a predicted risk score. The implication here is that had the human screeners had access to such technology, the software might have issued a warning that the case was high risk and, based on this warning, the screener might have sent out investigators to intervene, thus saving the children.

These types of models bring up all sorts of interesting questions regarding fairness, equity, transparency, and accountability (which, by the way, are an exciting area of statistical research that I hope some readers here will take up!). For example, most risk assessment models that I have seen are just logistic regressions of [characteristics] on [indicator of undesirable outcome]. In this case, the outcome is likely an indicator of whether child abuse had been determined to take place in the home or not. This raises the issue of whether past determinations of abuse – which make up the training data used to build the risk assessment tool – are objective, or whether they encode systemic bias against certain groups that will be passed through the tool to result in systematically biased predictions. To quote the article, "All of the data on which the algorithm is based is biased. Black children are, relatively speaking, over-surveilled in our systems, and white children are under-surveilled." And one need not look further than the same news outlet to find cases in which there have been egregiously unfair determinations of abuse, which disproportionately impact poor and minority communities. Child abuse isn't my immediate area of expertise, and so I can't responsibly comment on whether these types of cases are prevalent enough that the bias they introduce will swamp the utility of the tool.
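[To make the previous paragraph concrete, here is a minimal sketch, in Python with scikit-learn, of that kind of risk model: a logistic regression of case characteristics on an indicator of a past abuse determination. The features, coefficients, and data below are entirely made up for illustration and are not the tool discussed in the article.]

```python
# Minimal sketch of a risk-scoring model: logistic regression of hypothetical
# case characteristics on an indicator of a past (possibly biased) determination.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Hypothetical features compiled from administrative records
# (prior referrals, jail records, drug/alcohol treatment contacts).
X = np.column_stack([
    rng.poisson(1.0, n),        # number of prior referrals
    rng.binomial(1, 0.2, n),    # any jail record in the household
    rng.binomial(1, 0.15, n),   # any drug/alcohol treatment contact
])

# Outcome: indicator that abuse was determined to have occurred. If these past
# determinations are systematically biased, the fitted scores inherit that bias.
logit = -2.0 + 0.8 * X[:, 0] + 1.0 * X[:, 1] + 0.7 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = LogisticRegression().fit(X, y)

# The "risk score" for a new referral is just the predicted probability.
new_case = np.array([[2, 1, 0]])
print(model.predict_proba(new_case)[0, 1])
```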

At the end of the day, we obviously want to prevent all instances of child abuse, and this tool seems to get a lot of things right in terms of transparency and responsible use. And according to the original article, it (at least on the surface) seems to be effective at more efficiently allocating scarce resources to investigate reports of child abuse. As these types of models come to be used more and more, for a wider variety of prediction tasks, we need to be cognizant that (to quote my brilliant colleague, Josh Norkin) we don't "lose sight of the fact that because this system is so broken all we are doing is finding new ways to sort our country's poorest citizens. What we should be finding are new ways to lift people out of poverty."