Archive for systemic bias

modelling protocol in Nature

Posted in Books, Kids, Statistics, University life on August 19, 2020 by xi'an

A three-page commentary in a recent issue of Nature is a manifesto for responsible modelling, with Deborah Mayo among its numerous signatories. (And Philip Stark as the only statistician I spotted.) The main theme is that the model is not the real thing, i.e., the map is not the territory. Which as such is hardly debatable. The point of the tribune is that, in the light of the pandemic crisis, a large portion of the general population has discovered that mathematical models are not the truth and that their predictions are to be taken with a few marshes of salt. Either because they are based on faulty or antiquated data, if any. Or because their level of approximation is too crude to return any reliable figure. A failure to understand the nature of mathematical models that reminds me of the 2008 financial crisis, of the bemused question of Liz Windsor, and of the muddled response of economists:

“Why did nobody notice it?”

“Your Majesty,” eminent economists replied, “the failure to foresee the timing, extent and severity of the crisis and to head it off, while it had many causes, was principally a failure of the collective imagination of many bright people, both in this country and internationally, to understand the risks to the system as a whole.”

“People got a bit lax … perhaps it is difficult to foresee”

The manifesto calls for open assumptions, sensitivity analysis, uncertainty quantification, wariness of overfitting and structural biases (what is the utility function?), and the acknowledgement of ignorance as an outcome of the model. Which again sounds entirely sensible, if not necessarily helpful when facing interlocutors asking for point estimates. I also regret that the tribune gives hardly any room to statistics and the model checking tools it has developed, except in mentioning p-hacking and the false feeling of certainty produced by a p-value. Plus a bizarre mention of a French movement of statactivistes, of which I had not heard and which seems connected to a book published in French by three of the signatories.

Nature reflections on policing

Posted in Books, Kids, Statistics, University life on June 24, 2020 by xi'an

limited shelf validity

Posted in Books, pictures, Statistics, University life on December 11, 2019 by xi'an

A great article from Steve Stigler in the new, multi-scaled, and so exciting Harvard Data Science Review, magisterially operated by Xiao-Li Meng, on the limitations of old datasets. Illustrated by three famous datasets used by three equally famous statisticians, Quetelet, Bortkiewicz, and Gosset. None of whom was fundamentally interested in the data for their own sake. First, Quetelet’s data was (wrongly) reconstructed, and he missed the opportunity to beat Galton at discovering correlation. Second, Bortkiewicz went looking (or even cherry-picking!) for rare events in yearly tables of mortality minutely divided between causes such as military horse kicks. The third dataset is not Guinness’s, but a test between two sleeping pills, conducted rather crudely on inmates of a psychiatric institution in Kalamazoo, with further mishandling by Gosset himself. Manipulations that turn the data into dead data, as Steve puts it. (And illustrates with the above skull collection picture, as well as warning against attempts at resuscitating dead data into what could be called “zombie data”.)

“Successful resurrection is only slightly more common than in Christian theology.”

His global perspective on dead data is that they should stop being used, rather than having their (shelf) life extended by being turned into benchmarks recycled over and over as proofs of concept. If only (my two cents) because it leads to calibrating (and choosing) methods that do well on these benchmarks. Another example that could have been added to the skulls above is the Galaxy Velocity Dataset, which makes frequent appearances in works estimating Gaussian mixtures. A use that Radford Neal flagged at the 2001 ICMS workshop on mixture estimation as inappropriate, since astrophysical arguments weighed against mixture modelling.

“…the role of context in shaping data selection and form—context in temporal, political, and social as well as scientific terms—has been shown to be a powerful and interesting phenomenon.”

The potential for “dead-er” data (my neologism!) increases with the epoch, in that the careful sleuth work Steve (and others) conducted on these historical datasets is absolutely impossible with the current massive data sets. Massive and proprietary. And presumably discarded once the associated neural net is designed and sold. Leaving the burden of unmasking the potential (or highly probable?) biases to others. Most interestingly, this echoes a “comment” in Nature of 17 October by Sabina Leonelli on the transformation of data from a national treasure into a commodity whose “ownership can confer and signal power”. But her call for openness and governance of research data seems as illusory as other attempts to sever the GAFAs from their extra-territorial privileges…

Nature tidbits

Posted in Books, University life on September 7, 2019 by xi'an

Before returning a few older issues of Nature to the coffee room of the maths department, a quick look brought out the following few items of interest, besides the great cover above:

  • France showing the biggest decline in overall output among the top 10 countries in the Nature Index Annual Tables.
  • A tribune against the EU’s Plan S, towards funding (private) publishers directly from public (research) money. Why continue to support commercial journals one way or another?!
  • A short debate on geo-engineering towards climate control, with the dire warning that “little is known about the consequences” [which could be further damaging the chances of human survival on this planet].
  • Another call for the accountability of companies designing AI towards fairness and unbiasedness [provided all agree on the meaning of these terms].
  • A study that argues that the obesity epidemic is more prevalent in rural than urban areas due to a higher recourse to junk food in the former.
  • A data mining venture in India to mine [not read] 73 million computerised journal articles, which is not yet clearly legal as the publishers object to it. Although the EU (and the UK) have laws authorising mining for non-commercial goals. (And India has looser regulations wrt copyright.)

algorithm for predicting when kids are in danger [guest post]

Posted in Books, Kids, Statistics on January 23, 2018 by xi'an

[Last week, I read this article in The New York Times about child abuse prediction software and approached Kristian Lum, of HRDAG, for her opinion on the approach, possibly for a guest post which she kindly and quickly provided!]

A week or so ago, an article about the use of statistical models to predict child abuse was published in the New York Times. The article recounts a heart-breaking story of two young boys who died in a fire due to parental neglect. Despite the fact that social services had received “numerous calls” to report the family, human screeners had not regarded the reports as meeting the criteria to warrant a full investigation. Offered as a solution to imperfect and potentially biased human screeners is the use of computer models that compile data from a variety of sources (jails, alcohol and drug treatment centers, etc.) to output a predicted risk score. The implication here is that had the human screeners had access to such technology, the software might have issued a warning that the case was high risk and, based on this warning, the screener might have sent out investigators to intervene, thus saving the children.

These types of models bring up all sorts of interesting questions regarding fairness, equity, transparency, and accountability (which, by the way, make up an exciting area of statistical research that I hope some readers here will take up!). For example, most risk assessment models that I have seen are just logistic regressions of [characteristics] on [indicator of undesirable outcome]. In this case, the outcome is likely an indicator of whether child abuse had been determined to take place in the home or not. This raises the issue of whether past determinations of abuse, which make up the training data that is used to make the risk assessment tool, are objective, or whether they encode systemic bias against certain groups that will be passed through the tool to result in systematically biased predictions. To quote the article, “All of the data on which the algorithm is based is biased. Black children are, relatively speaking, over-surveilled in our systems, and white children are under-surveilled.” And one need not look further than the same news outlet to find cases in which there have been egregiously unfair determinations of abuse, which disproportionately impact poor and minority communities. Child abuse isn’t my immediate area of expertise, and so I can’t responsibly comment on whether these types of cases are prevalent enough that the bias they introduce will swamp the utility of the tool.
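To make the mechanism above concrete, here is a minimal sketch in Python of how biased determinations could pass through a logistic regression. Everything in it is an assumption made for illustration and nothing comes from the article or from the actual tool: two hypothetical groups share the same true rate of abuse (10%), but abuse is recorded with probability 0.4 in the under-surveilled group and 0.8 in the over-surveilled one. The only "characteristic" is group membership, and the fit is a plain gradient ascent rather than any particular library routine.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(cells, steps=5000, lr=1.0):
    """Fit P(label=1 | group) = sigmoid(b0 + b1*group) by gradient ascent
    on the average log-likelihood. `cells` holds (group, label, count)."""
    n = sum(count for _, _, count in cells)
    b0 = b1 = 0.0
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for group, label, count in cells:
            resid = label - sigmoid(b0 + b1 * group)
            grad0 += count * resid
            grad1 += count * resid * group
        b0 += lr * grad0 / n
        b1 += lr * grad1 / n
    return b0, b1

# Hypothetical numbers, chosen only for illustration.
TRUE_RATE = 0.10                 # same true rate of abuse in both groups
DETECTION = {0: 0.4, 1: 0.8}     # group 1 is over-surveilled
FAMILIES_PER_GROUP = 1000

# Training data: what gets *recorded* is true rate times detection rate.
cells = []
for group, detect in DETECTION.items():
    recorded = round(FAMILIES_PER_GROUP * TRUE_RATE * detect)
    cells.append((group, 1, recorded))
    cells.append((group, 0, FAMILIES_PER_GROUP - recorded))

b0, b1 = fit_logistic(cells)
risk_under = sigmoid(b0)         # predicted risk, under-surveilled group
risk_over = sigmoid(b0 + b1)     # predicted risk, over-surveilled group
print(f"under-surveilled: {risk_under:.3f}, over-surveilled: {risk_over:.3f}")
```

Under these made-up numbers, the fitted risks simply reproduce the recorded rates (about 4% versus 8%), so the over-surveilled group is scored as twice as risky even though the true rates are identical by construction. Whether real determinations are distorted to anything like this degree is exactly the open question raised above.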

At the end of the day, we obviously want to prevent all instances of child abuse, and this tool seems to get a lot of things right in terms of transparency and responsible use. And according to the original article, it (at least on the surface) seems to be effective at more efficiently allocating scarce resources to investigate reports of child abuse. As these types of models become used more and more for a wider variety of prediction types, we need to be cognizant that (to quote my brilliant colleague, Josh Norkin) we don’t “lose sight of the fact that because this system is so broken all we are doing is finding new ways to sort our country’s poorest citizens. What we should be finding are new ways to lift people out of poverty.”