Archive for artificial intelligence

algorithm for predicting when kids are in danger [guest post]

Posted in Books, Kids, Statistics on January 23, 2018 by xi'an

[Last week, I read this article in The New York Times about child abuse prediction software and approached Kristian Lum, of HRDAG, for her opinion on the approach, possibly for a guest post which she kindly and quickly provided!]

A week or so ago, an article about the use of statistical models to predict child abuse was published in the New York Times. The article recounts a heart-breaking story of two young boys who died in a fire due to parental neglect. Despite the fact that social services had received “numerous calls” to report the family, human screeners had not regarded the reports as meeting the criteria to warrant a full investigation. Offered as a solution to imperfect and potentially biased human screeners is the use of computer models that compile data from a variety of sources (jails, alcohol and drug treatment centers, etc.) to output a predicted risk score. The implication here is that had the human screeners had access to such technology, the software might have issued a warning that the case was high risk and, based on this warning, the screener might have sent out investigators to intervene, thus saving the children.

These types of models bring up all sorts of interesting questions regarding fairness, equity, transparency, and accountability (which, by the way, are an exciting area of statistical research that I hope some readers here will take up!). For example, most risk assessment models that I have seen are just logistic regressions of [indicator of undesirable outcome] on [characteristics]. In this case, the outcome is likely an indicator of whether child abuse had been determined to take place in the home or not. This raises the issue of whether past determinations of abuse, which make up the training data used to build the risk assessment tool, are objective, or whether they encode systemic bias against certain groups that will be passed through the tool to result in systematically biased predictions. To quote the article, “All of the data on which the algorithm is based is biased. Black children are, relatively speaking, over-surveilled in our systems, and white children are under-surveilled.” And one need not look further than the same news outlet to find cases in which there have been egregiously unfair determinations of abuse, which disproportionately impact poor and minority communities. Child abuse isn’t my immediate area of expertise, and so I can’t responsibly comment on whether these types of cases are prevalent enough that the bias they introduce will swamp the utility of the tool.
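To make the concern about biased training labels concrete, here is a minimal simulation sketch, not the tool discussed in the article. Everything in it is an assumption made for illustration: two hypothetical groups share the same true abuse rate (5%), but abuse is detected and recorded more often in the over-surveilled group (80% versus 40%). A logistic regression fitted to the recorded determinations then assigns a higher risk score to that group.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton's method; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # current fitted probabilities
        w = p * (1.0 - p)                       # Newton weights
        grad = X.T @ (y - p)                    # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])           # negative Hessian
        beta += np.linalg.solve(hess, grad)     # Newton update
    return beta

rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, size=n)              # hypothetical: 1 = over-surveilled group
true_abuse = rng.random(n) < 0.05               # assumed identical true rate in both groups
detect_prob = np.where(group == 1, 0.8, 0.4)    # assumed detection rates (illustrative only)
recorded = true_abuse & (rng.random(n) < detect_prob)  # what the training data sees

X = np.column_stack([np.ones(n), group])        # intercept + group indicator
beta = fit_logistic(X, recorded.astype(float))
risk = 1.0 / (1.0 + np.exp(-(X @ beta)))
risk_group0 = risk[group == 0][0]               # predicted risk, under-surveilled group
risk_group1 = risk[group == 1][0]               # predicted risk, over-surveilled group
print(f"predicted risk: group 0 = {risk_group0:.3f}, group 1 = {risk_group1:.3f}")
```

In this toy setting the gap in predicted risk comes entirely from the difference in surveillance, since the underlying rates were set equal by construction. Whether real data behave this way is exactly the empirical question raised above.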

At the end of the day, we obviously want to prevent all instances of child abuse, and this tool seems to get a lot of things right in terms of transparency and responsible use. And according to the original article, it (at least on the surface) seems to be effective at more efficiently allocating scarce resources to investigate reports of child abuse. As these types of models become used more and more for a wider variety of prediction types, we need to be cognizant that (to quote my brilliant colleague, Josh Norkin) we don’t “lose sight of the fact that because this system is so broken all we are doing is finding new ways to sort our country’s poorest citizens. What we should be finding are new ways to lift people out of poverty.”

minibatch acceptance for Metropolis-Hastings

Posted in Books, Statistics on January 12, 2018 by xi'an

An arXival that appeared last July by Seita, Pan, Chen, and Canny, and that relates to my current interest in speeding up MCMC. And to 2014 papers by Korattikara et al., and Bardenet et al. Published in Uncertainty in AI by now. The authors claim that their method requires less data per iteration than earlier ones…

“Our test is applicable when the variance (over data samples) of the log probability ratio between the proposal and the current state is less than one.”

By test, the authors mean a mini-batch formulation of the Metropolis-Hastings acceptance ratio in the (special) setting of iid data. First they use Barker’s version of the acceptance probability instead of Metropolis’. Second, they use a Gaussian approximation to the distribution of the logarithm of the Metropolis ratio for the minibatch, while the Barker acceptance step corresponds to comparing a logistic perturbation of the logarithm of the Metropolis ratio against zero. Which amounts to comparing the logarithm of the Metropolis ratio for the minibatch, perturbed by a logistic minus Normal variate, against zero. (The cancellation of the Normal in eqn (13) is a form of fiducial fallacy, where the Normal variate has two different meanings. In other words, the difference of two Normal variates is not equal to zero.) However, the next step escapes me as the authors seek to optimise the distribution of this logistic minus Normal variate. Which I thought was uniquely defined as such a difference. Another constraint is that the estimated variance of the log-likelihood ratio gets below one. (Why one?) The argument is that the average of the individual log-likelihoods is approximately Normal by virtue of the Central Limit Theorem. Even when randomised. While the illustrations on a Gaussian mixture and on a logistic regression demonstrate huge gains in computational time, it is unclear to me to what extent one can trust the approximation for a given model and sample size…
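As a small numerical check of the Barker step described above, the following sketch (my own illustration, not the authors' code) verifies by simulation that accepting when the log Metropolis ratio plus a standard logistic variate is positive is the same as accepting with Barker's probability 1/(1+exp(-Δ)). The value of `delta` is an arbitrary, hypothetical log ratio, and the sketch covers only the full-data test: it does not attempt the minibatch version, where the logistic is replaced by a correction variate whose sum with the Normal minibatch noise is approximately logistic.

```python
import numpy as np

def barker_accept_prob(delta):
    """Barker's acceptance probability for a log Metropolis ratio delta."""
    return 1.0 / (1.0 + np.exp(-delta))

def barker_accept_via_logistic(delta, rng, size):
    """Accept whenever delta plus a standard logistic perturbation is positive."""
    x_log = rng.logistic(loc=0.0, scale=1.0, size=size)
    return delta + x_log > 0

rng = np.random.default_rng(1)
delta = 0.7                                   # hypothetical log Metropolis ratio
empirical = barker_accept_via_logistic(delta, rng, 200_000).mean()
exact = barker_accept_prob(delta)
print(f"simulated acceptance rate {empirical:.4f} vs Barker probability {exact:.4f}")
```

The agreement follows from the logistic cdf: P(Δ + X > 0) = 1 − F(−Δ) = 1/(1+exp(−Δ)) when X is standard logistic.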

Children of Time [book review]

Posted in Books, pictures, Travel on October 8, 2017 by xi'an

I came by this book in the common room of the mathematics department of the University of Warwick, which I visit regularly during my stays there, for it enjoys a book sharing box where I leave the books I’ve read (and do not want to carry back to Paris) and where I check for potential catches… One of these books was Tchaikovsky’s children of time, a great space-opera novel à la Arthur C Clarke, which got the 2016 Arthur C Clarke award, deservedly so (even though I very much enjoyed the long way to a small angry planet, Tchaikovsky’s book is much more of an epic cliffhanger where the survival of an entire race is at stake). The children of time are indeed the last remnants of the human race, surviving in an artificial sleep aboard an ancient spaceship that irremediably deteriorates. Until there is no solution but landing on a terraformed planet created eons ago. And defended by an AI spanned (or spammed) by the scientist in charge of the terra-formation, who created a virus that speeds up evolution, with unintended consequences. Given that the strength of the book relies on these consequences, I cannot get into much detail about the alternative pathway to technology (incl. artificial intelligence) followed by the inhabitants of this new world, and even less about the conclusive chapters that make up for a rather slow progression towards this final confrontation. An admirable and deep book I will most likely bring back to the common room on my next trip to Warwick! (As an aside I wonder if the title was chosen in connection with Goya’s picture of Chronus [Time] devouring his children…)

weapons of math destruction [fan]

Posted in Statistics on September 20, 2017 by xi'an

As a [new] member of Parliament, Cédric Villani is now in charge of a committee on artificial intelligence, whose goal is to assess the positive and negative sides of AI. And refers in the Le Monde interview below to Weapons of Math Destruction as impacting his views on the topic! Let us hope Superintelligence is not next on his reading list…

the incomprehensible challenge of poker

Posted in Statistics on April 6, 2017 by xi'an

When reading in Nature about two deep learning algorithms winning at a version of poker within a few weeks of each other, I came back to my “usual” wonder about poker, as I cannot understand it as a game. (Although I can see the point, albeit dubious, in playing to win money.) And [definitely] correlatively do not understand the difficulty in building an AI that plays the game. [I know, I know nothing!]

career advices by Cédric Villani

Posted in Kids, pictures, Travel, University life on January 26, 2017 by xi'an

Le Monde has launched a series of tribunes proposing career advice from 35 personalities, among whom this week (Jan. 4, 2017) Cédric Villani. His suggestion for younger generations is to invest in artificial intelligence and machine learning. While acknowledging this still is a research topic, then switching to robotics [although this is mostly a separate…]. The most powerful advice in this interview is to start with a specialisation when aiming at a large spectrum of professional opportunities, gaining the opening from exchanges with people and places. And cultures. Concluding with a federalist statement I fully share.

weapons of math destruction [book review]

Posted in Books, Kids, pictures, Statistics, University life on December 15, 2016 by xi'an

As I had read many comments and reviews about this book, including one by Arthur Charpentier, on Freakonometrics, I eventually decided to buy it from my Amazon Associate savings (!). With a strong a priori bias, I am afraid, gathered from reading some excerpts, comments, and the overall advertising about it. And also because the book reminded me of another quantic swan. Not to mention the title. After reading it, I am afraid I cannot say my assessment has changed much.

“Models are opinions embedded in mathematics.” (p.21)

The core message of this book is that the use of algorithms and AI methods to evaluate and rank people is unsatisfactory and unfair. From predicting recidivism to firing high school teachers, from rejecting loan applications to enticing the most challenged categories to enlist for for-profit colleges. Which is indeed unsatisfactory and unfair. Just like using the h index and citation ranking for promotion or hiring. (The book mentions the controversial hiring of many adjunct faculty by KAU to boost its ranking.) But this conclusion is not enough of an argument to write a whole book. Or even to blame mathematics for the unfairness: as far as I can tell, mathematics has nothing to do with unfairness. Some analysts crunch numbers, produce a score, and then managers make poor decisions. The use of mathematics throughout the book is thus completely inappropriate, when the author means statistics, machine learning, data mining, predictive algorithms, neural networks, etc. (OK, there is a small section on Operations Research on p.127, but I figure deep learning can bypass the maths.)