## we have never been able to develop a reliable predictive model

Posted in Statistics on November 10, 2019 by xi'an

An alarming entry in The Guardian about the huge proportion of councils in the UK using machine-learning software to allocate benefits, detect child abuse or claim fraud. And relying blindly on the outcome of such software, despite its well-documented lack of reliability, of uncertainty assessments, and of warnings. Blindly in the sense that the impact of their (implemented) decisions was not even reviewed, even though a portion of the councils do not consider renewing the contracts. With the appalling statement of the CEO of one software company reported in the title. Blaming further the lack of accessibility [for their company] of the data used by the councils for the impossibility [for the company] of providing risk factors and identifying bias, in an unbelievable newspeak inversion… As pointed out by David Spiegelhalter in the article, the openness should go the other way, namely that the algorithms behind the suggestions (read decisions) should be available to understand why these decisions were made. (A whole series of Guardian articles relate to this as well, under the heading “Automating poverty”.)

## Statistics and Health Care Fraud & Measuring Crime [ASA book reviews]

Posted in Books, Statistics on May 7, 2019 by xi'an

From the recently started ASA books series on statistical reasoning in science and society (of which I already reviewed a sequel to The Lady tasting Tea), a short book, Statistics and Health Care Fraud, I read at the doctor's while waiting for my appointment, with no chance of cheating! While making me realise that there is a significant amount of health care fraud in the US, which I had never thought of before (!), with possibly specific statistical features to the problem, besides the use of extreme value theory, I did not find much insight there on the techniques used to detect these frauds, besides the accumulation of Florida and Texas examples. As such this is a very light introduction to the topic, whose intended audience of choice remains unclear to me. It is stopping short of making a case for statistics and modelling against more machine-learning options. And does not seem to mention false positives… That is, the inevitable occurrence of some doctors or hospitals being above the median costs! (A point I remember David Spiegelhalter making a long while ago, during a memorable French statistical meeting in Pau.) The book also illustrates the use of a free auditing software called Rat-stats for multistage sampling, which apparently does not go beyond selecting claims at random according to their amount. Without learning from past data. (I also wonder if the criminals can reduce the chances of being caught by using this software.)
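To make the false-positive point concrete, here is a minimal sketch (not taken from the book) in which every provider is honest by construction and yet half of them end up above the median cost. The log-normal cost model, the number of providers, and the function name `flag_above_median` are all assumptions made for the illustration.

```python
import random
import statistics

def flag_above_median(costs):
    """Return the indices of providers whose cost exceeds the sample median."""
    med = statistics.median(costs)
    return [i for i, c in enumerate(costs) if c > med]

random.seed(0)
# ASSUMPTION: hypothetical billing totals for 1,000 honest providers,
# drawn from a log-normal distribution (no fraud anywhere in the data).
honest_costs = [random.lognormvariate(10, 0.5) for _ in range(1000)]

flagged = flag_above_median(honest_costs)
print(f"{len(flagged)} of {len(honest_costs)} honest providers exceed the median")
```

Whatever the cost distribution, a rule of the form "above the median" flags about half of the population, so the flag by itself carries no evidence of fraud.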

A second book on the “same” topic!, Measuring Crime, I read, not waiting at the police station, but while flying to Venezia. As indicated by the title, this is about measuring crime, with a lot of emphasis on surveys and census and the potential measurement errors at different levels of surveying or censusing… Again very little on statistical methodology, apart from questioning the data, the mode of surveying, crossing different sources, and establishing the impact of the way questions are stated, but also little on bias and the impact of policing and preventing AIs, as discussed in Weapons of Math Destruction and in some of Kristian Lum’s papers. Except for the almost obligatory reference to Minority Report. The book also concludes on a history chapter centred on Edith Abbott setting the bases for serious crime data collection in the 1920s.

[And the usual disclaimer applies, namely that this bicephalic review is likely to appear later in CHANCE, in my book reviews column.]

## the joy of stats [book review]

Posted in Books, pictures, University life on April 8, 2019 by xi'an

David Spiegelhalter’s latest book, The Art of Statistics: How to Learn from Data, has made it to Nature Book Review main entry this week. Under the title “the joy of stats”, written by Evelyn Lamb, a freelance math and science writer from Salt Lake City, Utah. (I noticed that the book made it to Amazon #1 bestseller, albeit in the Craps category!, which I am unsure is completely adequate!, especially since the book is not yet for sale on the US branch of Amazon!, and further Amazon #1 in the Probability and Statistics category in the UK.) I have not read the book yet and here are a few excerpts from the review, quoted verbatim:

“The book is part of a trend in statistics education towards emphasizing conceptual understanding rather than computational fluency. Statistics software can now perform a battery of tests and crunch any measure from large data sets in the blink of an eye. Thus, being able to compute the standard deviation of a sample the long way is seen as less essential than understanding how to design and interpret scientific studies with a rigorous eye.”

“…a main takeaway from the book is a sense of circumspection about our confidence in what is known. As Spiegelhalter writes, the point of statistical science is to ease us through the stages of extrapolation from a controlled study to an understanding of the real world, ‘and finally, with due humility, be able to say what we can and cannot learn from data’. That humility can be lacking when statistics are used in debates about contentious issues such as the costs and benefits of cancer screening.”

Posted in Statistics, University life on November 4, 2018 by xi'an

## absurd graph [if relevant warning]

Posted in pictures, Statistics, Wines on August 28, 2018 by xi'an

A pretty silly graph opposing countries with an overwhelming majority of non-Muslims and countries with an overwhelming majority of Muslims in terms of alcohol consumption. Surprise, surprise! And not incorporating the average amount or anything useful… In a Guardian article reporting on a Lancet paper about the lack of health benefit from drinking even moderate amounts of alcohol. Although, as pointed out by David Spiegelhalter at the bottom of the article, an increased risk of 0.5% associated with one unit of alcohol a day [half a pint], as opposed to 7% for two units [a pint!], should not get occasional drinkers too worried: “Come to think of it, there is no safe level of living, but nobody would recommend abstention.”

## extra glass of wine? 30mn, please…

Posted in pictures, Statistics, Wines on April 20, 2018 by xi'an

As I was reading The Guardian early today, I came across this entry on how an extra (17.5cl) glass of wine was equivalent to 30mn less of life (expectancy), above the recommended maximum of five glasses a week. As explained by Prof of Risk David Spiegelhalter himself! The Lancet study behind this analysis stated that “early deaths rose when more than 100g per week, which is five to six glasses of wine or pints of beer, was consumed.” So be careful!!!
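As a quick sanity check of the “five to six glasses” conversion, here is a back-of-the-envelope sketch. The 13% alcohol content is an assumption (the post does not give one), while the 17.5cl glass and the 100g weekly threshold come from the quote above; the function name `grams_of_alcohol` is made up for the illustration.

```python
ETHANOL_DENSITY_G_PER_ML = 0.789  # approximate density of ethanol at room temperature

def grams_of_alcohol(volume_ml, abv):
    """Grams of pure alcohol in a drink of given volume (ml) and strength (fraction)."""
    return volume_ml * abv * ETHANOL_DENSITY_G_PER_ML

glass_ml = 175   # the 17.5cl glass mentioned in the post
abv = 0.13       # ASSUMPTION: a 13% wine, not stated in the post
weekly_threshold_g = 100  # threshold quoted from the Lancet study

per_glass = grams_of_alcohol(glass_ml, abv)
glasses_at_threshold = weekly_threshold_g / per_glass
print(f"{per_glass:.1f} g per glass, {glasses_at_threshold:.1f} glasses per week at the threshold")
```

Under this assumed strength, a glass holds about 18g of alcohol, so the 100g threshold indeed falls between five and six glasses.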

## double yolk priors [a reply from the authors]

Posted in Books, Statistics, University life on March 14, 2018 by xi'an

[Here is an email I received from Subhadeep Mukhopadhyay, one of the authors of the paper I discussed yesterday.]
Thanks for discussing our work. Let me clarify the technical points that you raised:
- The difference between Leg_j(u) and T_j=Leg_j(G(θ)). One is an orthonormal polynomial of L2[0,1] and the other one of L2[G]. The second one is a polynomial of the rank-transform G(θ).
- As you correctly pointed out, there is a danger in directly approximating the ratio. We work on it after taking the quantile transform: evaluate the ratio at g⁻¹(θ), which is the d(u;G,F) over the unit interval. Now, this new transformed function is a proper density.
- Thus the ratio now becomes d(G(θ)), which can be expanded (NOT in the Leg-basis) in T_j, in eq (2.2), as it lives in the Hilbert space L2(G).
- For your last point on Step 2 of our algo, we can also use the simple integrate command.
- Unlike traditional prior-data conflict, here we attempted to answer three questions in one shot: (i) How compatible is the pre-selected g with the given data? (ii) In the event of a conflict, can we also inform the user on the nature of misfit–finer structure that was a priori unanticipated? (iii) Finally, we would like to provide a simple, yet formal guideline for upgrading (repairing) the starting g.
Hopefully, this will clear the air. But thanks for reading the paper so carefully. Appreciate it.
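
[To illustrate the first point of the email, here is a small numerical sketch, not taken from the paper, checking that the shifted Legendre polynomials Leg_j are orthonormal in L2[0,1] and that T_j=Leg_j(G(θ)) are orthonormal under G. The exponential choice for G, the helper names, and the Monte Carlo check are all assumptions made for the illustration.]

```python
import numpy as np
from numpy.polynomial import legendre as L

def leg(j, u):
    """Orthonormal shifted Legendre polynomial of degree j on [0, 1]."""
    coeffs = np.zeros(j + 1)
    coeffs[j] = 1.0  # select the degree-j Legendre polynomial P_j
    return np.sqrt(2 * j + 1) * L.legval(2 * np.asarray(u) - 1, coeffs)

def gram_unit_interval(m, n_nodes=20):
    """Gram matrix of Leg_0..Leg_m in L2[0,1], via Gauss-Legendre quadrature."""
    x, w = L.leggauss(n_nodes)        # nodes and weights on [-1, 1]
    u, w = (x + 1) / 2, w / 2         # rescaled to [0, 1]
    basis = np.array([leg(j, u) for j in range(m + 1)])
    return (basis * w) @ basis.T

def gram_under_G(m, G, theta_samples):
    """Monte Carlo Gram matrix of T_j = Leg_j(G(theta)), with theta drawn from G."""
    u = G(theta_samples)              # rank transform: u is uniform on [0, 1]
    basis = np.array([leg(j, u) for j in range(m + 1)])
    return basis @ basis.T / len(theta_samples)

rng = np.random.default_rng(1)
# ASSUMPTION: G is the Exp(1) distribution, chosen only for convenience.
theta = rng.exponential(size=200_000)
G = lambda t: 1 - np.exp(-t)

print(np.round(gram_unit_interval(3), 3))      # close to the identity matrix
print(np.round(gram_under_G(3, G, theta), 2))  # identity up to Monte Carlo error
```

[Both Gram matrices come out as (approximately) the identity, which is the sense in which the T_j form an orthonormal basis of L2(G) while the Leg_j do so for L2[0,1].]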