Archive for credit scoring

no dichotomy between efficiency and interpretability

Posted in Books, Statistics, Travel, University life on December 18, 2019 by xi'an

“…there are actually a lot of applications where people do not try to construct an interpretable model, because they might believe that for a complex data set, an interpretable model could not possibly be as accurate as a black box. Or perhaps they want to preserve the model as proprietary.”

One article I found quite interesting in the second issue of HDSR is “Why are we using black box models in AI when we don’t need to? A lesson from an explainable AI competition” by Cynthia Rudin and Joanna Radin, which describes the setting of a NeurIPS competition last year, the Explainable Machine Learning Challenge, of which I was blissfully unaware. The goal was to construct an operational black box predictor for credit scoring and turn it into something interpretable. The authors explain how they built instead a white box predictor (my terms!), namely a linear model, which could not be improved more than marginally by a black box algorithm. (It appears from the references that these authors have a record of analysing black-box models in various settings and demonstrating that they do not always bring more efficiency than interpretable versions.) Admittedly, this is but one example, and the authors did not win the challenge (I am unclear why, as I did not check the background story, writing on the plane to pre-NeurIPS 2019).

Still, I find this column quite refreshing and worth disseminating, as it challenges the current creed that intractable functions with hundreds of parameters will always do better, if only because they are calibrated within the box and eventually have difficulties fighting over-fitting within (and hence under-fitting outside). This is also a difficulty with common statistical models, but the ability to construct error evaluations that show how quickly the prediction efficiency deteriorates may prove the more structured and more sparsely parameterised models the winners (of real world competitions).

Numbers rule your world

Posted in Books, Statistics on February 22, 2010 by xi'an

Andrew Gelman gave me a copy of the recent book Numbers rule your world by Kaiser Fung, along with the comment that it was a nice book but not for us. I spent my “lazy Sunday” morning reading the book at the breakfast table and agree with Andrew's assessment (waiting for the incoming blog review!). Numbers rule your world is unlikely to bring enlightenment to professional or academic statisticians, but it provides a nice and soft introduction to the use of statistics in everyday life, to the point that I would encourage my second and third year students to read it. It covers a few topics that are central to Statistics via ten newspaper-ised stories that make for a very light read, but nonetheless make the point. The themes in Numbers rule your world are

  • variability matters more than average, as illustrated by queuing phenomena;
  • correlation is not causation, but is often good enough to uncover patterns, as illustrated by epidemiology and credit scoring;
  • Simpson’s paradox explains apparent bias in group differences, as illustrated by SAT score differences between black students and white students;
  • false positives and false negatives have different impacts on the error (here comes Bayes’ theorem!), depending on population sizes and settings, as illustrated by the (great!) case of cheating athletes and polygraph tests (with a reference to Steve Fienberg’s work);
  • extreme events may exhibit causes, or not, as illustrated by a cheating lottery case (involving Jeff Rosenthal as the expert, not the cheater!) and a series of air crashes.
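
As a minimal sketch of the fourth theme, here is how Bayes’ theorem makes the meaning of a positive test depend on the proportion of cheaters in the tested population. The function name and all the numbers (90% sensitivity, 90% specificity, the three prevalences) are made up for illustration and are not taken from the book.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Return P(cheater | positive test) by Bayes' theorem.

    prevalence  : P(cheater) in the tested population (hypothetical)
    sensitivity : P(positive | cheater)
    specificity : P(negative | not a cheater)
    """
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Same (hypothetical) test, three populations with different shares of cheaters.
for prevalence in (0.01, 0.10, 0.50):
    ppv = positive_predictive_value(prevalence, sensitivity=0.90, specificity=0.90)
    print(f"prevalence={prevalence:.2f}  P(cheater | positive)={ppv:.3f}")
```

With these illustrative values, the very same test is mostly wrong when it flags someone in a population with few cheaters and mostly right when cheaters are common, which is the sense in which the impact of false positives depends on population sizes and settings.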

The overall tone of Numbers rule your world is pleasant and engaging, at the other end of the stylistic spectrum from Taleb’s Black Swan. Fung’s point is obviously the opposite of Taleb’s: he is showing the reader how well statistical modelling can explain apparently paradoxical behaviour. Fung is also adopting a very neutral tone, again a major change from Taleb, maybe being even too positive (the only mention of the current housing crisis in the pages Numbers rule your world dedicates to credit scoring comes in the conclusion, pp. 176-7). Now, in terms of novelty, I cannot judge the amount of innovation when compared with (numerous) other popular science books on the topic. For instance, I think Jeff Rosenthal’s Struck by Lightning brings a rather deeper perspective, but maybe thus restricts the readership further…