no dichotomy between efficiency and interpretability

Posted in Books, Statistics, Travel, University life on December 18, 2019 by xi'an

“…there are actually a lot of applications where people do not try to construct an interpretable model, because they might believe that for a complex data set, an interpretable model could not possibly be as accurate as a black box. Or perhaps they want to preserve the model as proprietary.”

One article I found quite interesting in the second issue of HDSR is “Why are we using black box models in AI when we don’t need to? A lesson from an explainable AI competition” by Cynthia Rudin and Joanna Radin, which describes the setting of a NeurIPS competition last year, the Explainable Machine Learning Challenge, of which I was blissfully unaware. The goal was to construct an operational black box predictor for credit scoring and turn it into something interpretable. The authors explain how they instead built a white box predictor (my terms!), namely a linear model, which could not be improved more than marginally by a black box algorithm. (It appears from the references that these authors have a record of analysing black-box models in various settings and demonstrating that they do not always bring more efficiency than interpretable versions.)

While this is but one example, and even though the authors did not win the challenge (I am unclear why, as I did not check the background story, writing on the plane to pre-NeurIPS 2019), I find this column quite refreshing and worth disseminating. It challenges the current creed that intractable functions with hundreds of parameters will always do better, if only because such functions are calibrated within the box and eventually struggle to fight over-fitting within it (and hence under-fitting outside). This is also a difficulty with common statistical models, but the ability to construct error evaluations that show how quickly the prediction efficiency deteriorates may prove the more structured and more sparsely parameterised models the winners (of real-world competitions).
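As a toy illustration of the "within the box" versus "outside the box" point, here is a minimal Python sketch. It is in no way the model or the data of Rudin and Radin. Everything in it is an assumption made for the sake of the example: the data are simulated from a noisy straight line (which favours the linear model by construction), a one-nearest-neighbour rule stands in for a flexible black box, and the function names (`fit_line`, `predict_1nn`, `compare`) are hypothetical. Both predictors are calibrated on inputs in [0, 1] and then evaluated on inputs in [2, 3].

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict_line(params, x):
    slope, intercept = params
    return slope * x + intercept


def predict_1nn(xs, ys, x):
    """Return the response of the closest training input (1-nearest neighbour)."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return ys[i]


def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def simulate(lo, hi, n, rng):
    """Assumed truth: y = 2x + 1 plus Gaussian noise with sd 0.1."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def compare(seed=0):
    rng = random.Random(seed)
    x_in, y_in = simulate(0.0, 1.0, 200, rng)    # calibration range
    x_out, y_out = simulate(2.0, 3.0, 200, rng)  # outside the box
    line = fit_line(x_in, y_in)
    return {
        "line_in": mse([predict_line(line, x) for x in x_in], y_in),
        "nn_in": mse([predict_1nn(x_in, y_in, x) for x in x_in], y_in),
        "line_out": mse([predict_line(line, x) for x in x_out], y_out),
        "nn_out": mse([predict_1nn(x_in, y_in, x) for x in x_out], y_out),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.4f}")
```

In this deliberately favourable setting, the nearest-neighbour rule reproduces the calibration data exactly, while its error outside the calibration range is far larger than that of the two-parameter line. With a less convenient truth the comparison could of course go the other way, which is exactly why error evaluations outside the box matter.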