reading classics (#2)
Following last week's read of Hartigan and Wong's 1979 K-Means Clustering Algorithm, my Master's students in the Reading Classics Seminar course listened today to Agnė Ulčinaitė covering Rob Tibshirani's original LASSO paper, Regression shrinkage and selection via the lasso, published in JRSS Series B. Here are her (Beamer) slides.
Again, not the easiest paper in the list, being mostly algorithmic and requiring some background on how it impacted the field. Even though Agnė also went through The Elements of Statistical Learning by Hastie, Tibshirani and Friedman, it was hard to step back from the paper itself to analyse more widely its importance, its connection with the Bayesian (linear) literature of the 1970s, its algorithmic and inferential aspects (like the computational cost), recent extensions such as the Bayesian LASSO, or the issue of handling n<p models. Remember that one of the S's in LASSO stands for shrinkage: it was quite pleasant to hear again about ridge estimators and Stein's unbiased estimator of the risk, as those were themes of my Ph.D. thesis… (I hope the students do not get discouraged by the complexity of those papers: there were fewer questions and fewer students this time. Next week, the compass will move to the Bayesian pole with a talk on Lindley and Smith's 1973 linear Bayes paper by one of my PhD students.)
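As a quick reminder of the formulation discussed in the paper, the lasso estimates the regression coefficients under an ℓ1 constraint,
\[
(\hat\alpha,\hat\beta) = \arg\min_{\alpha,\beta}\ \sum_{i=1}^n \Big(y_i - \alpha - \sum_{j=1}^p \beta_j x_{ij}\Big)^2 \quad\text{subject to}\quad \sum_{j=1}^p |\beta_j| \le t,
\]
whereas ridge regression replaces the constraint with \(\sum_j \beta_j^2 \le t\); the bound t is the tuning constant that controls the amount of shrinkage (and of selection, since a small enough t forces some coefficients exactly to zero).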
This entry was posted on November 8, 2012 at 12:12 am and is filed under Statistics, University life with tags Bayesian lasso, Beamer, classics, Lasso, Master program, presentation, regression, ridge regression, Rob Tibshirani, shrinkage estimation, slides, Université Paris Dauphine.
One Response to “reading classics (#2)”
November 8, 2012 at 9:26 am
One of the interesting but, at the same time, *very disturbing* properties of the Lasso is that it sets some coefficients to 0, contrary to Ridge (or BLUP, or Stein, or whatever one wants to call it).
This gives a false sensation of "knowledge" ("this thing has no effect") whereas in fact the Lasso is saying "hey, this set of predictors is good enough given the restriction t that *you* imposed on them".
But if effects are collinear (as often happens in genetics and possibly other fields), then change the data a little and the Lasso will pick another set of predictors.
I wonder if I am the only one who sees this interpretation issue as a problem :-/
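A minimal sketch of the instability described above (my own illustration, not from the post or the paper), using scikit-learn's Lasso and Ridge on two nearly collinear predictors: refitting on slightly perturbed data can flip which of the two coefficients the lasso zeroes out, while ridge keeps both small but nonzero.

```python
# Hypothetical illustration (not from the post): lasso selection can be
# unstable when predictors are nearly collinear, whereas ridge spreads
# the weight over the correlated predictors.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)        # x2 nearly identical to x1
X = np.column_stack([x1, x2])

for rep in range(5):
    # redraw the noise: a "slightly different" dataset with the same signal
    y = x1 + x2 + rng.normal(scale=0.5, size=n)
    lasso = Lasso(alpha=0.1).fit(X, y)
    ridge = Ridge(alpha=0.1).fit(X, y)
    print("lasso:", np.round(lasso.coef_, 2), "  ridge:", np.round(ridge.coef_, 2))

# Typically the lasso puts almost all its weight on one of the two predictors
# and zeroes (or nearly zeroes) the other, and which one survives can change
# from draw to draw; ridge gives two coefficients close to 1 each time.
```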