## reading classics (#4)

Posted in Statistics, University life with tags , , , , , , , , , , on November 29, 2012 by xi'an

Another read today and not from JRSS B for once, namely,  Efron‘s (an)other look at the Jackknife, i.e. the 1979 bootstrap classic published in the Annals of Statistics. My Master students in the Reading Classics Seminar course thus listened today to Marco Brandi’s presentation, whose (Beamer) slides are here:

In my opinion this was an easier paper to discuss, more because of its visible impact than because of the paper itself, where the comparison with the jackknife procedure does not sound so relevant nowadays. again mostly algorithmic and requiring some background on how it impacted the field. Even though Marco also went through Don Rubin’s Bayesian bootstrap and Michael Jordan bag of little bootstraps, he struggled to get away from the technicality towards the intuition and the relevance of the method. The Bayesian bootstrap extension was quite interesting in that we discussed a lot the connections with Dirichlet priors and the lack of parameters that sounded quite antagonistic with the Bayesian principles. However, at the end of the day, I feel that this foundational paper was not explored in proportion to its depth and that it would be worth another visit.

## reading classics (#3)

Posted in Statistics, University life with tags , , , , , , , , , , , , on November 15, 2012 by xi'an

Following in the reading classics series, my Master students in the Reading Classics Seminar course, listened today to Kaniav Kamary analysis of Denis Lindley’s and Adrian Smith’s 1972 linear Bayes paper Bayes Estimates for the Linear Model in JRSS Series B. Here are her (Beamer) slides

At a first (mathematical) level this is an easier paper in the list, because it relies on linear algebra and normal conditioning. Of course, this is not the reason why Bayes Estimates for the Linear Model is in the list and how it impacted the field. It is indeed one of the first expositions on hierarchical Bayes programming, with some bits of empirical Bayes shortcuts when computation got a wee in the way. (Remember, this is 1972, when shrinkage estimation and its empirical Bayes motivations is in full blast…and—despite Hstings’ 1970 Biometrika paper—MCMC is yet to be imagined, except maybe by Julian Besag!) So, at secondary and tertiary levels, it is again hard to discuss, esp. with Kaniav’s low fluency in English. For instance, a major concept in the paper is exchangeability, not such a surprise given Adrian Smith’s translation of de Finetti into English. But this is a hard concept if only looking at the algebra within the paper, as a motivation for exchangeability and partial exchangeability (and hierarchical models) comes from applied fields like animal breeding (as in Sørensen and Gianola’s book). Otherwise, piling normal priors on top of normal priors is lost on the students. An objection from a 2012 reader is also that the assumption of exchangeability on the parameters of a regression model does not really make sense when the regressors are not normalised (this is linked to yesterday’s nefarious post!): I much prefer the presentation we make of the linear model in Chapter 3 of our Bayesian Core. Based on Arnold Zellner‘s g-prior. An interesting question from one student was whether or not this paper still had any relevance, other than historical. I was a bit at a loss on how to answer as, again, at a first level, the algebra was somehow natural and, at a statistical level, less informative priors could be used. However, the idea of grouping parameters together in partial exchangeability clusters remained quite appealing and bound to provide gains in precision….

## reading classics (#2)

Posted in Statistics, University life with tags , , , , , , , , , , , on November 8, 2012 by xi'an

Following last week read of Hartigan and Wong’s 1979 K-Means Clustering Algorithm, my Master students in the Reading Classics Seminar course, listened today to Agnė Ulčinaitė covering Rob Tibshirani‘s original LASSO paper Regression shrinkage and selection via the lasso in JRSS Series B. Here are her (Beamer) slides

Again not the easiest paper in the list, again mostly algorithmic and requiring some background on how it impacted the field. Even though Agnė also went through the Elements of Statistical Learning by Hastie, Friedman and Tibshirani, it was hard to get away from the paper to analyse more widely the importance of the paper, the connection with the Bayesian (linear) literature of the 70’s, its algorithmic and inferential aspects, like the computational cost, and the recent extensions like Bayesian LASSO. Or the issue of handling n<p models. Remember that one of the S in LASSO stands for shrinkage: it was quite pleasant to hear again about ridge estimators and Stein’s unbiased estimator of the risk, as those were themes of my Ph.D. thesis… (I hope the students do not get discouraged by the complexity of those papers: there were fewer questions and fewer students this time. Next week, the compass will move to the Bayesian pole with a talk on Lindley and Smith’s 1973 linear Bayes paper by one of my PhD students.)

## reading classics (#1)

Posted in Statistics, University life with tags , , , , , , , on October 26, 2012 by xi'an

This year, a lot of my Master students (plus all of my PhD students) registered for the Reading Classics Seminar course, so we should spend half of the year going through those “classics“. And have lively discussions thanks to the size of the group. The first student to present a paper, Céline Beji, chose Hartigan and Wong’s 1979 K-Means Clustering Algorithm paper in JRSS C. She did quite well, esp. when considering she had two weeks to learn $\mathrm{L\!\!^{{}_{\scriptstyle A}} \!\!\!\!\!\;\; T\!_{\displaystyle E} \! X}$ and Beamer in addition to getting thru the paper! She also managed to find an online demo of the algorithm. Here are her slides

This was not the easiest paper in the list, by far: it is short, mostly algorithmic and somehow requires some background on the reasons why clustering was of interest and on how it impacted the field. Tellingly, the discussion with the class then focussed on the criterion rather than on the algorithm itself. In a sense, this is the most striking feature of the paper, namely that it is completely a-statistical in picking a criterion to minimise. there is neither randomness nor error involved at this stage, it is simply an extended least-square approach. This is why the number of clusters—and again the discussion from the class spent some time on this—cannot be inferred via this method. A well-auguring start to the course!