Continuing the reading classics series, my Master students in the Reading Classics Seminar course listened today to Kaniav Kamary’s analysis of Dennis Lindley’s and Adrian Smith’s 1972 linear Bayes paper, Bayes Estimates for the Linear Model, in JRSS Series B. Here are her (Beamer) slides.
At a first (mathematical) level, this is one of the easier papers in the list, because it relies only on linear algebra and normal conditioning. Of course, this is neither the reason why Bayes Estimates for the Linear Model is in the list nor how it impacted the field. It is indeed one of the first expositions on hierarchical Bayes programming, with some bits of empirical Bayes shortcuts when computation got a wee bit in the way. (Remember, this is 1972, when shrinkage estimation and its empirical Bayes motivations are in full blast… and, despite Hastings’ 1970 Biometrika paper, MCMC is yet to be imagined, except maybe by Julian Besag!) So, at the secondary and tertiary levels, the paper is again hard to discuss, especially given Kaniav’s limited fluency in English.
For instance, a major concept in the paper is exchangeability, not such a surprise given Adrian Smith’s translation of de Finetti into English. But this is a hard concept to grasp when looking only at the algebra within the paper, as the motivation for exchangeability and partial exchangeability (and hierarchical models) comes from applied fields like animal breeding (as in Sørensen and Gianola’s book). Without that motivation, the point of piling normal priors on top of normal priors is lost on the students. An objection from a 2012 reader is also that the assumption of exchangeability on the parameters of a regression model does not really make sense when the regressors are not normalised (this is linked to yesterday’s nefarious post!): I much prefer the presentation we make of the linear model in Chapter 3 of our Bayesian Core, based on Arnold Zellner’s g-prior.
An interesting question from one student was whether or not this paper still had any relevance, other than historical. I was a bit at a loss on how to answer as, again, at a first level, the algebra was somewhat natural and, at a statistical level, less informative priors could be used.
However, the idea of grouping parameters together in partial exchangeability clusters remained quite appealing and was bound to provide gains in precision…
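For readers who want to see the "linear algebra and normal conditioning" at work, here is a minimal sketch of the two-stage normal computation, in notation close to the paper's: y | θ1 ~ N(A1 θ1, C1) and θ1 | θ2 ~ N(A2 θ2, C2), so that θ1 | y is normal with covariance B = (A1ᵀ C1⁻¹ A1 + C2⁻¹)⁻¹ and mean B (A1ᵀ C1⁻¹ y + C2⁻¹ A2 θ2). The function name, the toy data, the unit variances, and the plug-in of the sample mean for θ2 (a crude empirical Bayes shortcut) are all assumptions of mine for illustration, not taken from the paper or the slides.

```python
import numpy as np


def two_stage_posterior(y, A1, C1, A2, C2, theta2):
    """Posterior mean and covariance of theta1 given y, when
    y | theta1 ~ N(A1 theta1, C1) and theta1 | theta2 ~ N(A2 theta2, C2),
    with theta2 treated as known (hypothetical helper, for illustration)."""
    C1_inv = np.linalg.inv(C1)
    C2_inv = np.linalg.inv(C2)
    # Posterior precision: data precision plus prior precision
    B = np.linalg.inv(A1.T @ C1_inv @ A1 + C2_inv)
    # Precision-weighted combination of the data and the prior mean
    b = A1.T @ C1_inv @ y + C2_inv @ (A2 @ theta2)
    return B @ b, B


# Toy exchangeable setting (made-up numbers): three group means,
# each observed once with unit noise variance.
y = np.array([1.0, 2.0, 6.0])
k = y.size
A1, C1 = np.eye(k), np.eye(k)       # y_i = theta_i + noise, sigma^2 = 1
A2, C2 = np.ones((k, 1)), np.eye(k)  # theta_i ~ N(mu, tau^2), tau^2 = 1
theta2 = np.array([y.mean()])        # plug-in value for mu (assumption)

post_mean, post_cov = two_stage_posterior(y, A1, C1, A2, C2, theta2)
print(post_mean)  # [2.  2.5 4.5]
```

With equal variances at both stages, each posterior mean sits halfway between its observation and the common mean, and each posterior variance drops from 1 to 0.5. This is the gain in precision from pooling exchangeable parameters; a partially exchangeable version would presumably use a block structure in A2 and C2, one block per cluster.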