Another history of MCMC
In the most recent issue of Statistical Science, the special topic is “Celebrating the EM Algorithm’s Quandunciacentennial“. It contains an historical survey by Martin Tanner and Wing Wong on the emergence of MCMC Bayesian computation in the 1980’s, This survey is more focused and more informative than our global history (also to appear in Statistical Science). In particular, it provides the authors’ analysis as to why MCMC was delayed by ten years or so (or even more when considering that a Gibbs sampler as a simulation tool appears in both Hastings’ (1970) and Besag‘s (1974) papers). They dismiss [our] concerns about computing power (I was running Monte Carlo simulations on my Apple IIe by 1986 and a single mean square error curve evaluation for a James-Stein type estimator would then take close to a weekend!) and Markov innumeracy, rather attributing the reluctance to a lack of confidence into the method. This perspective remains debatable as, apart from Tony O’Hagan who was then fighting again Monte Carlo methods as being un-Bayesian (1987, JRSS D), I do not remember any negative attitude at the time about simulation and the immediate spread of the MCMC methods from Alan Gelfand’s and Adrian Smith’s presentations of their 1990 paper shows on the opposite that the Bayesian community was ready for the move.
Another interesting point made in this historical survey is that Metropolis’ and other Markov chain methods were first presented outside simulation sections of books like Hammersley and Handscomb (1964), Rubinstein (1981) and Ripley (1987), perpetuating the impression that such methods were mostly optimisation or niche specific methods. This is also why Besag’s earlier works (not mentioned in this survey) did not get wider recognition, until later. Something I was not aware is the appearance of iterative adaptive importance sampling (i.e. population Monte Carlo) in the Bayesian literature of the 1980’s, with proposals from Herman van Dijk, Adrian Smith, and others. The appendix about Smith et al. (1985), the 1987 special issue of JRSS D, and the computation contents of Valencia 3 (that I sadly missed for being in the Army!) is also quite informative about the perception of computational Bayesian statistics at this time.
A missing connection in this survey is Gilles Celeux and Jean Diebolt’s stochastic EM (or SEM). As early as 1981, with Michel Broniatowski, they proposed a simulated version of EM for mixtures where the latent variable z was simulated from its conditional distribution rather than replaced with its expectation. So this was the first half of the Gibbs sampler for mixtures we completed with Jean Diebolt about ten years later. (Also found in Gelman and King, 1990.) These authors did not get much recognition from the community, though, as they focused almost exclusively on mixtures, used simulation to produce a randomness that would escape the local mode attraction, rather than targeting the posterior distribution, and did not analyse the Markovian nature of their algorithm until later with the simulated annealing EM algorithm.