I’d get rid of the “Metropolis within Gibbs” terminology. The Metropolis algorithm has (from the very first paper) been mostly used to update single variables, or small sets of variables, not all variables at once. So this cumbersome phrase describes the usual case.
Geoffrey Hinton and co-authors were using Gibbs sampling for log-linear models with latent variables (which they called “Boltzmann machines”) in the mid-1980s, in order to do maximum likelihood estimation of their parameters. See Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985) A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147–169.
My 1993 review on Probabilistic Inference Using Markov Chain Monte Carlo Methods has many references to the earlier physics literature, including many on “free energy estimation”, which is computationally equivalent to marginal likelihood estimation. This review paper was influential in getting at least some statisticians to stop re-inventing the wheel in this regard.
One paper I reference in this review that might be of interest to you is Wood, W. W. (1985) “Early history of computer simulations in statistical mechanics”, in Molecular-Dynamics Simulation of Statistical-Mechanical Systems (Proceedings of the International School of Physics “Enrico Fermi”, Course 97), Amsterdam: North-Holland.
Thank you, Radford. I will strive to make the link to the physics literature in the slides before tomorrow! As for including them in the paper, I fear we are at too late a stage to change anything… This is a very incomplete history of everything, mostly made of personal recollections of how we saw the thing unfolding. There is another paper that I very recently came across (even quoting this Short history &tc.) by Matthew Richey in the American Mathematical Monthly.