## Archive for Wilfred Keith Hastings

## another first

Posted in Statistics with tags Chemical Physics Letters, history of Monte Carlo, importance sampling, John Valleau, Markov chain Monte Carlo, MCMC, Metropolis algorithm, umbrella sampling, Wilfred Keith Hastings on July 1, 2022 by xi'an

**A** question related to the earlier post on the first *importance sampling* in print, about the first *Markov chain Monte Carlo* in print. Again uncovered by Charly, a 1973 Chemical Physics paper by Patey and Valleau, the latter inventing umbrella sampling with Torrie at about the same time. (In a 1972 paper in the same journal with Card, Valleau uses *Metropolis Monte Carlo*, while Hastings, also at the University of Toronto, uses *Markov chain sampling*.)

## my own personal hope for the future is that we won’t have to build any more random number generators…

Posted in Books, Statistics, University life with tags cross validated, IBM, punched card, rand, random number generation, roulette, Wilfred Keith Hastings on April 19, 2020 by xi'an

**C**ame perchance upon this reminiscence about the generation of the 10⁶ random digits found in the book published by the RAND Corporation. It took them a month to produce half a million digits, exploiting a “random frequency pulse source gated by a constant frequency pulse” behaving like a “roulette wheel with 32 positions, making on the average 3000 revolutions on each turn”. As the outcome failed the odd/even ratio test, the RAND engineers further randomized it by adding “(mod 10) the digits in each card, digit by digits, to the corresponding digits of the previous card”. (Cards as in punched cards, the outcome being printed 50 digits at a time on I.B.M. cards.) A last piece of Monte Carlo trivia is that the electronic roulette at the basis of this random generator was devised by Hastings, Cecil not Wilfred Keith. (And RAND is an abbreviation of Research and Development, not of randomness!)
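The card-by-card re-randomisation quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rerandomize_cards` is hypothetical, the first card is left untouched since it has no predecessor, and "previous card" is read as the previous *raw* card rather than the previous transformed one, which the quote does not settle.

```python
def rerandomize_cards(cards):
    """Add (mod 10), digit by digit, each card to the previous raw card.

    `cards` is a list of equal-length lists of decimal digits.
    Assumption: the first card has no predecessor and is kept as is.
    """
    out = [list(cards[0])]
    for prev, cur in zip(cards, cards[1:]):
        # digit-wise sum modulo 10 of the current and the previous card
        out.append([(a + b) % 10 for a, b in zip(prev, cur)])
    return out


if __name__ == "__main__":
    toy_cards = [[1, 2, 3], [9, 9, 9], [0, 5, 8]]
    print(rerandomize_cards(toy_cards))  # [[1, 2, 3], [0, 1, 2], [9, 4, 7]]
```

As a rough justification, if two digits are independent and each is even with probability 1/2 + ε, their sum is even with probability 1/2 + 2ε², so a small odd/even imbalance shrinks considerably. This relies on independence between cards, which is an assumption here rather than something the reminiscence states.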

## Hastings 50 years later

Posted in Books, pictures, Statistics, University life with tags 1066, asynchronous algorithms, automation, Battle of Hastings, Bayesian statistics, BUGS, history of statistics, incompatible conditionals, Metropolis-Hastings algorithms, Normans, pseudo-marginal MCMC, STAN, Wilfred Keith Hastings on January 9, 2020 by xi'an

**W**hat is the exact impact of the Metropolis-Hastings algorithm on the field of Bayesian statistics? And what are the new tools of the trade? What I personally find the most relevant and attractive element in a review on the topic is the current role of this algorithm, rather than its past (his)story, since many such reviews have already appeared and will likely continue to appear. What matters most imho is how much the Metropolis-Hastings algorithm signifies for the community at large, especially beyond academia. Is the availability or unavailability of software like BUGS or Stan a help or a hindrance? Was Hastings’ paper the start of the era of approximate inference or the end of exact inference? Are the algorithm’s intrinsic features, like Markovianity, a fundamental cause for its eventual extinction, because of the ensuing time constraint, the lack of practical guarantees of convergence, and the illusion of a fully automated version? Or are emerging solutions like unbiased MCMC and asynchronous algorithms a beacon of hope?

In their Biometrika paper, Dunson and Johndrow (2019) recently wrote a celebration of Hastings’ 1970 paper in the same journal, where they cover adaptive Metropolis (Haario et al., 1999; Roberts and Rosenthal, 2005) and the importance of gradient-based versions toward universal algorithms (Roberts and Tweedie, 1995; Neal, 2003), discussing the advantages of HMC over Langevin versions. They also recall the significant step represented by Peter Green’s (1995) reversible jump algorithm for multimodal and multidimensional targets, as well as tempering (Miasojedow et al., 2013; Woodard et al., 2009). They further cover intractable likelihood cases within MCMC (rather than ABC), with the use of auxiliary variables (Friel and Pettitt, 2008; Møller et al., 2006) and pseudo-marginal MCMC (Andrieu and Roberts, 2009; Andrieu and Vihola, 2016). They naturally insist upon the need to handle huge datasets, high-dimension parameter spaces, and other scalability issues, with links to unadjusted Langevin schemes (Bardenet et al., 2014; Durmus and Moulines, 2017; Welling and Teh, 2011). Similarly, Dunson and Johndrow (2019) discuss recent developments towards parallel MCMC and non-reversible schemes such as PDMP as highly promising, with a concluding section on the challenges of automatising and robustifying the said procedures much further, if only to reach a wider range of applications. The paper is well-written and contains a wealth of directions and reflections, including those in my above introduction. Here are some mostly disconnected directions I would have liked to see covered, or covered in more depth:

- convergence assessment today, e.g. the comparison of various approximation schemes
- Rao-Blackwellisation and other post-processing improvements
- approximate inference tools other than pseudo-marginal MCMC
- importance of the parameterisation of the problem for convergence
- dimension issues and connection with quasi-Monte Carlo
- constrained spaces of measure zero, as for instance matrix distributions imposing zeros outside a diagonal band
- given the rise of the machine(-learners), are exploratory and intrinsically slow algorithms like MCMC doomed or can both fields feed one another? The section on optimisation could be expanded in that direction
- the wasteful nature of the random walk feature of MCMC algorithms, as opposed to non-reversible kernels like HMC and other PDMPs, missing from the gradient based methods section (and can we once again learn from physicists?)
- finer convergence issues and hence inference difficulties with complex MCMC algorithms like Gibbs samplers with incompatible conditionals
- use of the Hastings ratio in other algorithms like ABC or EP (in link with the section on generalised Bayes)
- adapting Metropolis-Hastings methods for emerging computing tools like GPUs and quantum computers

or possibly less covered, namely data augmentation, put forward when it is a special case of auxiliary variables as in slice sampling and in earlier physics literature. For instance, both probit and logistic regressions do not truly require data augmentation and are more toy examples than really challenging applications. The approach of Carlin & Chib (1995) is another illustration, which has met with recent interest, despite requiring heavy calibration (just like RJMCMC). As well as a somewhat awkward opposition between Gibbs and Hastings, in that I am not convinced that Gibbs does not remain ultimately necessary to handle high-dimensional problems, in the sense that the alternative solutions like Langevin, HMC, or PDMP, or…, are relying on Euclidean assumptions for the entire vector, while a direct product of Euclidean structures may prove more adequate.
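One possible reading of the Gibbs-versus-Hastings remark is that the two combine rather than compete. The sketch below is a generic Metropolis-within-Gibbs sampler, not anything taken from Dunson and Johndrow (2019): each coordinate is updated in turn by its own random-walk Metropolis step with its own scale. The correlated bivariate normal target, the function names, and the scales are all illustrative assumptions.

```python
import math
import random


def log_target(x, y, rho=0.8):
    # Unnormalised log-density of a standard bivariate normal with
    # correlation rho (an illustrative target, chosen for simplicity).
    return -(x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))


def metropolis_within_gibbs(log_density, init, scales, n_iter, rng):
    """Update one coordinate at a time with a random-walk Metropolis step.

    `scales[i]` is the proposal standard deviation used for coordinate i,
    so each coordinate gets its own (one-dimensional) Euclidean structure.
    """
    state = list(init)
    chain = []
    for _ in range(n_iter):
        for i, scale in enumerate(scales):
            proposal = list(state)
            proposal[i] += rng.gauss(0.0, scale)
            # Symmetric proposal: the Hastings ratio reduces to the target ratio.
            log_ratio = log_density(*proposal) - log_density(*state)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                state = proposal
        chain.append(tuple(state))
    return chain


if __name__ == "__main__":
    rng = random.Random(1)
    draws = metropolis_within_gibbs(log_target, (0.0, 0.0), (1.0, 1.0), 5000, rng)
    print(sum(x for x, _ in draws) / len(draws))
```

The per-coordinate scales are the only tuning parameters here. In a genuinely high-dimensional problem the blocks and their proposals would need more thought than this toy suggests.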

## Metropolis gets off the ground

Posted in Books, Kids, Statistics with tags Cabourg, cross validated, forum, independent Metropolis-Hastings algorithm, prerequisites, random variable, restaurant, Wilfred Keith Hastings on April 1, 2019 by xi'an

**A**n X validated discussion that to-ed and fro-ed about an incomprehension of the Metropolis-Hastings algorithm. Which started by blaming George Casella’s and Roger Berger’s Statistical Inference (p.254), when the real issue was the inquisitor having difficulties with the notation *V ~ f(v)*, or the notion of random variable [generation], mistaking identically distributed for identical. Even (me) crawling from one iteration to the next did not help at the beginning. Another illustration of the strong tendency on this forum to jettison fundamental prerequisites…
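To make the identically-distributed-versus-identical point concrete, here is a minimal sketch of an independent Metropolis-Hastings sampler. It is not the textbook's pseudo-code: the function names, the N(0,1) target and the N(0,2²) proposal are assumptions made for illustration. The key line is the fresh draw of `v` at every iteration: all the `v`'s share the same distribution, yet each is a new realisation.

```python
import math
import random


def independent_mh(log_target, sample_proposal, log_proposal, x0, n_iter, rng):
    """Independent Metropolis-Hastings: proposals do not depend on the current state."""
    x = x0
    chain = []
    for _ in range(n_iter):
        # V ~ g: a NEW draw at each iteration, identically distributed
        # with the previous proposals but not identical to them.
        v = sample_proposal(rng)
        # Log of the Hastings ratio [f(v)/g(v)] / [f(x)/g(x)].
        log_ratio = (log_target(v) - log_proposal(v)) - (log_target(x) - log_proposal(x))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = v  # accept the proposed value
        chain.append(x)  # on rejection, the current value is repeated
    return chain


# Illustrative choices (assumptions): target N(0,1), proposal N(0, 2^2),
# both given up to additive constants that cancel in the ratio.
def log_std_normal(x):
    return -0.5 * x * x


def log_wide_normal(x):
    return -0.5 * (x / 2.0) ** 2


def draw_wide_normal(rng):
    return rng.gauss(0.0, 2.0)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = independent_mh(log_std_normal, draw_wide_normal, log_wide_normal, 0.0, 5000, rng)
    print(sum(sample) / len(sample))
```

Repeated values in the output chain come from rejections, not from the proposal being "the same" variable each time.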

## Wilfred Keith Hastings [1930-2016]

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags Bell Labs, Biometrika, Canada, Julian Besag, Metropolis-Hastings algorithm, obituary, Peskun ordering, University of Canterbury, University of Victoria, Victoria, Wilfred Keith Hastings on December 9, 2016 by xi'an

**A** few days ago I found on the page Jeff Rosenthal has dedicated to Hastings that he passed away peacefully on May 13, 2016 in Victoria, British Columbia, where he lived for 45 years as a professor at the University of Victoria. After holding positions at the University of Toronto, the University of Canterbury (New Zealand), and Bell Labs (New Jersey). As pointed out by Jeff, Hastings’ main paper is his 1970 Biometrika description of Markov chain Monte Carlo methods, *Monte Carlo sampling methods using Markov chains and their applications*. Which would take close to twenty years to become known to the statistics world at large, although you can trace a path through Peskun (his only PhD student), Besag and others. I am sorry it took so long to come to my knowledge and also sorry it apparently went unnoticed by most of the computational statistics community.