Archive for Wilfred Keith Hastings

Hastings 50 years later

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , on January 9, 2020 by xi'an

What is the exact impact of the Metropolis-Hastings algorithm on the field of Bayesian statistics? and what are the new tools of the trade? What I personally find the most relevant and attractive element in a review on the topic is the current role of this algorithm, rather than its past (his)story, since many such reviews have already appeared and will likely continue to appear. What matters most imho is how much the Metropolis-Hastings algorithm signifies for the community at large, especially beyond academia. Is the availability or unavailability of software like BUGS or Stan a help or an hindrance? Was Hastings’ paper the start of the era of approximate inference or the end of exact inference? Are the algorithm intrinsic features like Markovianity a fundamental cause for an eventual extinction because of the ensuing time constraint and the lack of practical guarantees of convergence and the illusion of a fully automated version? Or are emerging solutions like unbiased MCMC and asynchronous algorithms a beacon of hope?

In their Biometrika paper, Dunson and Johndrow (2019) recently wrote a celebration of Hastings’ 1970 paper in Biometrika, where they cover adaptive Metropolis (Haario et al., 1999; Roberts and Rosenthal, 2005), the importance of gradient based versions toward universal algorithms (Roberts and Tweedie, 1995; Neal, 2003), discussing the advantages of HMC over Langevin versions. They also recall the significant step represented by Peter Green’s (1995) reversible jump algorithm for multimodal and multidimensional targets, as well as tempering (Miasojedow et al., 2013; Woodard et al., 2009). They further cover intractable likelihood cases within MCMC (rather than ABC), with the use of auxiliary variables (Friel and Pettitt, 2008; Møller et al., 2006) and pseudo-marginal MCMC (Andrieu and Roberts, 2009; Andrieu and Vihola, 2016). They naturally insist upon the need to handle huge datasets, high-dimension parameter spaces, and other scalability issues, with links to unadjusted Langevin schemes (Bardenet et al., 2014; Durmus and Moulines, 2017; Welling and Teh, 2011). Similarly, Dunson and Johndrow (2019) discuss recent developments towards parallel MCMC and non-reversible schemes such as PDMP as highly promising, with a concluding section on the challenges of automatising and robustifying much further the said procedures, if only to reach a wider range of applications. The paper is well-written and contains a wealth of directions and reflections, including those in my above introduction. Here are some mostly disconnected directions I would have liked to see covered or more covered

  1. convergence assessment today, e.g. the comparison of various approximation schemes
  2. Rao-Blackwellisation and other post-processing improvements
  3. other approximate inference tools than the pseudo-marginal MCMC
  4. importance of the parameterisation of the problem for convergence
  5. dimension issues and connection with quasi-Monte Carlo
  6. constrained spaces of measure zero, as for instance matrix distributions imposing zeros outside a diagonal band
  7. given the rise of the machine(-learners), are exploratory and intrinsically slow algorithms like MCMC doomed or can both fields feed one another? The section on optimisation could be expanded in that direction
  8. the wasteful nature of the random walk feature of MCMC algorithms, as opposed to non-reversible kernels like HMC and other PDMPs, missing from the gradient based methods section (and can we once again learn from physicists?)
  9. finer convergence issues and hence inference difficulties with complex MCMC algorithms like Gibbs samplers with incompatible conditionals
  10. use of the Hastings ratio in other algorithms like ABC or EP (in link with the section on generalised Bayes)
  11. adapting Metropolis-Hastings methods for emerging computing tools like GPUs and quantum computers

or possibly less covered, namely data augmentation put forward when it is a special case of auxiliary variables as in slice sampling and in earlier physics literature. For instance, both probit and logistic regressions do not truly require data augmentation and are more toy examples than really challenging applications. The approach of Carlin & Chib (1995) is another illustration, which has met with recent interest, despite requiring heavy calibration (just like RJMCMC). As well as a a somewhat awkward opposition between Gibbs and Hastings, in that I am not convinced that Gibbs does not remain ultimately necessary to handle high dimension problems, in the sense that the alternative solutions like Langevin, HMC, or PDMP, or…, are relying on Euclidean assumptions for the entire vector, while a direct product of Euclidean structures may prove more adequate.

Metropolis gets off the ground

Posted in Books, Kids, Statistics with tags , , , , , , , on April 1, 2019 by xi'an

An X validated discussion that toed-and-froed about an incomprehension of the Metropolis-Hastings algorithm. Which started with a blame of George Casella‘s and Roger Berger’s Statistical Inference (p.254), when the real issue was the inquisitor having difficulties with the notation V ~ f(v), or the notion of random variable [generation], mistaking identically distributed with identical. Even (me) crawling from one iteration to the next did not help at the beginning. Another illustration of the strong tendency on this forum to jettison fundamental prerequisites…

Wilfred Keith Hastings [1930-2016]

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags , , , , , , , , , , on December 9, 2016 by xi'an

A few days ago I found on the page Jeff Rosenthal has dedicated to Hastings that he has passed away peacefully on May 13, 2016 in Victoria, British Columbia, where he lived for 45 years as a professor at the University of Victoria. After holding positions at University of Toronto, University of Canterbury (New Zealand), and Bell Labs (New Jersey). As pointed out by Jeff, Hastings’ main paper is his 1970 Biometrika description of Markov chain Monte Carlo methods, Monte Carlo sampling methods using Markov chains and their applications. Which would take close to twenty years to become known to the statistics world at large, although you can trace a path through Peskun (his only PhD student) , Besag and others. I am sorry it took so long to come to my knowledge and also sorry it apparently went unnoticed by most of the computational statistics community.