## Archive for Bayesian statistics

## ISBA2020 program

Posted in Kids, Statistics, Travel, University life with tags approximate Bayesian inference, Bayesian computing, Bayesian statistics, China, conference, coronavirus epidemics, high dimensions, ISBA 2020, Kunming, nCo-2019, program, variational Bayes methods, Yunnan on January 29, 2020 by xi'an

**T**he scheduled program for ISBA 2020 is now on-line. And full of exciting sessions, many with computational focus. With dear hopes that the nCo-2019 epidemic will have abated by then (and not solely for the sake of the conference, most obviously!). While early registration ends by 15 April, the deadline for junior travel support is up this month. And so is the deadline for contributions.

## Hastings 50 years later

Posted in Books, pictures, Statistics, University life with tags 1066, asynchronous algorithms, automation, Battle of Hastings, Bayesian statistics, BUGS, history of statistics, incompatible conditionals, Metropolis-Hastings algorithms, Normans, pseudo-marginal MCMC, STAN, Wilfred Keith Hastings on January 9, 2020 by xi'an

**W**hat is the exact impact of the Metropolis-Hastings algorithm on the field of Bayesian statistics? And what are the new tools of the trade? What I personally find the most relevant and attractive element in a review on the topic is the current role of this algorithm, rather than its past (his)story, since many such reviews have already appeared and will likely continue to appear. What matters most imho is how much the Metropolis-Hastings algorithm signifies for the community at large, especially beyond academia. Is the availability or unavailability of software like BUGS or Stan a help or a hindrance? Was Hastings’ paper the start of the era of approximate inference or the end of exact inference? Are the algorithm’s intrinsic features like Markovianity a fundamental cause for an eventual extinction, because of the ensuing time constraint, the lack of practical guarantees of convergence, and the illusion of a fully automated version? Or are emerging solutions like unbiased MCMC and asynchronous algorithms a beacon of hope?

In their recent Biometrika paper, Dunson and Johndrow (2019) celebrate Hastings’ 1970 paper in the same journal. They cover adaptive Metropolis (Haario et al., 1999; Roberts and Rosenthal, 2005) and the importance of gradient based versions toward universal algorithms (Roberts and Tweedie, 1995; Neal, 2003), discussing the advantages of HMC over Langevin versions. They also recall the significant step represented by Peter Green’s (1995) reversible jump algorithm for multimodal and multidimensional targets, as well as tempering (Miasojedow et al., 2013; Woodard et al., 2009). They further cover intractable likelihood cases within MCMC (rather than ABC), with the use of auxiliary variables (Friel and Pettitt, 2008; Møller et al., 2006) and pseudo-marginal MCMC (Andrieu and Roberts, 2009; Andrieu and Vihola, 2016). They naturally insist upon the need to handle huge datasets, high-dimension parameter spaces, and other scalability issues, with links to unadjusted Langevin schemes (Bardenet et al., 2014; Durmus and Moulines, 2017; Welling and Teh, 2011). Similarly, they discuss recent developments towards parallel MCMC and non-reversible schemes such as PDMP as highly promising, with a concluding section on the challenges of making these procedures much more automated and robust, if only to reach a wider range of applications. The paper is well-written and contains a wealth of directions and reflections, including those in my above introduction. Here are some mostly disconnected directions I would have liked to see covered, or covered in more depth:

- convergence assessment today, e.g. the comparison of various approximation schemes
- Rao-Blackwellisation and other post-processing improvements
- other approximate inference tools than the pseudo-marginal MCMC
- importance of the parameterisation of the problem for convergence
- dimension issues and connection with quasi-Monte Carlo
- constrained spaces of measure zero, as for instance matrix distributions imposing zeros outside a diagonal band
- given the rise of the machine(-learners), are exploratory and intrinsically slow algorithms like MCMC doomed or can both fields feed one another? The section on optimisation could be expanded in that direction
- the wasteful nature of the random walk feature of MCMC algorithms, as opposed to non-reversible kernels like HMC and other PDMPs, missing from the gradient based methods section (and can we once again learn from physicists?)
- finer convergence issues and hence inference difficulties with complex MCMC algorithms like Gibbs samplers with incompatible conditionals
- use of the Hastings ratio in other algorithms like ABC or EP (in link with the section on generalised Bayes)
- adapting Metropolis-Hastings methods for emerging computing tools like GPUs and quantum computers

or possibly covered less, namely data augmentation, which is put forward when it is a special case of auxiliary variables, as in slice sampling and in the earlier physics literature. For instance, both probit and logistic regressions do not truly require data augmentation and are more toy examples than really challenging applications. The approach of Carlin & Chib (1995) is another illustration, which has met with recent interest, despite requiring heavy calibration (just like RJMCMC). As well as a somewhat awkward opposition between Gibbs and Hastings, in that I am not convinced that Gibbs does not remain ultimately necessary to handle high dimension problems, in the sense that alternative solutions like Langevin, HMC, or PDMP, or…, rely on Euclidean assumptions for the entire vector, while a direct product of Euclidean structures may prove more adequate.
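As a reminder of what the Hastings ratio mentioned in the list above does in practice, here is a minimal sketch of a Metropolis-Hastings sampler with an asymmetric proposal. Everything in it is an assumption of mine for illustration and none of it comes from the Dunson and Johndrow paper: the Gamma(3,1) target, the multiplicative (log-normal) random walk proposal, the scale, and the function names `log_target` and `metropolis_hastings`.

```python
import math
import random


def log_target(x, shape=3.0):
    """Unnormalised log-density of a Gamma(shape, 1) target on x > 0 (toy choice)."""
    return (shape - 1.0) * math.log(x) - x


def metropolis_hastings(n_iter, scale=1.0, x0=1.0, seed=0):
    """Multiplicative random walk: x' = x * exp(scale * z), with z ~ N(0, 1)."""
    rng = random.Random(seed)
    x, chain, accepted = x0, [], 0
    for _ in range(n_iter):
        prop = x * math.exp(scale * rng.gauss(0.0, 1.0))
        # Hastings ratio on the log scale: the target ratio times the
        # proposal correction q(x | x') / q(x' | x), which equals x' / x
        # for this log-normal proposal.
        log_ratio = (log_target(prop) - log_target(x)
                     + math.log(prop) - math.log(x))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, accepted = prop, accepted + 1
        chain.append(x)  # a rejected move repeats the current value
    return chain, accepted / n_iter


chain, acceptance_rate = metropolis_hastings(20000)
chain_mean = sum(chain) / len(chain)
print(f"mean ~ {chain_mean:.2f} (Gamma(3,1) mean is 3), "
      f"acceptance rate ~ {acceptance_rate:.2f}")
```

Dropping the `math.log(prop) - math.log(x)` correction would turn this into a plain Metropolis step for a proposal that is not symmetric, and hence target the wrong distribution, which is the whole point of Hastings’ generalisation.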

## a hatchet job [book review]

Posted in Books, Statistics, University life with tags Bayes theorem, Bayesian statistics, betting, book review, Bruce Hill, Bruno de Finetti, JASA, John Hartigan, Likelihood Principle on July 20, 2019 by xi'an

**B**y happenstance, I came across a rather savage review of John Hartigan’s Bayes Theory (1984) written by Bruce Hill in JASA, including the following slivers:

“By and large this book is at its best in developing the mathematical consequences of the theory and at its worst when dealing with the underlying ideas and concepts, which seems unfortunate since Bayesian statistics is above all an attempt to deal realistically with the nature of uncertainty and decision making.” B. Hill, JASA, 1986, p.569

“Unfortunately, those who had hoped for a serious contribution to the question will be disappointed.” B. Hill, JASA, 1986, p.569

“If the primary concern is mathematical convenience, not content or meaning, then the enterprise is a very different matter from what most of us think of as Bayesian approach.” B. Hill, JASA, 1986, p.570

“Perhaps in a century or two statisticians and probabilists will reach a similar state of maturity.” B. Hill, JASA, 1986, p.570

“Perhaps this is a good place to mention that the notation in the book is formidable. Bayes’s theorem appears in a form that is almost unrecognizable. As elsewhere, the mathematical treatment is elegant, but none of the deeper issues about the meaning and interpretation of conditional probability is discussed.” B. Hill, JASA, 1986, p.570

“The reader will find many intriguing ideas, much that is outrageous, and even some surprises (the likelihood principle is not mentioned, and conditional inference is just barely mentioned).” B. Hill, JASA, 1986, p.571

“What is disappointing to me is that with a little more discipline and effort with regard to the ideas underlying Bayesian statistics, this book could have been a major contribution to the theory.” B. Hill, JASA, 1986, p.571

Another review by William Sudderth (1985, Bulletin of the American Mathematical Society) is much kinder to the book, except for the complaint that “the pace is brisk and sometimes hard to follow”.

## I’m getting the point

Posted in Statistics with tags Bayesian statistics, Bayesian textbook, conjugate priors, cross validated, final exam, StackExchange, teaching on February 14, 2019 by xi'an

**A** long-winded X validated discussion on the [textbook] mean-variance conjugate posterior for the Normal model left me [mildly] depressed at the point and use of answering questions on this forum. Especially as it came at the same time as a catastrophic outcome for my mathematical statistics exam. Possibly an incentive to quit X validated as one quits smoking, although this is not the first attempt…

## Binomial vs Bernoulli

Posted in Books, Statistics with tags Bayesian model choice, Bayesian statistics, conditioning, cross validated, sufficiency on December 25, 2018 by xi'an

**A**n interesting confusion on X validated where someone was convinced that using the Bernoulli representation of a sequence of Bernoulli experiments led to different posterior probabilities of two possible models than when using their Binomial representation. The confusion actually stemmed from conditioning on different quantities, namely N¹=4,N²=1 in the first case (for a model M¹ with two probabilities p¹ and p²) and N¹+N²=5 in the second case (for a model M² with a single probability p⁰). While (N¹,N²) is sufficient for the first model and N¹+N² is sufficient for the second model, P(M¹|N¹,N²) is not commensurable with P(M²|N¹+N²)! Another illustration of the fickleness of the notion of sufficiency when comparing models.
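To make the point concrete, here is a small numerical sketch under assumptions that are mine and not part of the original question: five trials in each of the two groups (`n1 = n2 = 5`), uniform priors on all probabilities, and equal prior weights on both models. It compares the posterior probability of M¹ under the Bernoulli representation, under the Binomial representation with both models conditioned on (N¹,N²), and under the mismatched version where M² only sees N¹+N².

```python
from math import comb, exp, lgamma


def beta_fn(a, b):
    """Beta function B(a, b), computed through log-gamma."""
    return exp(lgamma(a) + lgamma(b) - lgamma(a + b))


def post_m1(m1, m2):
    """Posterior probability of M1 under equal prior model weights."""
    return m1 / (m1 + m2)


n1, n2 = 5, 5   # hypothetical numbers of trials per group (not in the post)
N1, N2 = 4, 1   # observed successes, as in the post

# Bernoulli representation: marginal likelihoods of the ordered sequences
m1_bern = beta_fn(N1 + 1, n1 - N1 + 1) * beta_fn(N2 + 1, n2 - N2 + 1)
m2_bern = beta_fn(N1 + N2 + 1, n1 + n2 - N1 - N2 + 1)

# Binomial representation, both models conditioned on (N1, N2):
# the same combinatorial constant multiplies both marginals and cancels
const = comb(n1, N1) * comb(n2, N2)
m1_binom, m2_binom = const * m1_bern, const * m2_bern

# Mismatched comparison: M2 is only given the sum N1 + N2
m2_sum = comb(n1 + n2, N1 + N2) * m2_bern

p_bern = post_m1(m1_bern, m2_bern)
p_binom = post_m1(m1_binom, m2_binom)
p_mismatch = post_m1(m1_binom, m2_sum)
print(p_bern, p_binom, p_mismatch)
```

With these assumed values, the first two probabilities coincide (2772/3672, about 0.75), while the mismatched version gives 11/47, about 0.23, simply because the two marginals are no longer densities of the same observation.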