Bayesian inference: challenges, perspectives, and prospects

Posted in Books, Statistics, University life on March 29, 2023 by xi'an

Over the past year, Judith, Michael and I edited a special issue of Philosophical Transactions of the Royal Society on Bayesian inference: challenges, perspectives, and prospects, in celebration of the current President of the Royal Society, Adrian Smith, and of his contributions to Bayesian analysis, which have impacted the field up to this day. The issue is now out! The following is the beginning of our introduction to the series.

When contemplating his past achievements, it is striking to see how the emergence of massive advances in these fields aligns with some of his papers or books. For instance, Lindley & Smith’s ‘Bayes Estimates for the Linear Model’ (1971), a Read Paper at the Royal Statistical Society, makes the case for the Bayesian analysis of this most standard statistical model, while emphasizing the notion of exchangeability that is foundational in Bayesian statistics and paving the way for the emergence of hierarchical Bayesian modelling. It thus makes a link between the early days of Bruno de Finetti, whose work Adrian Smith translated into English, and current research in non-parametric and robust statistics. Bernardo & Smith’s masterpiece, Bayesian Theory (1994), sets statistical inference within decision- and information-theoretic frameworks in a most elegant and universal manner, and could be deemed a Bourbaki volume for Bayesian statistics had this classification endeavour reached further than pure mathematics. It also emphasizes the central role of hierarchical modelling in the construction of priors, as exemplified in Carlin et al.’s ‘Hierarchical Bayesian analysis of change point problems’ (1992).

The series of papers published in 1990 by Alan Gelfand & Adrian Smith, esp. ‘Sampling-Based Approaches to Calculating Marginal Densities’ (1990), is overwhelmingly perceived as the birth date of modern Markov chain Monte Carlo (MCMC) methods, as it brought to the whole statistics community (and quickly to wider communities) the realization that MCMC simulation was the sesame to unlock complex modelling issues. The consequences for the adoption of Bayesian modelling by non-specialists are enormous and long-lasting. Similarly, Gordon et al.’s ‘Novel approach to nonlinear/non-Gaussian Bayesian state estimation’ (1992) is considered the birthplace of sequential Monte Carlo, aka particle filtering, with considerable consequences in tracking, robotics, econometrics and many other fields. Titterington, Smith & Makov’s reference book, ‘Statistical Analysis of Finite Mixtures’ (1984), is a precursor in the formalization of heterogeneous data structures, paving the way for the upcoming MCMC resolutions like Tanner & Wong (1987), Gelman & King (1990) and Diebolt & Robert (1990). Denison et al.’s ‘Bayesian Methods for Nonlinear Classification and Regression’ (2002) is another testimony to the influence of Adrian Smith on the field, stressing the emergence of robust and general classification and nonlinear regression methods to analyse complex data, prefiguring in a way the later emergence of machine-learning methods, with the additional Bayesian assessment of uncertainty. It also demonstrates the feasibility of Bayesian non-parametric modelling, now broadly accepted, following a series of papers by Denison et al. in the late 1990s on Bayesian versions of CART and MARS.

We are quite grateful to the authors contributing to this volume, namely Joshua J. Bon, Adam Bretherton, Katie Buchhorn, Susanna Cramb, Christopher Drovandi, Conor Hassan, Adrianne L. Jenner, Helen J. Mayfield, James M. McGree, Kerrie Mengersen, Aiden Price, Robert Salomone, Edgar Santos-Fernandez, Julie Vercelloni and Xiaoyu Wang; Afonso S. Bandeira, Antoine Maillard, Richard Nickl and Sven Wang; Fan Li, Peng Ding and Fabrizia Mealli; Matthew Stephens; Peter D. Grünwald; Sumio Watanabe; P. Müller, N. K. Chandra and A. Sarkar; Kori Khan and Alicia Carriquiry; Arnaud Doucet, Eric Moulines and Achille Thin; Beatrice Franzolini, Andrea Cremaschi, Willem van den Boom and Maria De Iorio; Sandra Fortini and Sonia Petrone; Sylvia Frühwirth-Schnatter; S. Wade; Chris C. Holmes and Stephen G. Walker; Lizhen Nie and Veronika Ročková. Some, if not all, of the papers are open access, so enjoy them!

multilevel linear models, Gibbs samplers, and multigrid decompositions

Posted in Books, Statistics, University life on October 22, 2021 by xi'an

A paper by Giacomo Zanella (formerly Warwick) and Gareth Roberts (Warwick) is about to appear in Bayesian Analysis and is (still) open for discussion. It examines in great detail the convergence properties of several Gibbs versions of the same hierarchical posterior for an ANOVA-type linear model. Although this may sound like an old-timer opinion, I find it good to have Gibbs sampling back on track! And to see further attention paid to diagnosing convergence! Also, even after all these years (!), it is always a surprise for me to (re-)realise that different versions of Gibbs sampling may differ hugely in their convergence properties.

At first, intuitively, I thought the options (1,0) (c) and (0,1) (d) should perform similarly. But one is “more” hierarchical than the other. While the results exhibiting a theoretical ordering of these choices are impressive, I would suggest pursuing a random exploration of the various parameterisations in order to handle cases where an analytical ordering proves impossible. It would most likely produce a superior performance, as hinted at by Figure 4. (This alternative happens to be briefly mentioned in the Conclusion section.) The notion of choosing the optimal parameterisation at each step is indeed somewhat unrealistic, in that the optimality zones exhibited in Figure 4 are unknown for models more general than the Gaussian ANOVA model, especially with a high number of parameters, parameterisations, and recombinations in the model (Section 7).
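To make the random-exploration suggestion concrete, here is a minimal sketch on a much simpler model than the one in the paper, assuming a balanced one-way Gaussian random-effects model $y_{ij}\sim\mathcal N(\theta_i,\sigma_e^2)$, $\theta_i\sim\mathcal N(\mu,\sigma_a^2)$, $j=1,\ldots,J$, with known variances and a flat prior on $\mu$. At each sweep, the group effects are drawn given $\mu$, and $\mu$ is then updated either under the centred parameterisation (given the $\theta_i$'s) or under the non-centred one (given $\eta_i=\theta_i-\mu$), picked at random. Each kernel leaves the posterior invariant, hence so does their random mixture. The function name `gibbs_random_param` and the mixing probability `p_centred` are hypothetical choices, not the authors' algorithm.

```python
import random
import statistics


def gibbs_random_param(ybar, J, sig_e2, sig_a2, n_iter=5000, p_centred=0.5, seed=0):
    """Gibbs sampler for y_ij ~ N(theta_i, sig_e2), theta_i ~ N(mu, sig_a2), flat prior on mu.

    ybar holds the I group means (balanced design, J observations per group).
    At each sweep the mu-update uses the centred or the non-centred
    parameterisation, chosen at random. Returns the sequence of mu draws.
    """
    rng = random.Random(seed)
    I = len(ybar)
    prec = J / sig_e2 + 1.0 / sig_a2  # conditional precision of each theta_i given mu
    mu = statistics.fmean(ybar)       # start at the average of the group means
    draws = []
    for _ in range(n_iter):
        # theta_i | mu, y: the same conditional in both parameterisations
        theta = [rng.gauss((J * yb / sig_e2 + mu / sig_a2) / prec, prec ** -0.5)
                 for yb in ybar]
        if rng.random() < p_centred:
            # centred: mu | theta ~ N(mean(theta), sig_a2 / I)
            mu = rng.gauss(statistics.fmean(theta), (sig_a2 / I) ** 0.5)
        else:
            # non-centred: hold eta_i = theta_i - mu fixed, then
            # mu | eta, y ~ N(mean(ybar_i - eta_i), sig_e2 / (I * J))
            eta = [t - mu for t in theta]
            mu = rng.gauss(statistics.fmean([yb - e for yb, e in zip(ybar, eta)]),
                           (sig_e2 / (I * J)) ** 0.5)
        draws.append(mu)
    return draws
```

In this toy model, $\mu\mid y\sim\mathcal N(\bar y,(\sigma_a^2+\sigma_e^2/J)/I)$, with $\bar y$ the average of the group means, so the output can be checked against the exact posterior. The centred update mixes faster when the data are informative about each $\theta_i$ relative to $\sigma_a^2$, the non-centred one in the opposite regime, and the random choice hedges between the two without requiring the optimality zones to be known.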

An idle question is about the extension to more general hierarchical models, where recentring is not feasible because of the non-linear nature of the parameters. Gaussianity itself may not be such a restriction, in that other exponential (if artificial) families keeping the ANOVA structure should work as well.

Theorem 1 is quite impressive and wide-ranging. It also reminded (old) me of the interleaving properties and data augmentation versions of the early-day Gibbs samplers. More to the point, and more in tune with the current era, it offers more possibilities for coupling, parallelism, and faster convergence. And for fighting dimension curses.

“in this context, imposing identifiability always improves the convergence properties of the Gibbs Sampler”

Another idle thought of mine is whether or not the number of reparameterisations is limited. I think that, by creating unidentifiable decompositions of (some) parameters, e.g., μ=μ¹+μ²+⋯, one can multiply the number of parameterisations without limit. My intuition was that this de-identification, rather than imposing hard identifiability constraints as in Section 4.2, would improve mixing, but this somewhat clashes with the above (rigorous) statement from the authors. So I am proven wrong there!

Unless I missed something, I also wonder about the possible implementations of HMC under different parameterisations, and whether or not the impact of parameterisation has been studied for HMC. (Which may be linked with Remark 2?)

empirically Bayesian [wISBApedia]

Posted in Statistics on August 9, 2021 by xi'an

Last week, I was pointed to a puzzling entry in the “empirical Bayes” Wikipedia page. The introduction section indeed contains a description of an iterative simulation method that involves a hyperprior $p(\eta)$, even though the empirical Bayes perspective does not involve a hyperprior.

While the entry is vague and lacks formulae

These suggest an iterative scheme, qualitatively similar in structure to a Gibbs sampler, to evolve successively improved approximations to $p(\theta\mid y)$ and $p(\eta\mid y)$. First, calculate an initial approximation to $p(\theta\mid y)$ ignoring the $\eta$ dependence completely; then calculate an approximation to $p(\eta\mid y)$ based upon the initial approximate distribution of $p(\theta\mid y)$; then use this $p(\eta\mid y)$ to update the approximation for $p(\theta\mid y)$; then update $p(\eta\mid y)$; and so on.

it sounds essentially equivalent to a Gibbs sampler, possibly a multiple try Gibbs sampler (unless the author had another notion in mind, alas impossible to guess since no reference is included).
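By contrast with the quoted scheme, a parametric empirical Bayes analysis needs no hyperprior $p(\eta)$: the hyperparameter is estimated from the marginal likelihood and plugged in, with no back-and-forth between $p(\theta\mid y)$ and $p(\eta\mid y)$. A minimal sketch, assuming the textbook normal-normal model $y_i\sim\mathcal N(\theta_i,1)$, $\theta_i\sim\mathcal N(0,\eta)$ (a hypothetical example, not taken from the Wikipedia entry), where the marginal $y_i\sim\mathcal N(0,1+\eta)$ yields a closed-form maximum likelihood estimate of $\eta$; the function name is illustrative.

```python
def empirical_bayes_normal(y):
    """Parametric empirical Bayes for y_i ~ N(theta_i, 1), theta_i ~ N(0, eta).

    No hyperprior on eta: it is estimated by maximising the marginal
    likelihood of y_i ~ N(0, 1 + eta), truncated at zero, then plugged in.
    Returns (eta_hat, posterior means of theta_i, common posterior variance).
    """
    n = len(y)
    # marginal MLE: 1 + eta_hat = mean of y_i^2, truncated so that eta_hat >= 0
    eta_hat = max(0.0, sum(v * v for v in y) / n - 1.0)
    # theta_i | y_i, eta_hat ~ N(shrink * y_i, shrink), with shrink = eta_hat / (1 + eta_hat)
    shrink = eta_hat / (1.0 + eta_hat)
    post_means = [shrink * v for v in y]
    return eta_hat, post_means, shrink
```

Nothing is iterated here: the data are used once to estimate $\eta$ and once more in the plug-in posterior for the $\theta_i$'s.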

Beyond this specific case, where I think the entire paragraph should be erased from the “empirical Bayes” Wikipedia page, I discussed the general problem of some poor Bayesian entries in Wikipedia with Robin Ryder, who came up with the neat idea of running (collective) Wikipedia editing labs at ISBA conferences. If we could further give an ISBA label to these entries, as a certificate of “Bayesian orthodoxy” (!), it would be terrific!

latent variables for a hierarchical Poisson model

Posted in Books, Kids, pictures, Statistics, University life on March 11, 2021 by xi'an

Answering a question on X validated about a rather standard hierarchical Poisson model and its posterior Gibbs simulation, where the observations are (with d and w denoting a document index and a word index, respectively)

$N_{w,d}\sim\mathcal P(\textstyle\sum_{1\le k\le K} \pi_{k,d}\varphi_{k,w})\qquad(1)$

I found myself dragged into an extended discussion on the validity of creating independent Poisson latent variables

$N_{k,w,d}\sim\mathcal P(\pi_{k,d}\varphi_{k,w})\qquad(2)$

since observing their sum in (1) was preventing the latent variables (2) from being independent. And then I found out that the originator of the question had asked a much more detailed (and unanswered) question on X validated in 2016, even though the notations differ. That question does contain the solution I proposed above, including the Multinomial distribution of the Poisson latent variables given their sum (and the true parameters). As it should, since the derivation was done in a linked 2014 paper by Gopalan, Hofman, and Blei, later published in the Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI). I am thus bemused at the question resurfacing five years later in a much simplified version, but still exhibiting the same difficulty with the conditioning principles…
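The conditioning argument can be made explicit with a minimal data-augmentation sketch: given the observed total $N_{w,d}$ and the current parameters, the latent counts $(N_{k,w,d})_k$ are Multinomial with probabilities proportional to $\pi_{k,d}\varphi_{k,w}$, and given the latent counts, the parameters enjoy conjugate Gamma updates. The Gamma$(a,b)$ and Gamma$(c,e)$ priors (shape, rate) on $\pi_{k,d}$ and $\varphi_{k,w}$ are assumptions made for illustration, as are the function names and the nested-list layout `N[w][d]`, `pi[k][d]`, `phi[k][w]`.

```python
import random


def split_counts(n, rates, rng):
    """Draw (N_1, ..., N_K) given N_1 + ... + N_K = n, for independent
    N_k ~ Poisson(rates[k]): the conditional is Multinomial(n, rates / sum(rates))."""
    counts = [0] * len(rates)
    # each of the n units is allocated to component k with probability proportional to rates[k]
    for k in rng.choices(range(len(rates)), weights=rates, k=n):
        counts[k] += 1
    return counts


def gibbs_sweep(N, pi, phi, a, b, c, e, rng):
    """One data-augmentation Gibbs sweep for N[w][d] ~ P(sum_k pi[k][d] * phi[k][w]),
    under hypothetical Gamma(a, rate b) priors on pi and Gamma(c, rate e) priors on phi."""
    K, D, W = len(pi), len(pi[0]), len(phi[0])
    s_kd = [[0] * D for _ in range(K)]  # latent counts summed over words
    s_kw = [[0] * W for _ in range(K)]  # latent counts summed over documents
    for w in range(W):
        for d in range(D):
            if N[w][d] == 0:
                continue  # all latent counts are then zero
            rates = [pi[k][d] * phi[k][w] for k in range(K)]
            for k, n_k in enumerate(split_counts(N[w][d], rates, rng)):
                s_kd[k][d] += n_k
                s_kw[k][w] += n_k
    # conjugate updates; random.gammavariate takes (shape, scale = 1 / rate)
    new_pi = [[rng.gammavariate(a + s_kd[k][d], 1.0 / (b + sum(phi[k])))
               for d in range(D)] for k in range(K)]
    new_phi = [[rng.gammavariate(c + s_kw[k][w], 1.0 / (e + sum(new_pi[k])))
                for w in range(W)] for k in range(K)]
    return new_pi, new_phi
```

The latent variables (2) are thus independent only before conditioning on (1); it is their a priori independence that gives the likelihood its product form, and hence the conjugate Gamma conditionals, once the latent counts are imputed.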

Bayes @ NYT

Posted in Books, Kids, Statistics, University life on August 8, 2020 by xi'an

A tribune in yesterday's NYT on the importance of being Bayesian when an epidemiologist. The tribune was forwarded to me by a few friends (I had missed it despite my addictive monitoring of the newspaper!). It is written by , a Canadian journalist writing about mathematics (and obviously statistics). And it brings to the general public the main motivation for adopting a Bayesian approach, namely its coherent handling of uncertainty and its ability to update in the face of new information. (Although it might be noted that other flavours of statistical analysis are also able to update their conclusions when given more data.) The COVID situation is a perfect case study in Bayesianism, in that there are so many levels of uncertainty and imprecision, from the models themselves, to the data, to the outcome of the tests, &tc. The article is journalisty, of course, but it quotes a range of statisticians and epidemiologists, including Susan Holmes (who, I learned, was quarantined for 105 days in rural Portugal!), developing a hierarchical Bayes modelling of the prevalent SEIR model, and David Spiegelhalter, discussing Cromwell’s Law (or better, humility law, to avoid the reference to a fanatic and tyrannical Puritan who put Ireland to fire and the sword!, and had in fact very little humility himself). Reading the comments is both hilarious (it does not take long to reach the point when Trump is mentioned, and Taleb’s stance on models and tails makes an appearance) and revealing, as many readers do not understand the meaning of Bayes’ inversion between causes and effects, or even the meaning of Jeffreys’ bar, |, as conditioning.
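As a minimal illustration of Bayes’ inversion between causes and effects in the testing context, the sketch below turns the probability of a positive test given infection (the sensitivity) into the probability of infection given a positive test, with the prevalence acting as prior. The prevalence, sensitivity, and specificity values are hypothetical, not taken from the article, and the function name is illustrative. It also shows Cromwell’s rule at work: a prior probability of exactly zero can never be revised upward, whatever the test says.

```python
def prob_infected_given_positive(prior, sensitivity, specificity):
    """Bayes' theorem: P(infected | +) = P(+ | infected) P(infected) / P(+)."""
    # total probability of a positive test, over infected and non-infected cases
    p_positive = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    return sensitivity * prior / p_positive


# hypothetical figures: 1% prevalence, 90% sensitivity, 95% specificity
print(round(prob_infected_given_positive(0.01, 0.90, 0.95), 4))  # 0.1538
# Cromwell's rule: a zero prior stays at zero after a positive test
print(prob_infected_given_positive(0.0, 0.99, 0.99))  # 0.0
```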