Archive for Bayesian foundations

Masterclass in Bayesian Asymptotics, Université Paris Dauphine, 18-22 March 2024

Posted in Books, pictures, Statistics, Travel, University life on December 8, 2023 by xi'an

During the week of 18-22 March 2024, Judith Rousseau (Paris Dauphine & Oxford) will teach a Masterclass on Bayesian asymptotics. The masterclass takes place in Paris (on the PariSanté Campus) and consists of morning lectures and afternoon labs. Attendance is free, but registration before 11 March is compulsory, as the building is not accessible without prior registration.

The plan of the course is as follows:

Part I: Parametric models
In this part, well- and mis-specified models will be considered.
– Asymptotic posterior distribution: asymptotic normality of the posterior, penalization induced by the prior, and the Bernstein–von Mises theorem. Regular and nonregular models will be treated.
– Marginal likelihood and consistency of Bayes factors/model selection approaches.
– Empirical Bayes methods: asymptotic posterior distribution for parametric empirical Bayes methods.

Part II: Nonparametric and semiparametric models
– Posterior consistency and posterior convergence rates: statistical loss functions using the theory initiated by L. Schwartz and developed by Ghosal and van der Vaart, along with results on less standard or less well-behaved losses.
– Semiparametric Bernstein–von Mises theorems.
– Nonparametric Bernstein–von Mises theorems and uncertainty quantification.
– Stepping away from pure Bayes approaches: generalized Bayes, one-step posteriors and cut posteriors.

Bayes’s theorem for improper mixtures

Posted in Books, Statistics, University life on July 19, 2023 by xi'an

While looking for references for a Master summer project at Warwick on Bayesian inference on the Cauchy location parameter, I came across a 2011 Annals of Statistics paper by Peter McCullagh and Han Han. Which expands the Bayesian framework to the improper case by considering a Poisson process over the parameter set, with mean measure ν the improper prior. Instead of a single random parameter, this construct returns a countable collection of pairs (θ,y), while the observations induce a subset of that collection constrained by y∈A, a “sampling region” both central to the derivation of the joint distribution and obscure in that A remains unspecified (but such that 0<ν(A)<∞ and conveniently returning the observed sample of y’s).

“Provided that the key finiteness condition is satisfied, this probabilistic analysis of the extended model may be interpreted as a vindication of improper Bayes procedures derived from the original model.”

“Thus, the existence of a joint probability model associated with an improper prior does not imply optimality in the form of coherence, consistency or admissibility.”

This is definitely fascinating, even though I have trouble linking this infinite sequence of θ‘s with regular Bayesian inference, since the examples in the paper seem to revert to a single parameter value, as in §4.1 for the Normal model and §5 for the Cauchy model. The authors also revisit the marginalisation paradoxes of Dawid, Stone and Zidek (1973), with the argument that the improper measure leading to the paradox is not compatible with ν(A)<∞, hence does not define a natural conditional, while the “other” improper measure avoids the paradox.
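For readers who like simulations, here is a minimal sketch of the construction for the Normal location model under the flat (Lebesgue) improper prior, my own illustration rather than code from the paper, with the truncation window and the sampling region being arbitrary choices: parameter points arise from a unit-rate Poisson process, each point is marked with an observation, and only the pairs whose observation falls within A are retained.

import numpy as np

rng = np.random.default_rng(0)
L = 50.0          # truncation window for the flat (Lebesgue) mean measure [illustrative choice]
a, b = -1.0, 1.0  # sampling region A = [a, b] for the observations [illustrative choice]

counts, pairs = [], []
for _ in range(1000):
    # unit-rate Poisson process on [-L, L] for the parameter points
    n = rng.poisson(2 * L)
    theta = rng.uniform(-L, L, n)
    # mark each parameter point with an observation y ~ N(theta, 1)
    y = theta + rng.standard_normal(n)
    # conditioning step: keep only the pairs whose observation falls in A
    keep = (y >= a) & (y <= b)
    counts.append(keep.sum())
    pairs.append(np.column_stack([theta[keep], y[keep]]))
pairs = np.vstack(pairs)

# finiteness condition: the retained count is Poisson, with mean the induced measure of A,
# here (b - a) = 2 since the Normal density integrates to one against the flat prior
print("average number of retained pairs:", np.mean(counts))
# formal posterior: theta - y over retained pairs should be (approximately) standard normal
diff = pairs[:, 0] - pairs[:, 1]
print("theta - y: mean =", round(diff.mean(), 3), " sd =", round(diff.std(), 3))

In this toy case the induced measure of A is its Lebesgue length, hence finite, and θ given a retained y follows the formal N(y,1) posterior, which is the sense in which the construction “vindicates” the improper Bayes answer.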

Bayesian inference: challenges, perspectives, and prospects

Posted in Books, Statistics, University life on March 29, 2023 by xi'an

Over the past year, Judith, Michael and I edited a special issue of Philosophical Transactions of the Royal Society on Bayesian inference: challenges, perspectives, and prospects, in celebration of the current President of the Royal Society, Adrian Smith, and of his contributions to Bayesian analysis that have impacted the field up to this day. The issue is now out! The following is the beginning of our introduction to the series.

When contemplating his past achievements, it is striking to align the emergence of massive advances in these fields with some of his papers or books. For instance, Lindley and Smith’s ‘Bayes Estimates for the Linear Model’ (1972), a Read Paper at the Royal Statistical Society, makes the case for the Bayesian analysis of this most standard statistical model, as well as emphasizing the notion of exchangeability that is foundational in Bayesian statistics, and paving the way to the emergence of hierarchical Bayesian modelling. It thus makes a link between the early days of Bruno de Finetti, whose work Adrian Smith translated into English, and the current research in non-parametric and robust statistics. Bernardo and Smith’s masterpiece, Bayesian Theory (1994), sets statistical inference within decision- and information-theoretic frameworks in a most elegant and universal manner that could be deemed a Bourbaki volume for Bayesian statistics, had this classification endeavour reached further than pure mathematics. It also emphasizes the central role of hierarchical modelling in the construction of priors, as exemplified in Carlin et al.’s ‘Hierarchical Bayesian analysis of change point problems’ (1992).

The series of papers published in 1990 by Alan Gelfand & Adrian Smith, esp. ‘Sampling-Based Approaches to Calculating Marginal Densities’ (1990), is overwhelmingly perceived as the birth date of modern Markov chain Monte Carlo (MCMC) methods, as it brought to the whole statistics community (and quickly to much wider communities) the realization that MCMC simulation was the sesame to unlock complex modelling issues. The consequences on the adoption of Bayesian modelling by non-specialists are enormous and long-lasting. Similarly, Gordon et al.’s ‘Novel approach to nonlinear/non-Gaussian Bayesian state estimation’ (1993) is considered as the birthplace of sequential Monte Carlo, aka particle filtering, with considerable consequences in tracking, robotics, econometrics and many other fields. Titterington, Smith and Makov’s reference book, ‘Statistical Analysis of Finite Mixture Distributions’ (1985), is a precursor in the formalization of heterogeneous data structures, paving the way for the incoming MCMC resolutions like Tanner & Wong (1987), Gelman & King (1990) and Diebolt & Robert (1990). Denison et al.’s book, ‘Bayesian methods for nonlinear classification and regression’ (2002), is another testimony to the influence of Adrian Smith on the field, stressing the emergence of robust and general classification and nonlinear regression methods to analyse complex data, prefiguring in a way the later emergence of machine-learning methods, with the additional Bayesian assessment of uncertainty. It also brings forward the capacity of operating Bayesian non-parametric modelling that is now broadly accepted, following a series of papers by Denison et al. in the late 1990s on Bayesian versions of CART and MARS.

We are quite grateful to the authors contributing to this volume, namely Joshua J. Bon, Adam Bretherton, Katie Buchhorn, Susanna Cramb, Christopher Drovandi, Conor Hassan, Adrianne L. Jenner, Helen J. Mayfield, James M. McGree, Kerrie Mengersen, Aiden Price, Robert Salomone, Edgar Santos-Fernandez, Julie Vercelloni and Xiaoyu Wang, Afonso S. Bandeira, Antoine Maillard, Richard Nickl and Sven Wang, Fan Li, Peng Ding and Fabrizia Mealli, Matthew Stephens, Peter D. Grünwald, Sumio Watanabe, Peter Müller, Noirrit K. Chandra and Abhra Sarkar, Kori Khan and Alicia Carriquiry, Arnaud Doucet, Eric Moulines and Achille Thin, Beatrice Franzolini, Andrea Cremaschi, Willem van den Boom and Maria De Iorio, Sandra Fortini and Sonia Petrone, Sylvia Frühwirth-Schnatter, Sara Wade, Chris C. Holmes and Stephen G. Walker, Lizhen Nie and Veronika Ročková. Some of the papers are open-access, if not all, hence enjoy them!

In Bayesian statistics, data is considered nonrandom…

Posted in Books, Statistics, University life on July 12, 2021 by xi'an

A rather weird question popped up on X validated, namely why does Bayesian analysis rely on a sampling distribution if the data is nonrandom. While a given sample is indeed a deterministic object, and hence nonrandom from this perspective, I replied that Bayesian analysis on the opposite sets the observed data as the realisation of a random variable, in order to condition upon this realisation and construct a posterior distribution on the parameter. Which is quite different from calling it nonrandom! But, presumably putting too much meaning into and spending too much time on this query, I remain somewhat bemused by what line of thought led to this question…
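If a toy illustration helps (mine, not part of the X validated exchange), the sketch below treats the fixed observation as one realisation of the random pair (θ,y) and recovers the posterior by conditioning the joint simulation on that realisation:

import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)
y_obs = 3  # the realised, hence fixed, observation [illustrative value]

# joint simulation of (theta, y): theta uniform on {0, 1}, y | theta ~ Binomial(5, p_theta)
theta = rng.integers(0, 2, size=1_000_000)
y = rng.binomial(5, np.where(theta == 1, 0.7, 0.3))

# conditioning on the realisation y == y_obs produces the posterior P(theta = 1 | y_obs)
print("simulated posterior:", theta[y == y_obs].mean())
print("analytic posterior :", binom.pmf(y_obs, 5, 0.7)
      / (binom.pmf(y_obs, 5, 0.7) + binom.pmf(y_obs, 5, 0.3)))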

Bayes factors revisited

Posted in Books, Mountains, pictures, Statistics, Travel, University life on March 22, 2021 by xi'an


“Bayes factor analyses are highly sensitive to and crucially depend on prior assumptions about model parameters (…) Note that the dependency of Bayes factors on the prior goes beyond the dependency of the posterior on the prior. Importantly, for most interesting problems and models, Bayes factors cannot be computed analytically.”

Daniel J. Schad, Bruno Nicenboim, Paul-Christian Bürkner, Michael Betancourt, and Shravan Vasishth have just arXived a massive document on the Bayes factor, worrying about the computation of this common tool, but also about the variability of decisions based on Bayes factors, e.g., stressing correctly that

“…we should not confuse inferences with decisions. Bayes factors provide inference on hypotheses. However, to obtain discrete decisions (…) from continuous inferences in a principled way requires utility functions. Common decision heuristics (e.g., using Bayes factor larger than 10 as a discovery threshold) do not provide a principled way to perform decisions, but are merely heuristic conventions.”

The text is long and at times meandering (at least in the sections I read), while trying a wee bit too hard to bring up the advantages of using Bayes factors versus frequentist or likelihood solutions. (The likelihood ratio being presented as a “frequentist” solution, which I think is an incorrect characterisation.) For instance, the starting point of preferring a model with a higher marginal likelihood is presented as an evidence (oops!) rather than argued for. Since this quantity depends on both the prior and the likelihood, it being high or low is impacted by both. One could then argue that using its numerical value as an absolute criterion amounts to selecting the prior a posteriori as much as checking the fit to the data! The paper also resorts to the Occam’s razor argument, which I wish we could omit, as it is a vague criterion, wide open to misappropriation. It is also qualitative, rather than quantitative, hence does not help in calibrating the Bayes factor.
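As a quick numerical illustration of this double dependence (a toy Normal example of my own, not one from the paper), testing θ=0 against θ~N(0,τ²) on a sample mean ȳ~N(θ,1/n) sees the Bayes factor in favour of the null grow without bound with the prior scale τ, the Jeffreys-Lindley phenomenon:

import numpy as np
from scipy.stats import norm

n, y_bar = 100, 0.25  # fixed data: sample size and sample mean [illustrative values]
for tau in [0.1, 1.0, 10.0, 100.0]:
    m0 = norm.pdf(y_bar, 0, np.sqrt(1 / n))           # marginal likelihood under the point null
    m1 = norm.pdf(y_bar, 0, np.sqrt(tau**2 + 1 / n))  # marginal likelihood under the alternative
    print(f"tau = {tau:6.1f}   BF01 = {m0 / m1:.3g}")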

Concerning the actual computation of the Bayes factor, an issue that has always been a concern and a research topic for me, the authors consider only two “very common methods”, the Savage–Dickey density ratio method and bridge sampling. We discussed the shortcomings of the Savage–Dickey density ratio method with Jean-Michel Marin about ten years ago. And while bridge sampling is an efficient approach when comparing models of the same dimension, I have reservations about this efficiency in other settings. Alternative approaches like importance nested sampling, noise contrastive estimation or SMC samplers often perform quite efficiently as normalising constant approximations. (Not to mention our version of the harmonic mean estimator with HPD support.)
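For illustration (again my own toy example, not the paper’s), the Savage–Dickey representation of the above test writes the Bayes factor as the ratio of the posterior density to the prior density at the tested null value, and estimating the numerator from posterior draws, here through a kernel density estimate, is precisely the step that may degrade in the tails or in higher dimensions:

import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(2)
n, y_bar, tau = 100, 0.25, 1.0  # same toy data and prior scale as above [illustrative values]

# conjugate posterior under the alternative: theta | y_bar ~ N(m, v)
v = 1 / (n + 1 / tau**2)
m = v * n * y_bar
draws = rng.normal(m, np.sqrt(v), 50_000)

# Savage-Dickey: BF01 = posterior density at 0 / prior density at 0
bf01_sd = gaussian_kde(draws)(0.0)[0] / norm.pdf(0.0, 0, tau)
bf01_exact = norm.pdf(y_bar, 0, np.sqrt(1 / n)) / norm.pdf(y_bar, 0, np.sqrt(tau**2 + 1 / n))
print(f"Savage-Dickey estimate: {bf01_sd:.3f}   exact: {bf01_exact:.3f}")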

Simulation-based inference is based on the notion that simulated data can be produced from the predictive distributions. Reminding me of ABC model choice to some extent. But I am uncertain this approach can be used to calibrate the decision procedure to select the most appropriate model. We thought about using this approach in our testing-by-mixture paper, and it favours the more complex of the two models. This seems also to occur for the example behind Figure 5 in the paper.
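To make this predictive calibration concrete, here is a hedged sketch on the same toy test (my choice of setting, not the one behind Figure 5): datasets are simulated from each model’s prior predictive, and the distribution of the resulting (here exact) Bayes factors shows how often each model would be selected at a given threshold:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, tau, reps = 100, 1.0, 10_000  # same toy test, plus a number of predictive replications

def bf01(y_bar):
    # exact Bayes factor of the point null theta = 0 against theta ~ N(0, tau^2)
    return norm.pdf(y_bar, 0, np.sqrt(1 / n)) / norm.pdf(y_bar, 0, np.sqrt(tau**2 + 1 / n))

# prior predictive draws of the sample mean under each hypothesis
yb_h0 = rng.normal(0, np.sqrt(1 / n), reps)           # under theta = 0
yb_h1 = rng.normal(0, np.sqrt(tau**2 + 1 / n), reps)  # under theta ~ N(0, tau^2)

# frequency of selecting H0 at the (conventional, not principled) threshold BF01 > 1
print("P(select H0 | H0):", np.mean(bf01(yb_h0) > 1))
print("P(select H0 | H1):", np.mean(bf01(yb_h1) > 1))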

Two other points: first, the paper does not consider the important issue with improper priors, which are not rigorously compatible with Bayes factors, as I have often discussed in the past. And second, Bayes factors are not truly Bayesian decision procedures, since they remove the prior weights on the models, and thus the mention of utility functions therein seems inappropriate, unless a genuine utility function can be produced.