Archive for Bayesian robustness

David Frazier’s talk on One World ABC seminar tomorrow [watch for the time!]

Posted in pictures, Statistics, Travel, University life on October 14, 2020 by xi'an

My friend and coauthor from Melbourne is giving the One World ABC seminar tomorrow. He will be talking at 10:30 UK time, 11:30 Brussels time, and 20:30 Melbourne time, on Robust and Efficient Approximate Bayesian Computation: A Minimum Distance Approach. Be on time!

Bayesian inference with no likelihood

Posted in Books, Statistics, University life on January 28, 2020 by xi'an

This week I made a quick trip to Warwick for the defence (or viva) of the PhD thesis of Jack Jewson, containing novel perspectives on constructing Bayesian inference without likelihood or without complete trust in said likelihood. The thesis aimed at constructing minimum divergence posteriors in an M-open perspective and built a rather coherent framework from principles to implementation. There is a clear link with the earlier work of Bissiri et al. (2016), with further consistency constraints where the outcome must recover the true posterior in the M-closed scenario (even if this is not always the case with the procedures proposed in the thesis).

Although I am partial to the use of empirical likelihoods in this setting, I appreciated the position of the thesis and the discussion of the various divergences towards the posterior derivation (already discussed on this blog), with interesting perspectives on the calibration of the pseudo-posterior à la Bissiri et al. (2016). Among other things, the thesis pointed out a departure from the likelihood principle and some of its most established consequences, like Bayesian additivity. In that regard, there were connections with generative adversarial networks (GANs) and their Bayesian versions that could have been explored. I was also left with the impression that the type of Bayesian robustness explored in the thesis has more to do with outliers than with misspecification. Epsilon-contamination models are quite specific as it happens, in terms of tails and other things.
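To make the pseudo-posterior à la Bissiri et al. (2016) more concrete, here is a minimal sketch of a loss-based update, prior times exp(-w × cumulated loss), evaluated on a grid for a location parameter. Everything in it is an illustrative assumption rather than something taken from the thesis: the function name `gibbs_posterior`, the two losses, the N(0, 10²) prior, the weight w=1, and the toy data made of fifty evenly spread points plus one gross outlier.

```python
import numpy as np

def gibbs_posterior(theta_grid, data, loss, log_prior, w=1.0):
    """Grid approximation of the density proportional to
    prior(theta) * exp(-w * sum_i loss(theta, x_i))."""
    total_loss = np.array([loss(t, data).sum() for t in theta_grid])
    log_post = log_prior(theta_grid) - w * total_loss
    log_post -= log_post.max()             # stabilise before exponentiating
    post = np.exp(log_post)
    step = theta_grid[1] - theta_grid[0]
    return post / (post.sum() * step)      # normalise to integrate to one

def squared_loss(theta, x):
    # Negative log-score of a N(theta, 1) model, up to an additive constant,
    # so this choice recovers the standard Gaussian posterior.
    return 0.5 * (x - theta) ** 2

def absolute_loss(theta, x):
    # A loss with bounded influence: targets the median rather than the mean.
    return np.abs(x - theta)

def log_prior(theta):
    # Hypothetical vague prior N(0, 10^2), up to an additive constant.
    return -0.5 * (theta / 10.0) ** 2

# Toy data (an assumption): 50 well-behaved points and one gross outlier.
data = np.concatenate([np.linspace(-2.0, 2.0, 50), [50.0]])
grid = np.linspace(-5.0, 10.0, 3001)
step = grid[1] - grid[0]

post_sq = gibbs_posterior(grid, data, squared_loss, log_prior)
post_abs = gibbs_posterior(grid, data, absolute_loss, log_prior)

mean_sq = (grid * post_sq).sum() * step
mean_abs = (grid * post_abs).sum() * step
print(f"posterior mean, squared loss:  {mean_sq:.3f}")
print(f"posterior mean, absolute loss: {mean_abs:.3f}")
```

Under the squared loss the single outlier drags the posterior mean towards the sample mean, while the absolute-loss pseudo-posterior stays near the bulk of the data. The weight `w` is left at one here, and choosing it is precisely the calibration issue mentioned above.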

The next chapter is somewhat “less” Bayesian in my view as it considers a generalised form of variational inference. I agree that the view of the posterior as a solution to an optimisation problem is tempting but changing the objective function makes the notion less precise. Which makes reading it somewhat delicate as it seems to dilute the meaning of both prior and posterior to the point of becoming irrelevant.

The last chapter on change-point models is quite alluring in that it capitalises on the previous developments to analyse a fairly realistic if traditional problem, applied to traffic in London, prior and posterior to the congestion tax. However, there is always an issue with robustness and outliers in that the notion is somewhat vague or informal. Things start clarifying at the end but I find it surprising that conjugates are robust optimal solutions since the usual folk theorem from the 80’s is that they are not robust.

a generalized representation of Bayesian inference

Posted in Books on July 5, 2019 by xi'an

Jeremias Knoblauch, Jack Jewson and Theodoros Damoulas, all affiliated with Warwick (hence a potentially biased reading!), arXived a paper on loss-based Bayesian inference that Jack discussed with me on my last visit to Warwick. I was somewhat scared by the 61 pages, of which the first 8 are in NeurIPS style. The authors argue for a decision-theoretic approach to Bayesian inference that involves a loss over distributions and a divergence from the prior. For instance, when using the log-score as the loss and the Kullback-Leibler divergence, the regular posterior emerges, as shown by Arnold Zellner. Variational inference also falls under this hat. The argument for this generalization is that any form of loss can be used and still returns a distribution that is used to assess uncertainty about the parameter (of interest). In the axioms they produce for justifying the derivation of the optimal procedure, including cases where the posterior is restricted to a certain class, one [Axiom 4] generalizes the likelihood principle. Given the freedom brought by this general framework, plenty of fringe Bayes methods like standard variational Bayes can be seen as solutions to such a decision problem. Others, like EP, cannot. Of interest to me is the potential for this formal framework to encompass misspecification and likelihood-free settings, as well as for assessing priors, which is always a fishy issue. (The authors mention in addition the capacity to build related specific design Bayesian deep networks, of which I know nothing.) My obvious reaction is one of facing an abundance of wealth (!) but encompassing approximate Bayesian solutions within a Bayesian framework remains an exciting prospect.
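For readers who want to see why the log-score and the Kullback-Leibler divergence return the regular posterior, here is a short sketch of the argument. The notation is my own assumption and not necessarily the paper's: π is the prior, L(θ) = ∏ p(x_i|θ) the likelihood, q a candidate distribution on θ, and Z the marginal likelihood.

```latex
\begin{align*}
q^\star &= \arg\min_{q}\Big\{ \mathbb{E}_{q}\Big[-\sum_{i=1}^n \log p(x_i\mid\theta)\Big]
          + \mathrm{KL}(q\,\|\,\pi) \Big\} \\
        &= \arg\min_{q} \int q(\theta)\,
          \log\frac{q(\theta)}{\pi(\theta)\,L(\theta)}\,\mathrm{d}\theta \\
        &= \arg\min_{q}\Big\{ \mathrm{KL}\big(q\,\big\|\,\pi L/Z\big) - \log Z \Big\},
          \qquad Z=\int \pi(\theta)\,L(\theta)\,\mathrm{d}\theta .
\end{align*}
% log Z does not depend on q and the KL term vanishes only when q = \pi L / Z, hence
\[
q^\star(\theta) = \frac{\pi(\theta)\,L(\theta)}{Z} = \pi(\theta\mid x_1,\dots,x_n).
\]
```

Restricting the minimisation to a subclass of distributions q in the same objective gives standard variational Bayes, while swapping the loss or the divergence gives the generalised versions.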

nonparametric Bayesian clay for robust decision bricks

Posted in Statistics on January 30, 2017 by xi'an

Just received an email today that our discussion with Judith of Chris Holmes and James Watson’s paper was now published as Statistical Science 2016, Vol. 31, No. 4, 506-510… While it is almost identical to the arXiv version, it can be read on-line.

non-identifiability in Venezia

Posted in Books, pictures, Statistics, Travel, University life on November 2, 2016 by xi'an

Last Wednesday, I attended a seminar by T. Kitagawa at the economics seminar of the University Ca’ Foscari, in Venice, which was about (uncertain) identifiability and a sort of meta-Bayesian approach to the problem. Just to give an intuition about the setting, a toy example is a simultaneous equation model Ax=ξ, where x and ξ are two-dimensional vectors, ξ being a standard bivariate Normal noise. In that case, A is not completely identifiable. The argument in the talk (and the paper) is that the common Bayesian answer that sets a prior on the non-identifiable part (which is an orthogonal matrix in the current setting) is debatable as it impacts inference on the non-identifiable parts, even in the long run. Which seems fine from my viewpoint. The authors propose to instead consider the range of possible priors that are compatible with the set restrictions on the non-identifiable parts and to introduce a mixture between a regular prior on the whole parameter A and this collection of priors, which can be seen as a set-valued prior although this does not fit within the Bayesian framework in my opinion. Once this mixture is constructed, a formal posterior weight on the regular prior can be derived. As well as a range of posterior values for all quantities of interest. While this approach connects with imprecise probabilities à la Walley (?) and links with robust Bayesian studies of the 1980’s, I always have difficulties with the global setting of such models, which do not come under criticism while being inadequate. (Of course, there are many more things I do not understand in econometrics!)
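The non-identifiability in the toy model Ax=ξ can be checked numerically. The sketch below is only an illustration under my own assumptions (the particular matrix `A`, the rotation angle, the seed and the sample size are arbitrary, and `implied_covariance` is a hypothetical helper): since x = A⁻¹ξ with ξ a standard bivariate Normal, x is Normal with covariance (AᵀA)⁻¹, and replacing A by QA for any orthogonal Q leaves this covariance, hence the distribution of the data, unchanged.

```python
import numpy as np

def implied_covariance(A):
    """Covariance of x when A x = xi and xi ~ N(0, I_2), namely (A^T A)^{-1}."""
    return np.linalg.inv(A.T @ A)

# Arbitrary invertible structural matrix (an assumption for illustration).
A = np.array([[2.0, 0.5],
              [-1.0, 1.5]])

# Arbitrary rotation, i.e. one member of the orthogonal group.
angle = 0.7
Q = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])

cov_A = implied_covariance(A)
cov_QA = implied_covariance(Q @ A)

# Both parameter values imply the same distribution for the observables.
same_law = np.allclose(cov_A, cov_QA)
print("A and QA observationally equivalent:", same_law)

# Sanity check by simulation: solve A x = xi for many noise draws.
rng = np.random.default_rng(1)
xi = rng.standard_normal((2, 200_000))
x = np.linalg.solve(A, xi)
empirical_cov = np.cov(x)
print(np.round(empirical_cov, 3))
print(np.round(cov_A, 3))
```

No amount of data on x can therefore separate A from QA, which is why a prior on the orthogonal part keeps influencing the inference on it, even in the long run.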