PAC-Bayesians

Yesterday, I took part in the thesis defence of James Ridgway [soon to move to the University of Bristol] at Université Paris-Dauphine. While I have already commented on his joint paper with Nicolas on the Pima Indians, I had not read in any depth another paper in the thesis, "On the properties of variational approximations of Gibbs posteriors", written jointly with Pierre Alquier and Nicolas Chopin.

PAC stands for probably approximately correct; the approach starts from an empirical form of the posterior, called the Gibbs posterior, where the log-likelihood is replaced with an empirical error r_n(θ)

\pi(\theta|x_1,\ldots,x_n) \propto \exp\{-\lambda r_n(\theta)\}\pi(\theta)

that is rescaled by a factor λ, called the learning rate, which Peter Grünwald (2012) optimises in his SafeBayes approach as giving the closest (Kullback–Leibler) approximation to the true unknown distribution. In the paper of James, Pierre and Nicolas, there is no visible Bayesian perspective, since the pseudo-posterior is used to define a randomised estimator that achieves optimal oracle bounds when λ is of order n. The purpose of the paper is rather to produce an efficient approximation to the Gibbs posterior by variational Bayes techniques, and to derive point estimators from it, with the added appeal that the approximation also achieves the oracle bounds. (Surprisingly, the authors do not leave the Pima Indians alone, as they use this benchmark for a ranking model.) Since there is no discussion of the choice of the learning rate λ, as opposed to Bissiri et al. (2013), which I discussed around Bayes.250, I have difficulties perceiving the possible impact of this representation on Bayesian analysis, except maybe as an ABC device, as suggested by Christophe Andrieu.
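
To make these objects concrete, here is a minimal sketch, in no way the authors' code, of a Gibbs posterior for linear classification with a hinge empirical risk r_n and a standard Gaussian prior, approximated by a mean-field Gaussian fitted by reparameterised stochastic gradients on λ E_q[r_n(θ)] + KL(q, π); the function name fit_variational_gibbs, the toy data and every tuning constant are illustrative assumptions, not the paper's settings.

# Minimal sketch (illustrative only): Gibbs posterior with hinge empirical risk
# and N(0, I) prior, approximated by a mean-field Gaussian q(theta) = N(mu, diag(sig^2))
# fitted by stochastic gradient descent on lambda * E_q[r_n(theta)] + KL(q || prior).
import numpy as np

rng = np.random.default_rng(0)

def empirical_risk(theta, X, y):
    # r_n(theta): average hinge loss of the linear classifier x -> sign(x @ theta)
    return np.mean(np.maximum(0.0, 1.0 - y * (X @ theta)))

def fit_variational_gibbs(X, y, lam, n_iters=3000, n_mc=8, lr=1e-3):
    d = X.shape[1]
    mu, log_sig = np.zeros(d), np.full(d, -1.0)
    for _ in range(n_iters):
        sig = np.exp(log_sig)
        eps = rng.standard_normal((n_mc, d))
        thetas = mu + eps * sig                          # reparameterisation trick
        margins = y[None, :] * (thetas @ X.T)            # one row per Monte Carlo draw
        active = (margins < 1.0).astype(float)           # hinge subgradient indicator
        # gradient of lam * r_n(theta) for each Monte Carlo draw
        g_theta = -lam * (active * y[None, :]) @ X / X.shape[0]
        grad_mu = g_theta.mean(axis=0) + mu                                # + dKL/dmu
        grad_log_sig = (g_theta * eps * sig).mean(axis=0) + sig**2 - 1.0   # + dKL/dlog_sig
        mu -= lr * grad_mu
        log_sig -= lr * grad_log_sig
    return mu, np.exp(log_sig)

# toy classification data, labels in {-1, +1}
n, d = 200, 3
X = rng.standard_normal((n, d))
y = np.sign(X @ np.array([2.0, -1.0, 0.5]))
mu, sig = fit_variational_gibbs(X, y, lam=float(n))      # learning rate lambda of order n
print("variational mean:", np.round(mu, 2), " empirical risk:", round(empirical_risk(mu, X, y), 3))

Under this sketch, drawing θ from the fitted Gaussian plays the role of the randomised estimator, while its mean provides a point estimator, with λ taken of order n as in the oracle bounds mentioned above.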

2 Responses to “PAC-Bayesians”

  1. I really liked this paper, mainly because VB is amazing for finding the centre of a complex posterior and that’s all that is needed for PAC-Bayes.

    I agree with you that it doesn't say anything about the impact on Bayesian analysis, but I don't think that's a downside. PAC-Bayes is explicitly trying to find just one thing (rather than the everything that Bayes aims for), so they're not compatible ideologies.

  2. James Ridgway Says:

    Thanks xi'an for discussing the paper. I would like to add that the goal of the paper is to show that variational approximations of Gibbs posteriors can achieve the same rate of convergence as the posterior itself (and to give the conditions under which they do).
    As discussed in the paper, we use cross-validation to choose \lambda, as it remains the "go to" method for general notions of risk (a rough sketch of this step appears after the comments)… Also note that the PAC methodology originates in papers well before [Bissiri et al 2013]; in particular see [Shawe-Taylor and Williamson, 1997, McAllester, 1998, Catoni, 2004].

    Concerning the Pima Indians, they do not appear as the sole justification for the algorithms (as in some papers …). The algorithms are also tested on additional datasets with more covariates (in the case of classification) or more individuals when that is the computational issue (as for AUC ranking). Also note that the Pima Indians is a very noisy dataset, making it trivial to sample from the posterior (probit/logit likelihood) but relatively hard as a classification task.
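
Since James mentions choosing λ by cross-validation, here is an equally rough sketch of that step, reusing the fit_variational_gibbs and empirical_risk helpers from the sketch above; the grid of λ values and the use of held-out hinge risk are purely illustrative choices, not the paper's procedure.

# Minimal sketch: pick lambda by K-fold cross-validation on held-out empirical risk,
# reusing fit_variational_gibbs and empirical_risk (and X, y, n) from the sketch above.
def cv_choose_lambda(X, y, lambdas, n_folds=5, seed=1):
    rng_cv = np.random.default_rng(seed)
    folds = np.array_split(rng_cv.permutation(len(y)), n_folds)
    scores = []
    for lam in lambdas:
        risks = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            mu_k, _ = fit_variational_gibbs(X[train], y[train], lam=lam)
            risks.append(empirical_risk(mu_k, X[test], y[test]))   # held-out risk
        scores.append(np.mean(risks))
    return lambdas[int(np.argmin(scores))], scores

best_lam, scores = cv_choose_lambda(X, y, lambdas=[n / 4, n / 2, float(n), 2.0 * n])
print("lambda selected by cross-validation:", best_lam)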
