## PAC-Bayesians

Posted in Books, Kids, pictures, Statistics, Travel, University life on September 22, 2015 by xi'an

Yesterday, I took part in the thesis defence of James Ridgway [soon to move to the University of Bristol] at Université Paris-Dauphine. While I have already commented on his joint paper with Nicolas on the Pima Indians, I had not read in any depth another paper in the thesis, “On the properties of variational approximations of Gibbs posteriors”, written jointly with Pierre Alquier and Nicolas Chopin.

PAC stands for probably approximately correct and starts with an empirical form of posterior, called the Gibbs posterior, where the log-likelihood is replaced with an empirical error

$\pi(\theta|x_1,\ldots,x_n) \propto \exp\{-\lambda r_n(\theta)\}\pi(\theta)$

that is rescaled by a factor λ. This factor is called the learning rate by Peter Grünwald (2012) in his SafeBayes approach, where it is optimised as the (Kullback) closest approximation to the true unknown distribution. In the paper of James, Pierre and Nicolas, there is no visible Bayesian perspective, since the pseudo-posterior is used to define a randomised estimator that achieves optimal oracle bounds when λ is of order n. The purpose of the paper is rather to produce an efficient approximation to the Gibbs posterior, by using variational Bayes techniques, and to derive point estimators, with the added appeal that the approximation also achieves the oracle bounds. (Surprisingly, the authors do not leave the Pima Indians alone as they use this benchmark for a ranking model.) Since there is no discussion on the choice of the learning rate λ, as opposed to Bissiri et al. (2013), which I discussed around Bayes.250, I have difficulties perceiving the possible impact of this representation on Bayesian analysis. Except maybe as an ABC device, as suggested by Christophe Andrieu.
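To make the definition of the Gibbs posterior concrete, here is a minimal sketch that evaluates exp{-λ r_n(θ)}π(θ) on a finite grid of θ values and normalises it. Everything specific in it is my own assumption for illustration and is not taken from the paper: the simulated data, the absolute-error risk, the N(0, 10²) prior, the grid, and the choice λ = n.

```python
import math
import random

def gibbs_posterior_on_grid(thetas, log_prior, empirical_risk, lam):
    """Normalised weights proportional to exp{-lam * r_n(theta)} * prior(theta) on a grid."""
    log_w = [-lam * empirical_risk(t) + log_prior(t) for t in thetas]
    m = max(log_w)  # subtract the maximum before exponentiating, for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

# Hypothetical toy data: 50 draws from N(1, 1)
random.seed(0)
data = [random.gauss(1.0, 1.0) for _ in range(50)]
n = len(data)

def r_n(theta):
    # Empirical risk: mean absolute error (an assumed loss, minimised at the sample median)
    return sum(abs(x - theta) for x in data) / n

def log_prior(theta):
    # N(0, 10^2) prior, up to an additive constant (assumed)
    return -0.5 * theta ** 2 / 10.0 ** 2

grid = [i / 100 for i in range(-300, 501)]  # theta in [-3, 5] with step 0.01
weights = gibbs_posterior_on_grid(grid, log_prior, r_n, lam=n)  # lambda of order n

post_mean = sum(t * w for t, w in zip(grid, weights))
post_mode = grid[max(range(len(grid)), key=lambda i: weights[i])]
print("Gibbs posterior mean:", post_mean)
print("Gibbs posterior mode:", post_mode)
```

With this loss and a nearly flat prior, the mode of the pseudo-posterior sits at the sample median (up to the grid step), which illustrates how the empirical error takes over the role of the log-likelihood. A grid is obviously only workable in very low dimension, hence the interest in variational approximations.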

## Bayesian computation: fore and aft

Posted in Books, Statistics, University life on February 6, 2015 by xi'an

With my friends Peter Green (Bristol), Krzysztof Łatuszyński (Warwick) and Marcello Pereyra (Bristol), I just arXived the first version of “Bayesian computation: a perspective on the current state, and sampling backwards and forwards”, whose first title was the title of this post. This is a survey of our own perspective on Bayesian computation, from what occurred in the last 25 years [a lot!] to what could occur in the near future [a lot as well!]. It has been submitted to Statistics and Computing for the special 25th anniversary issue, as announced in an earlier post. Pulling strength and breadth from each other’s opinions, we have certainly attained more than the sum of our initial respective contributions, but we welcome comments about bits and pieces of importance that we missed and even more about promising new directions that are not covered in this survey. (A warning that should go with most of my surveys is that my input in this paper will not differ by a large margin from ideas expressed here or in previous surveys.)

## Posterior predictive p-values and the convex order

Posted in Books, Statistics, University life on December 22, 2014 by xi'an

Patrick Rubin-Delanchy and Daniel Lawson [of Warhammer fame!] recently arXived a paper we had discussed with Patrick when he visited Andrew and me last summer in Paris. The topic is the evaluation of the posterior predictive probability of a larger discrepancy between data and model

$\mathbb{P}\left( f(X|\theta)\ge f(x^\text{obs}|\theta) \,|\,x^\text{obs} \right)$

which acts like a Bayesian p-value of sorts. I have discussed several times on this blog the reservations I have about this notion, including running one experiment on the uniformity of the ppp while at Duke last year. One of those reservations is that it evaluates the posterior probability of an event that does not exist a priori. Which is somewhat connected to the issue of using the data “twice”.

“A posterior predictive p-value has a transparent Bayesian interpretation.”

Another item that was suggested [to me] in the current paper is the difficulty in defining the posterior predictive (pp), for instance by including latent variables

$\mathbb{P}\left( f(X,Z|\theta)\ge f(x^\text{obs},Z^\text{obs}|\theta) \,|\,x^\text{obs} \right)\,,$

which reminds me of the multiple possible avatars of the BIC criterion. The question addressed by Rubin-Delanchy and Lawson is how far this pp stands from the uniform distribution when the model is correct. The main result of their paper is that any sub-uniform distribution can be expressed as a particular posterior predictive. The authors also exhibit the distribution that achieves the bound produced by Xiao-Li Meng, namely that

$\mathbb{P}(P\le \alpha) \le 2\alpha$

where P is the above (top) probability. (Hence it is uniform up to a factor 2!) Obviously, the proximity with the upper bound only occurs in a limited number of cases that do not validate the overall use of the ppp. But this is certainly a nice piece of theoretical work.
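As a quick numerical illustration of this bound, here is a minimal sketch in a toy conjugate setting where the ppp is available in closed form. The model (x_i ~ N(θ, 1) with θ ~ N(0, 1)), the discrepancy (the sample mean, which does not depend on θ), and the simulation sizes are all assumptions of mine, not the setting of the paper, and this toy case is not the distribution that achieves the bound.

```python
import math
import random

def normal_cdf(z):
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ppp_sample_mean(xbar, n):
    """P(mean(X_rep) >= xbar | x_obs) when x_i ~ N(theta, 1) and theta ~ N(0, 1)."""
    post_mean = n * xbar / (n + 1)                 # posterior mean of theta
    pred_sd = math.sqrt(1.0 / (n + 1) + 1.0 / n)   # predictive sd of a replicated sample mean
    return 1.0 - normal_cdf((xbar - post_mean) / pred_sd)

random.seed(1)
n, n_sim = 10, 20000
ppps = []
for _ in range(n_sim):
    theta = random.gauss(0.0, 1.0)                   # parameter drawn from the prior
    xbar = random.gauss(theta, 1.0 / math.sqrt(n))   # sample mean of n observations given theta
    ppps.append(ppp_sample_mean(xbar, n))

def freq_below(alpha):
    """Empirical frequency of the event {ppp <= alpha} under the prior predictive."""
    return sum(p <= alpha for p in ppps) / len(ppps)

for alpha in (0.05, 0.1, 0.25):
    print(f"alpha={alpha}: freq={freq_below(alpha):.4f}, bound={2 * alpha}")
```

In this example the ppp concentrates around 1/2 when the model is correct, so the empirical frequencies should fall well below 2α rather than near it, which is the far-from-uniform behaviour at the heart of my reservations.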

## I like…intractable likelihoods (openings)

Posted in Statistics on December 22, 2012 by xi'an

A new EPSRC programme grant, called i-like, has been awarded to researchers in Bristol, Lancaster, Oxford, and Warwick, to conduct research on intractable likelihoods. (I am also associated with this programme as a [grateful] collaborator.) This covers several areas of statistics, like big data and inference on stochastic processes, but my own primary interest in the programme is of course the possibility of collaborating on ABC and composite likelihood methods. (Great website design, by the way!)

A first announcement is that there will be a half-day launch in Oxford on January 31, 2013, whose programme is now available. It will be followed by a workshop in mid-May in Warwick (in which I will take part). This event is particularly aimed at PhD students and early-career researchers. The second announcement is that the EPSRC programme grant provides funding for five postdoctoral positions over a duration of four years, which is of course stupendous! So if you like i-like as much as I like it, and are a new researcher looking for opportunities in exciting areas, you should definitely consider applying!

## Structure and uncertainty, Bristol, Sept. 25

Posted in pictures, Running, Statistics, Travel, Uncategorized, University life on September 26, 2012 by xi'an

This was a fairly full day at the Structure and uncertainty modelling, inference and computation in complex stochastic systems workshop! After a good one-hour run around the Clifton Down, the morning was organised around likelihood-free methods, mostly ABC, plus Arnaud Doucet’s study of methods based on unbiased estimators of the likelihood (à la Beaumont, with the novelty of assessing the inefficiency due to the estimation, really fascinating...). The afternoon was dedicated to graphical models. Nicolas Chopin gave an updated version of his Kyoto talk on EP-ABC where he resorted to composite likelihoods for hidden Markov models. (I then wondered about the parameterisation and the tolerance determination for this algorithm.) Oliver Ratmann presented some of the work he did on the flu while at Duke, then moved to a new approach for ABC tolerance based on various kinds of testing (which I found clearer than in Kyoto, maybe because I was not jet-lagged!). And I gave my talk on ABC-EL.

I found the afternoon session harder to follow, mostly because I always have trouble understanding the motivations and the notations used on these models, fascinating as they are. I remained intrigued by the bidirectional dependence arrow in those graphs for the whole afternoon (even though I think I get it now!). After looking at the few posters presented this afternoon, I went for another short run in Leigh Woods, before joining a group of friends for an Indian dinner at the Brunel Raj. A very full day…!