Next week, I will be in Harvard Monday and Tuesday, visiting friends in the Department of Statistics and giving a seminar. The slides for the talk will be quite similar to those of my talk in Bristol, a few weeks ago. Hopefully, there will not be too much overlap between both audiences! And hopefully I’ll manage to get to my conclusion before all hell breaks loose (which is why I strategically set my conclusion in the early slides!)
Archive for University of Bristol
I went to give a seminar in Bristol last Friday and I chose to present the testing with mixture paper. As we are busy working on the revision, I was eagerly looking for comments and criticisms that could strengthen this new version. As it happened, the (Bristol) Bayesian Cake (Reading) Club had chosen our paper for discussion, two weeks in a row!, hence the title!, and I got invited to join the group the morning prior to the seminar! This was, of course, most enjoyable and relaxed, including an home-made cake!, but also quite helpful in assessing our arguments in the paper. One point of contention or at least of discussion was the common parametrisation between the components of the mixture. Although all parametrisations are equivalent from a single component point of view, I can [almost] see why using a mixture with the same parameter value on all components may impose some unsuspected constraint on that parameter. Even when the parameter is the same moment for both components. This still sounds like a minor counterpoint in that the weight should converge to either zero or one and hence eventually favour the posterior on the parameter corresponding to the “true” model.
Another point that was raised during the discussion is the behaviour of the method under misspecification or for an M-open framework: when neither model is correct does the weight still converge to the boundary associated with the closest model (as I believe) or does a convexity argument produce a non-zero weight as it limit (as hinted by one example in the paper)? I had thought very little about this and hence had just as little to argue though as this does not sound to me like the primary reason for conducting tests. Especially in a Bayesian framework. If one is uncertain about both models to be compared, one should have an alternative at the ready! Or use a non-parametric version, which is a direction we need to explore deeper before deciding it is coherent and convergent!
A third point of discussion was my argument that mixtures allow us to rely on the same parameter and hence the same prior, whether proper or not, while Bayes factors are less clearly open to this interpretation. This was not uniformly accepted!
Thinking afresh about this approach also led me to broaden my perspective on the use of the posterior distribution of the weight(s) α: while previously I had taken those weights mostly as a proxy to the posterior probabilities, to be calibrated by pseudo-data experiments, as for instance in Figure 9, I now perceive them primarily as the portion of the data in agreement with the corresponding model [or hypothesis] and more importantly as a solution for staying away from a Neyman-Pearson-like decision. Or error evaluation. Usually, when asked about the interpretation of the output, my answer is to compare the behaviour of the posterior on the weight(s) with a posterior associated with a sample from each model. Which does sound somewhat similar to posterior predictives if the samples are simulated from the associated predictives. But the issue was not raised during the visit to Bristol, which possibly reflects on how unfrequentist the audience was [the Statistics group is], as it apparently accepted with no further ado the use of a posterior distribution as a soft assessment of the comparative fits of the different models. If not necessarily agreeing the need of conducting hypothesis testing (especially in the case of the Pima Indian dataset!).
Yesterday, I took part in the thesis defence of James Ridgway [soon to move to the University of Bristol[ at Université Paris-Dauphine. While I have already commented on his joint paper with Nicolas on the Pima Indians, I had not read in any depth another paper in the thesis, “On the properties of variational approximations of Gibbs posteriors” written jointly with Pierre Alquier and Nicolas Chopin.
PAC stands for probably approximately correct and starts with an empirical form of posterior, called the Gibbs posterior, where the log-likelihood is replaced with an empirical error
that is rescaled by a factor λ. Factor that is called the learning rate, to be optimised as the (Kullback) closest approximation to the true unknown distribution, by Peter Grünwald (2012) in his SafeBayes approach. In the paper of James, Pierre and Nicolas, there is no visible Bayesian perspective, since the pseudo-posterior is used to define a randomised estimator that achieves optimal oracle bounds. When λ is of order n. The purpose of the paper is rather to produce an efficient approximation to the Gibbs posterior, by using variational Bayes techniques. And to derive point estimators. With the added appeal that the approximation also achieves the oracle bounds. (Surprisingly, the authors do not leave the Pima Indians alone as they use this benchmark for a ranking model.) Since there is no discussion on the choice of the learning rate λ, as opposed to Bissiri et al. (2013) I discussed around Bayes.250, I have difficulties perceiving the possible impact of this representation on Bayesian analysis. Except maybe as an ABC device, as suggested by Christophe Andrieu.
With my friends Peter Green (Bristol), Krzysztof Łatuszyński (Warwick) and Marcello Pereyra (Bristol), we just arXived the first version of “Bayesian computation: a perspective on the current state, and sampling backwards and forwards”, which first title was the title of this post. This is a survey of our own perspective on Bayesian computation, from what occurred in the last 25 years [a lot!] to what could occur in the near future [a lot as well!]. Submitted to Statistics and Computing towards the special 25th anniversary issue, as announced in an earlier post.. Pulling strength and breadth from each other’s opinion, we have certainly attained more than the sum of our initial respective contributions, but we are welcoming comments about bits and pieces of importance that we miss and even more about promising new directions that are not posted in this survey. (A warning that is should go with most of my surveys is that my input in this paper will not differ by a large margin from ideas expressed here or in previous surveys.)
Patrick Rubin-Delanchy and Daniel Lawson [of Warhammer fame!] recently arXived a paper we had discussed with Patrick when he visited Andrew and I last summer in Paris. The topic is the evaluation of the posterior predictive probability of a larger discrepancy between data and model
which acts like a Bayesian p-value of sorts. I discussed several times the reservations I have about this notion on this blog… Including running one experiment on the uniformity of the ppp while in Duke last year. One item of those reservations being that it evaluates the posterior probability of an event that does not exist a priori. Which is somewhat connected to the issue of using the data “twice”.
“A posterior predictive p-value has a transparent Bayesian interpretation.”
Another item that was suggested [to me] in the current paper is the difficulty in defining the posterior predictive (pp), for instance by including latent variables
which reminds me of the multiple possible avatars of the BIC criterion. The question addressed by Rubin-Delanchy and Lawson is how far from the uniform distribution stands this pp when the model is correct. The main result of their paper is that any sub-uniform distribution can be expressed as a particular posterior predictive. The authors also exhibit the distribution that achieves the bound produced by Xiao-Li Meng, Namely that
where P is the above (top) probability. (Hence it is uniform up to a factor 2!) Obviously, the proximity with the upper bound only occurs in a limited number of cases that do not validate the overall use of the ppp. But this is certainly a nice piece of theoretical work.
A new EPSRC programme grant, called i-like, has been awarded to researchers in Bristol, Lancaster, Oxford, and Warwick, to conduct research on intractable likelihoods. (I am also associated to this program as a [grateful] collaborator.) This covers several areas of statistics, like big data and inference on stochastic process, but my own primary interest in the programme is of course the possibilities to conduct collaboration on ABC and composite likelihood methods. (Great website design, by the way!)
A first announcement is that there will be a half-day launch in Oxford on January 31, 2013, which program is now available. Followed by a workshop in mid-May in Warwick (to which I will participate). This event is particularly aimed at PhD students and early-career researchers. The second announcement is that the EPSRC programme grant provides funding for five postdoctoral positions over a duration of four years, which is of course stupendous! So if you like i-like as much as I like it, and are a new researcher looking for opportunities in exciting areas, you should definitely consider applying!
This was a fairly full day at the Structure and uncertainty modelling, inference and computation in complex stochastic systems workshop! After a good one hour run around the Clifton Down, the morning was organised around likelihood-free methods, mostly ABC, plus Arnaud Doucet’s study of methods based on unbiased estimators of the likelihood (à la Beaumont, with the novelty of assessing the inefficiency due to the estimation, really fascinating..). The afternoon was dedicated to graphical models. Nicolas Chopin gave an updated version of his Kyoto talk on EP-ABC where he resorted to composite likelihoods for hidden Markov models, (I then wondered about the parameterisation and the tolerance determination for this algorithm.) Oliver Ratman presented some of the work he did on the flu while in Duke, then move to a new approach for ABC tolerance based on various kinds of testing (which I found clearer than in Kyoto, maybe because I was not jet-lagged!) And I gave my talk on ABC-EL.I found the afternoon session harder to follow, mostly because I always have trouble understanding the motivations and the notations used on these models, albeit fascinating. I remained intrigued by the bidirectional dependence arrow in those graphs for the whole afternoon (even though I think I get it now!) After looking at the few posters presented this afternoon, I went for another short run in Leigh Woods, before joining a group of friends for an Indian dinner at the Brunel Raj. A very full day…!