Archive for PAC-Bayesian

ISBA 18 tidbits

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life on July 2, 2018 by xi'an

Among a continuous sequence of appealing sessions at this ISBA 2018 meeting [says a member of the scientific committee!], I happened to attend two talks [with a wee bit of overlap] by Sid Chib in two consecutive sessions, because his co-author Anna Simoni (CREST) was unfortunately sick. Their work was about models defined by a collection of moment conditions, as often happens in econometrics, developed in a recent JASA paper by Chib, Shin, and Simoni (2017). With an extension towards defining conditional expectations through a functional basis. The main approach relies on exponentially tilted empirical likelihoods, which reminded me of the empirical likelihood [BCel] implementation we ran with Kerrie Mengersen and Pierre Pudlo a few years ago. As a substitute for ABC. This made me wonder how Bayesian the estimating equation concept is, as it should somehow involve a nonparametric prior under the moment constraints.
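As a reminder of what exponential tilting involves, here is a minimal sketch for a single scalar moment condition, E[x − θ] = 0. It is not taken from the Chib, Shin and Simoni paper: the function names, the toy data, and the damped Newton solver are all assumptions made for illustration. The tilted weights are proportional to exp{λ(θ) g_i(θ)}, with λ(θ) minimising the average of exp{λ g_i(θ)}, and the pseudo-likelihood is the product of these weights.

```python
import numpy as np

def etel_weights(theta, x, n_newton=50):
    """Exponentially tilted weights p_i(theta) for the moment condition E[x - theta] = 0."""
    g = x - theta                                  # moment function at each observation
    lam = 0.0
    for _ in range(n_newton):                      # damped Newton on lam -> mean(exp(lam * g))
        w = np.exp(lam * g)
        step = np.mean(w * g) / np.mean(w * g * g)
        # halve the step while it would increase the (convex) objective
        while np.mean(np.exp((lam - step) * g)) > np.mean(w) and abs(step) > 1e-12:
            step /= 2.0
        lam -= step
    w = np.exp(lam * g)
    return w / w.sum()                             # weights sum to one and satisfy sum(p * g) = 0

def etel_log_likelihood(theta, x):
    """Log of the product of the tilted weights, to be combined with a log-prior on theta."""
    return float(np.sum(np.log(etel_weights(theta, x))))

x = np.linspace(-1.0, 1.0, 21) ** 3 + 0.5          # toy data, purely illustrative
for theta in (x.mean(), x.mean() + 0.3):
    print(theta, etel_log_likelihood(theta, x))
```

The pseudo-likelihood is largest at the sample mean, where the weights are uniform, and decreases as θ moves away from it. The sketch only makes sense for θ strictly inside the range of the data, since otherwise the tilting problem has no solution.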

Note that Sid’s [talks and] papers are disconnected from ABC, as everything comes in closed form, apart from the empirical likelihood derivation, as we actually found in our own work!, but this could become a substitute model for ABC uses. For instance, identifying the parameter θ of the model by identifying equations. Would that impose too much input from the modeller? I figure I came up with this notion mostly because of the emphasis on proxy models the previous day at ABC in ‘burgh! Another connected item of interest in the work is the possibility of accounting for misspecification of these moment conditions by introducing a vector of errors with a spike & slab distribution, although I am not sure this is 100% necessary without getting further into the paper(s) [blame conference pressure on my time].

Another highlight was attending a fantastic poster session Monday night on computational methods, except I would have needed four more hours to get through each and every poster. This new version of ISBA has split the posters between two sites (great) and themes (not so great!), while I would have preferred more sites covering all themes over all nights, to lower the noise (still bearable this year) and to increase the possibility to check all posters of interest in a particular theme…

Mentioning as well a great talk by Dan Roy about assessing deep learning performance by what he calls non-vacuous error bounds. Namely, through PAC-Bayesian bounds. One major comment of his was about deep learning models being much more non-parametric (number of parameters rising with number of observations) than parametric models, meaning that generative adversarial constructs such as the one I discussed a few days ago may face a fundamental difficulty as models are taken at face value there.
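To make the vacuous versus non-vacuous distinction concrete, here is a small sketch based on one standard McAllester-type PAC-Bayesian bound for losses in [0, 1]: with probability at least 1 − δ, the expected risk of the randomised predictor is at most its empirical risk plus the square root of (KL(ρ‖π) + log(2√n/δ)) / (2n). This is not necessarily the form used in Dan Roy's talk, and all the numbers below are hypothetical. A bound above 1 says nothing about a loss that cannot exceed 1, hence vacuous.

```python
import math

def pac_bayes_bound(emp_risk, kl, n, delta=0.05):
    """McAllester-type bound on the expected risk of a randomised predictor, for losses in [0, 1]."""
    return emp_risk + math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))

n = 60_000                                   # hypothetical sample size
for kl in (5_000.0, 1_000_000.0):            # hypothetical KL divergences between posterior and prior
    bound = pac_bayes_bound(emp_risk=0.03, kl=kl, n=n)
    print(kl, round(bound, 3), "non-vacuous" if bound < 1.0 else "vacuous")
```

The Kullback divergence between the "posterior" and the "prior" on the weights drives everything: with as many parameters as in a deep network, it easily dwarfs n unless the posterior is chosen to stay close to the prior.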

On closed-form solutions, a closed-form Bayes factor for component selection in mixture models by Fúquene, Steel and Rossell that resembles the Savage-Dickey version, without the measure theoretic difficulties. But with non-local priors. And closed-form conjugate priors for the probit regression model, using unified skew-normal priors, as exhibited by Daniele Durante. Which are products of Normal cdfs and pdfs, and which allow for closed-form marginal likelihoods and marginal posteriors as well. (The approach is not exactly conjugate as the prior and the posterior are not in the same family.)

And in the final session I attended, there were two talks on scalable MCMC, one on coresets, which will require some time and effort to assimilate, by Trevor Campbell and Tamara Broderick, and another one using Poisson subsampling, by Matias Quiroz and co-authors. Which did not completely convince me (but this was the end of a long day…)

All in all, this has been a great edition of the ISBA meetings, if quite intense due to a non-stop schedule, with a very efficient organisation that made parallel sessions manageable and brought poster sessions back to a reasonable scale [although I did not once manage to cross the street to the other session]. Being in unreasonably sunny Edinburgh helped a lot obviously! I am a wee bit disappointed that no one else followed my call to wear a kilt, but I had low expectations to start with… And too bad I missed the Ironman 70.3 Edinburgh by one day!

Journée algorithmes stochastiques

Posted in Books, pictures, Statistics, University life on September 27, 2017 by xi'an

On December 1, 2017, we will hold a day workshop on stochastic algorithms at Université Paris-Dauphine, with the following speakers

Details and abstracts of the talks are available on the workshop webpage. Attendance is free, but registration is requested to help plan the morning and afternoon coffee breaks. Looking forward to seeing ‘Og’s readers there, at least those in the vicinity!

And while I am targeting Parisians, crypto-Bayesians, and nearly-Parisians, there is another day workshop on Bayesian and PAC-Bayesian methods on November 16, at Université Pierre et Marie Curie (campus Jussieu), with invited speakers

and a similar request for (free) registration.

ISBA 2016 [#6]

Posted in Kids, Mountains, pictures, Statistics, Travel, University life, Wines on June 19, 2016 by xi'an

Fifth and final day of ISBA 2016, which was as full and intense as the previous ones. (Or even more if taking into account the late evening social activities pursued by most participants.) First thing in the morning, I managed to get very close to a hill top, thanks to the hints provided by Jeff Miller!, and with no further scratches from the nasty local thorn bushes. And I was back with plenty of time for a Bayesian robustness session with great talks. (Session organised by Judith Rousseau whom I crossed while running, rushing to the airport thanks to an Air France last-minute cancellation.) First talk by James Watson (on his paper with Chris Holmes on Kullback neighbourhoods on priors that Judith and I discussed recently in Statistical Science). Then as a contrapunto Peter Grünwald gave a neat geometric motivation for possible misbehaviour of Bayesian inference in non-convex misspecified environments and discussed his SafeBayes resolution that weights down the likelihood. In a sort of PAC-Bayesian way. And Erlis Ruli presented the ABC-R approach he developed with Laura Ventura and Nicola Sartori based on M-estimators and score functions. Making me wonder [idly, as usual] whether cumulating different M-estimators would make a difference in the performances of the ABC algorithm.

David Dunson delivered one of the plenary lectures on high-dimensional discrete parameter estimation, including for instance categorical data. This wide-ranging talk covered many aspects and papers of David’s work, including a use of tensors I had neither seen nor heard of before. With sparse modelling to resist the combinatorial explosion of contingency tables. However, and you may blame my Gallic pessimistic daemon for this remark, I have trouble picturing the meaning and relevance of a joint distribution on a space of hundreds and hundreds of dimensions and, similarly, the ability to check the adequacy of any modelling in terms of goodness of fit. For instance, to borrow a non-military example from David’s talk, handling genetic data on ACGT sequences to infer its distribution sounds unreasonable unless most of the bases are mono-allelic. And the only way I see to test the realism of a model in this framework would be to engineer realisations of this distribution to observe the outcome, a test that seems neither feasible nor desirable. Prediction based on such models may obviously operate satisfactorily without such realism requirements.

My first afternoon session (after the ISBA assembly that announced the location of ISBA 2020 in Yunnan, China!, home of Pu’ Ehr tea) was about accelerated MCMC schemes with talks by Sanvesh Srivastava on divide-and-conquer MCMC using Wasserstein barycentres, already discussed here, Minsuk Shin on a faster stochastic search variable selection which I could not understand, and Alex Beskos on the extension of Giles’ multilevel Monte Carlo to MCMC settings, which sounded worth investigating further even though I did not follow the notion all the way through. After listening to Luke Bornn explaining how to recalibrate grid data for climate science by accounting for correlation (with the fun title of ‘lost moments’), I rushed to my rental to [help] cook dinner for friends and… the ISBA 2016 conference was over!

PAC-Bayesians

Posted in Books, Kids, pictures, Statistics, Travel, University life on September 22, 2015 by xi'an

Yesterday, I took part in the thesis defence of James Ridgway [soon to move to the University of Bristol] at Université Paris-Dauphine. While I have already commented on his joint paper with Nicolas on the Pima Indians, I had not read in any depth another paper in the thesis, “On the properties of variational approximations of Gibbs posteriors” written jointly with Pierre Alquier and Nicolas Chopin.

PAC stands for probably approximately correct, and the PAC-Bayesian approach starts with an empirical form of posterior, called the Gibbs posterior, where the log-likelihood is replaced with an empirical error

\pi(\theta|x_1,\ldots,x_n) \propto \exp\{-\lambda r_n(\theta)\}\pi(\theta)

that is rescaled by a factor λ. This factor is called the learning rate, and is optimised by Peter Grünwald (2012) in his SafeBayes approach to give the (Kullback) closest approximation to the true unknown distribution. In the paper of James, Pierre and Nicolas, there is no visible Bayesian perspective, since the pseudo-posterior is used to define a randomised estimator that achieves optimal oracle bounds. When λ is of order n. The purpose of the paper is rather to produce an efficient approximation to the Gibbs posterior, by using variational Bayes techniques. And to derive point estimators. With the added appeal that the approximation also achieves the oracle bounds. (Surprisingly, the authors do not leave the Pima Indians alone as they use this benchmark for a ranking model.) Since there is no discussion on the choice of the learning rate λ, as opposed to Bissiri et al. (2013), which I discussed around Bayes.250, I have difficulties perceiving the possible impact of this representation on Bayesian analysis. Except maybe as an ABC device, as suggested by Christophe Andrieu.
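To see what the learning rate does in the Gibbs posterior above, here is a toy sketch that evaluates exp{−λ r_n(θ)}π(θ) on a grid for a scalar θ. Everything in it is an assumption for illustration (the absolute-error risk, the N(0, 10²) prior, the data, the function names), and it is a brute-force evaluation rather than the variational approximation studied in the paper.

```python
import numpy as np

def gibbs_posterior_on_grid(x, grid, lam, prior_sd=10.0):
    """Normalised Gibbs posterior exp{-lam * r_n(theta)} * prior(theta) on a regular grid."""
    risk = np.mean(np.abs(x[:, None] - grid[None, :]), axis=0)  # empirical risk r_n(theta), absolute error
    log_prior = -0.5 * (grid / prior_sd) ** 2                   # N(0, prior_sd^2) prior, up to a constant
    log_post = -lam * risk + log_prior
    dens = np.exp(log_post - log_post.max())                    # subtract the max for numerical stability
    return dens / (dens.sum() * (grid[1] - grid[0]))            # Riemann-sum normalisation

def posterior_sd(grid, dens):
    """Standard deviation of a density tabulated on a regular grid."""
    dx = grid[1] - grid[0]
    mean = np.sum(grid * dens) * dx
    return float(np.sqrt(np.sum((grid - mean) ** 2 * dens) * dx))

x = np.linspace(0.0, 2.0, 20)                # toy observations, purely illustrative
grid = np.linspace(-5.0, 5.0, 2001)
for lam in (1.0, float(len(x))):             # lam = 1 versus lam of order n
    print(lam, posterior_sd(grid, gibbs_posterior_on_grid(x, grid, lam)))
```

With λ = 1 the pseudo-posterior barely moves away from the prior, while λ of order n concentrates it around the empirical risk minimiser (here, the sample median), which is the regime where the oracle bounds are stated.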