A final day for this O’Bayes 2013 conference, where I missed the final session for travelling reasons. Several talks had highly attractive features (for me), from David Dunson’s on his recently arXived paper on parallel MCMC, which provides an alternative to the embarrassingly parallel algorithm I discussed a few weeks ago and which I will discuss further in a future post, to Marty Wells, hindered by poor weather and delivering by phone a talk on L1 shrinkage estimators (a bit of a paradox since, as discussed by Yuzo Maruyama, most MAP estimators cannot be minimax and, more broadly, since they cannot be expressed as solutions of a loss minimisation problem), to Malay Ghosh revisiting g-priors from an almost frequentist viewpoint, to Gonzalo García-Donato presenting criteria for objective Bayesian model choice in a vision that was clearly the closest to my own perspective on the topic. Overall, when reflecting upon the diversity and high quality of the talks at this O’Bayes meeting, and also as the incoming chair-elect of the corresponding section of ISBA, I think that what emerges most significantly from those talks is an ongoing pondering on the nature of (objective Bayesian) testing, not only in the works extending the g-priors in various directions, but also in the whole debate between Bayes factors and information criteria, and between model averaging and model selection. During the discussion on Gonzalo’s talk, David Draper objected to the search for an automated approach to the comparison of models, but I strongly lean towards Gonzalo’s perspective as we need to provide a reference solution able to tackle less formal and more realistic problems. I do hope to see more of those realistic problems tackled at O’Bayes 2015 (whose location is not yet settled). In the meantime, a strong thank you to the local organising committee and most specifically to Jim Berger!
Another day full of interesting and challenging (in the sense that they generated new questions for me) talks at the SuSTain workshop. After another (dry and fast) run around the Downs, Leo Held started the talks with one of my favourite topics, namely the theory of g-priors in generalized linear models. He did bring a new perspective on the subject, introducing the notion of a testing Bayes factor based on the residual statistic produced by a classical (maximum likelihood) analysis, connected with earlier works of Valen Johnson. While I did not truly get the motivation for switching from the original data to this less informative quantity, I find that this perspective opens new questions for dealing with settings where the true data is replaced with one or several classical statistics. With possible strong connections to ABC, of course. Incidentally, Leo managed to produce a napkin with Peter Green’s intro to MCMC dating back to their first meeting in 1994: a feat I certainly could not reproduce (as I also met both Peter and Leo for the first time in 1994, at CIRM)… Then Richard Everitt presented his recent JCGS paper on Bayesian inference on latent Markov random fields, centred on the issue that simulating the latent MRF involves an MCMC step that is not exact (as in our earlier ABC paper for Ising models with Aude Grelaud). I already discussed this paper in an earlier post and the only additional question that comes to my mind is whether or not a comparison with the auxiliary variable approach of Møller et al. (2006) would make sense.
In the intermission, I had a great conversation with Oliver Ratmann about his talk of yesterday on the surprising feature that some models produce as “data” a sample from a pseudo-posterior. Opening new vistas once again! The following talks were more on the mathematical side, with James Cussens focussing on the use of integer programming for Bayesian variable selection, then Éric Moulines presenting a recent work with a PhD student of his on PAC-Bayesian bounds and the superiority of combining experts, including a CRAN package. Éric concluded his talk with the funny occurrence of Peter’s photograph on Éric’s own Microsoft Research profile page, due to Éric posting our joint photograph at the top of Pic du Midi d’Ossau in 2005… (He concluded with a picture of the mountain that was the exact mirror image of mine from yesterday!)
The afternoon was equally superb, with Gareth Roberts covering fifteen years of scaling MCMC algorithms, from the mythical 0.234 figure to the optimal temperature decrease in simulated annealing, and John Kent playing the outlier with an EM algorithm (albeit one including a formal prior distribution) and raising the challenge as to why Bayesians never have to constrain the posterior expectation, which prompted me to infer that (a) the prior distribution should include all constraints and (b) the posterior expectation is not the “right” tool in non-convex parameter spaces. Natalia Bochkina presented a recent work, joint with Peter Green, on connecting image analysis with Bayesian asymptotics, reminding me of my early attempts at reading Ibragimov and Has’minskii in the 1990s. She then presented a second work, with Vladimir Spokoiny, on Bayesian asymptotics with misspecified models, introducing a new notion of effective dimension. The last talk of the day was by Nils Hjort about his coming book on “Credibility, confidence and likelihood” (not yet advertised by CUP), which sounds like an attempt at resuscitating Fisher by deriving distributions in the parameter space from frequentist confidence intervals. I already discussed this notion in an earlier post, so I am fairly skeptical about it, but the talk was representative of Nils’ highly entertaining and thought-provoking style! Especially as he sprinkled the talk with examples where the MLE (and some default Bayes estimators) did not work. And reanalysed one of Chris Sims’ examples presented during his Nobel Prize talk…
After a huge delay, since the project started in 2006 and was first presented in Banff in 2007 (as well as included in the Bayesian Core), Gilles Celeux, Mohammed El Anbari, Jean-Michel Marin, and myself have finally completed our paper on using hyper-g priors for variable selection and regularisation in linear models. The writing of this paper was mostly delayed due to the publication of the 2007 JASA paper by Feng Liang, Rui Paulo, German Molina, Jim Berger, and Merlise Clyde, Mixtures of g-priors for Bayesian variable selection. We had indeed (independently) obtained very similar derivations based on hypergeometric function representations but, once the above paper was published, we needed to add material to our derivation and chose to run a comparison study between Bayesian and non-Bayesian methods on a series of simulated and real examples. It took a while for Mohammed El Anbari to complete this simulation study and even longer for the four of us to convene and agree on the presentation of the paper. The only difference between Liang et al.’s (2007) modelling and ours is that we do not distinguish between the intercept and the other regression coefficients in the linear model. On the one hand, this gives us one degree of freedom that allows us to pick an improper prior on the variance parameter. On the other hand, our posterior distribution is not invariant under location transforms, which was a point we heavily debated in Banff… The simulation part shows that all “standard” Bayesian solutions lead to very similar decisions and that they are much more parsimonious than regularisation techniques.
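For readers who want to see what a hypergeometric function representation looks like in practice, here is a minimal sketch, written under assumptions that are mine rather than taken from either paper's code. It uses the textbook formulation with a separate intercept (the Liang et al. setting, not ours). There, the Bayes factor of a model with p covariates against the intercept-only model is (1+g)^((n-1-p)/2) / (1+g(1-R²))^((n-1)/2) for fixed g. Integrating g against the hyper-g prior π(g) = (a-2)/2 · (1+g)^(-a/2) gives (a-2)/(p+a-2) · 2F1((n-1)/2, 1; (p+a)/2; R²). The function names, the default a = 3, and the midpoint quadrature used as a cross-check are all illustrative choices.

```python
import math


def bf_fixed_g(r2, n, p, g):
    """Bayes factor of a p-covariate model vs. the intercept-only model, fixed g.

    Assumes the formulation with a separate intercept; r2 is the usual R^2.
    """
    return (1.0 + g) ** ((n - 1 - p) / 2) / (1.0 + g * (1.0 - r2)) ** ((n - 1) / 2)


def hyp2f1_b1(a, c, z, tol=1e-15, max_terms=100000):
    """Gauss series for 2F1(a, 1; c; z), for 0 <= z < 1 (hypothetical helper)."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        # Ratio of successive series terms when the second parameter is 1.
        term *= (a + k) / (c + k) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def bf_hyper_g_series(r2, n, p, a=3.0):
    """Hyper-g Bayes factor via the hypergeometric representation."""
    return (a - 2.0) / (p + a - 2.0) * hyp2f1_b1((n - 1) / 2, (p + a) / 2, r2)


def bf_hyper_g_quadrature(r2, n, p, a=3.0, m=20000):
    """Same quantity by midpoint quadrature over the shrinkage factor u = g/(1+g)."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        u = (i + 0.5) * h
        total += (1.0 - u) ** ((p + a) / 2 - 2) * (1.0 - u * r2) ** (-(n - 1) / 2)
    return (a - 2.0) / 2.0 * h * total


if __name__ == "__main__":
    # Made-up summary statistics, for illustration only.
    n, p, r2 = 50, 3, 0.5
    print("fixed g = n :", bf_fixed_g(r2, n, p, g=n))
    print("hyper-g (series)    :", bf_hyper_g_series(r2, n, p))
    print("hyper-g (quadrature):", bf_hyper_g_quadrature(r2, n, p))
```

The two hyper-g evaluations should agree up to quadrature error, which is a cheap sanity check on the closed form. The series is only meant for R² strictly below one, and it says nothing about our own intercept-free version, where the exponents differ.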
Two other papers posted on arXiv today address the model choice issue. The first one by Bruce Lindsay and Jiawei Liu introduces a credibility index, and the second one by Bazerque, Mateos, and Giannakis considers group-lasso on splines for spectrum cartography.