Great news! The RSS is setting a data analysis challenge this year, sponsored by the Young Statisticians Section and Research Section of the Royal Statistical Society. Details are available on the WordPress website of the Challenge. Registration is open and the Challenge goes live on Tuesday 6 May 2014 for an exciting six-week competition. (A wee bit of unfortunate timing for those of us considering submitting a paper to NIPS!) Truly terrific: I have been looking for this kind of event to happen for many years (without finding the momentum to set it rolling…) and hope it will generate a lot of exciting activity and replicas in other societies.
Last evening, I attended the RSS Midlands seminar here in Warwick. The theme was chain event graphs (CEG). As I knew nothing about them, it was worth my time listening to both speakers and discussing with Jim Smith afterwards. CEGs are extensions of Bayes nets that originally have many more nodes, since they start with the probability tree involving all modalities of all variables. Intensive Bayesian model comparison is then used to reduce the number of nodes by merging modalities having the same children or removing variables with no impact on the variable of interest. So this is not exactly a new Bayes net based on modality dummies as nodes (my original question). This is quite interesting, esp. in the first talk's illustration of using missing value indicators as a supplementary variable (to determine whether or not data is missing at random). I also wonder how much of a connection there is with variable length Markov chains (either as a model or as a way to prune the tree). A last vague idea is a potential connection with lumpable Markov chains, a concept I learned from Kemeny & Snell (1960): a finite Markov chain is lumpable if, after merging two or more of its states, it remains a Markov chain. I do not know if this has ever been studied from a statistical point of view, i.e. testing for lumpability, but this sounds related to the idea of merging modalities of some variables in the probability tree…
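To make the lumpability notion concrete, here is a minimal sketch (not taken from the talks) that checks the standard Kemeny & Snell condition for a known transition matrix: a chain is lumpable with respect to a partition of its states if, for every pair of blocks, all states in the first block have the same total probability of jumping into the second. The function names `is_lumpable` and `lump`, the tolerance `tol` and the toy matrix `P` are my own illustrative choices, and this checks an exact matrix rather than testing lumpability from data.

```python
def is_lumpable(P, partition, tol=1e-12):
    """Check (strong) lumpability of transition matrix P w.r.t. a partition.

    P is a list of rows; partition is a list of blocks of state indices.
    """
    for block in partition:
        for target in partition:
            # Total probability of jumping into `target` from each state of `block`.
            masses = [sum(P[s][t] for t in target) for s in block]
            # All states of a block must agree on this probability.
            if max(masses) - min(masses) > tol:
                return False
    return True


def lump(P, partition):
    """Transition matrix of the lumped chain (meaningful only if lumpable)."""
    return [
        [sum(P[block[0]][t] for t in target) for target in partition]
        for block in partition
    ]


# Toy three-state chain (illustrative numbers only).
P = [
    [0.5, 0.25, 0.25],
    [0.2, 0.4, 0.4],
    [0.2, 0.1, 0.7],
]

print(is_lumpable(P, [[0], [1, 2]]))  # states 1 and 2 can be merged
print(is_lumpable(P, [[0, 1], [2]]))  # states 0 and 1 cannot
print(lump(P, [[0], [1, 2]]))
```

A statistical test of lumpability would have to replace the exact equality check with a comparison of estimated transition probabilities, which is the open question raised above.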
Although I could not stay at the RSS Annual Conference for the three days, I would have liked to do so, as there were several interesting sessions, from MCMC talks by Axel Finke, Din-Houn Lau, Anthony Lee and Michael Betancourt, to the session on Anti-fragility, the concept produced by Nassim Taleb in his latest book (reviewed before completion by Larry Wasserman). I find it rather surprising that the RSS is dedicating a whole session to this, but the usually anti-statistic stance of Taleb (esp. in The Black Swan) may account for it (and for the equally surprising debate between a “pro-Taleb” and a “pro-Silver”). I will also miss Sharon McGrayne’s talk on the Bayesian revolution, but look forward to hearing it at the Bayes-250 day in Duke next December. And I could have certainly benefited from the training session about building a package in R. It seemed, however, that one-day attendance was a choice made by many participants in the conference, judging from the ability to register for one or two days and from the (biased) sample of my friends.
Incidentally, the conference gave me the opportunity to discover Newcastle and Tynemouth, enjoying the architecture of Grey Street and running on the huge meadows almost at the city centre, among herds of cows in the morning fog. (I wish I had had more time to reach the neighbouring Hadrian’s Wall and Durham, which I only spotted from the train to B’ham!)
Today, I attended the RSS Annual Conference in Newcastle-upon-Tyne. For one thing, I ran a memorial session for George Casella, with my (and his) friends Jim Hobert and Elias Moreno as speakers. (The session was well-attended if not overwhelmingly so.) For another thing, the RSS decided to have the DIC Read Paper by David Spiegelhalter, Nicky Best, Brad Carlin and Angelika van der Linde, “Bayesian measures of model complexity and fit”, re-Read, and I was asked to re-discuss the 2002 paper. Here are the slides of my discussion, borrowing from the 2006 Bayesian Analysis paper with Gilles Celeux, Florence Forbes, and Mike Titterington where we examined eight different versions of DIC for mixture models. (I refrained from using the title “snow white and the seven DICs” for a slide…) I also borrowed from our recent discussion of Murray Aitkin’s (2009) book. The other discussant was Elias Moreno, who focussed on consistency issues. (More on this and David Spiegelhalter’s defence in a few posts!) This was the first time I was giving a talk on a basketball court (I once gave an exam there!)
The great discussion Tony O’Hagan had with Dennis Lindley last March for the Bayes 250 meeting at the RSS is now available online.
Since this is still close to Dennis’s birthday, I take the opportunity to wish him the best for his 90th birthday.
Here is the reply by Chris and Steve about my comments from yesterday:
Thanks to Christian for the comments and feedback on our paper “A General Framework for Updating Belief Distributions”. We agree with Christian that starting with a summary statistic, or statistics, is an anchor for inference or learning, providing direction and guidance for models, avoiding the alternative vague notion of attempting to model a complete data set. The latter idea has dominated the Bayesian methodology for decades, but with the advent of large and complex data sets, this is becoming increasingly challenging, if not impossible.
However, in order to work with statistics of interest, we need to find a framework in which this direct approach can be supported by a learning strategy when the formal use of Bayes theorem is not applicable. We achieve this in the paper for a general class of loss functions, which connect observations with a target of interest. A point raised by Christian is how arbitrary these loss functions are. We do not see this at all; for if a target has been properly identified then the most primitive construct between observations informing about a target and the target would come in the form of a loss function. One should always be able to assess the loss of ascertaining a value of $\theta$ as an action and providing the loss in the presence of observation x. The question to be discussed is whether loss functions are objective, as in the case of the median loss, $l(\theta, x) = |\theta - x|$, or subjective, as in the case of the choice between loss functions for estimating a location of a distribution: mean, median or mode? But our work is situated in the former position.
Previous work on loss functions, mostly in the classical literature, has spent a lot of space working out what are optimal loss functions for targets of interest. We are not really dealing with novel targets and so we can draw on the classic literature here. The work can be thought of as the Bayesian version of the M-estimator and associated ideas. In this respect we are dealing with two loss functions for updating belief distributions, one for the data, which we have just discussed, and one for the prior information, which, due to coherence principles, must be the Kullback-Leibler divergence. This raises the thorny issue of how to calibrate the two loss functions. We discuss this in the paper.
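As a rough illustration of the two-loss update discussed above (a sketch under stated assumptions, not the authors' code): minimising an expected data loss plus a Kullback-Leibler divergence to the prior yields an updated belief proportional to the prior times exp(-w × loss), where the weight w stands for the calibration between the two losses. The sketch below evaluates this update on a grid for the median, using the absolute loss. The function name `gibbs_posterior`, the grid, the flat prior, the value w = 1 and the toy data are all hypothetical choices made for the example.

```python
import math


def gibbs_posterior(data, grid, log_prior, loss, w=1.0):
    """Normalised weights on `grid`, proportional to prior * exp(-w * total loss)."""
    log_post = [log_prior(t) - w * sum(loss(t, x) for x in data) for t in grid]
    m = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [wt / total for wt in weights]


def absolute_loss(theta, x):
    """Loss targeting the median of the data-generating distribution."""
    return abs(theta - x)


def flat_log_prior(theta):
    """Flat prior on the grid (an assumption made for simplicity)."""
    return 0.0


# Toy sample with one gross outlier (illustrative numbers only).
data = [1.2, 0.7, 3.1, 0.9, 1.5, 40.0, 1.1]
grid = [i / 100 for i in range(-500, 1001)]

post = gibbs_posterior(data, grid, flat_log_prior, absolute_loss, w=1.0)
mode = grid[post.index(max(post))]
print(mode)
```

With a flat prior the mode of the updated belief sits at the sample median, unaffected by the outlier, and changing w only sharpens or flattens the distribution around it, which is why the choice of w is the calibration issue mentioned above.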
To then deal with the statistic problem, mentioned at the start of this discussion, we have found a nice way to proceed by using the loss function . How this loss function, combined with the use of the exponential family, can be used to estimate functionals of the type
is provided in the Walker talk at Bayes 250 in London, titled “The Misspecified Bayesian”, since the “model” is designed to be misspecified, a tool to estimate and learn about I only. The basic idea is to evaluate I by ensuring that we learn about the for which
This is the background story; we would now like to pick up in more detail on three important points that you raise in your post:
- The arbitrariness in selecting the loss function.
- The relative weighting of loss-to-data vs. loss-to-prior.
- The selection of the loss in the M-free case.
- The arbitrariness in selecting the loss function. In the absence of complete knowledge of the data generating mechanism, i.e. outside of M-closed, we believe the statistician should weigh up the relative arbitrariness in selecting a loss function targeting the statistic of interest versus the arbitrariness of selecting a misspecified model, known not to be true, for the complete data generating mechanism. There is a wealth of literature on how to select optimal loss functions that target specific statistics, e.g. Huber (2009) provides a comprehensive overview of how this should be done. As far as we are aware, there are no formal procedures (that do not rely on loss functions) to select a false sampling distribution for the whole of x; see Key, Pericchi and Smith (1999).
- The relative weighting of loss-to-data vs. loss-to-prior. This is an interesting open problem. Our framework shows that, in the absence of M-closed or of the use of self-information loss, the analyst must select this weighting. In our paper we suggest some default procedures. We have nowhere claimed these were “correct”. You raise concerns regarding parameterisation and we agree with you that care is needed, but many of these issues equally hold for existing “Objective” or “Default” Bayes procedures, such as unit-information priors.
- The selection of the loss in M-free. You say “….there is no optimal choice for the substitute to the loss function…”. We disagree. Our approach is to select an established loss function that directly targets the statistic of interest, and elicit prior beliefs directly on the unknown value of this statistic. There is no notion here of a pseudo-likelihood or where this converges to.
Thank you again to Christian for his critical observations!
While I left Paris under a thunderstorm, the weather in London was warm and sunny, and I enjoyed a nice walk to the RSS. With a Betsey Trotwood pub on the way that obviously delighted the David Copperfield fan in me! The Bayes 250 meeting started with the videoed interview of Dennis Lindley by and thanks to Tony O’Hagan in his Devonshire home. I hope the video gets on-line soon as it is remarkable in rendering Dennis’ view on Bayesian statistics, being full of humour and unremitting in his defence of the Bayesian approach. (And as I missed a few points due to an imperfect sound system.) “Coherence is all” could best summarise this interview. And the sincere regret that Bayesianism has not taken over…
The talks started with Gareth Roberts explaining why MCMC was possible in infinite dimension despite the dimensionality curse. (Starting his talk with a Rev. Bayes meets Newton, Markov and Metropolis slideshow.) Then, after a lunch break where some participants eloped to Bayes’ tomb next door (!), Sylvia Richardson presented a broad vision of Bayesian biostatistics, answering in my opinion some of Dennis’ worries that Bayes had not taken off widely enough (my rephrasing). Dennis Prangle also chose to give an overview of ABC, echoing my perspective that it is more of a new kind of inference with Bayesian justifications than a mere computational tool. Michael Jordan talked about Kingman’s paintbox (in relation with Tamara Broderick’s talk I had enjoyed so much in Kyoto) before rushing back to Paris. Phil Dawid gave a somewhat a-Bayesian talk about the frequentist (in)validation of predictors, in connection with his calibration talk in Padova a few months ago. Iain Murray explained his NADE modelling tool, mixing neural nets with mixtures, and YeeWhye Teh concluded the talks of the day with a presentation of his Gibbs sampler for jump processes that I found most interesting (I later realised this was a paper I had missed in Bayes 250 in Edinburgh by leaving early!). The day ended with a few posters, including one by Maria Lomelli Garcia and YeeWhye Teh on alpha-stable processes that provided a new auxiliary variable representation of clear appeal. (The day actually ended for good with a light and enjoyable dinner in this most improbable Renaissance Hotel that literally stands at the end of the tracks of St Pancras…)
The second day was just as rich: after [a run in Regent’s Park and] a welcome from the current RSS president (John Pullinger, who happens to live in Tunbridge Wells, of all places!), Michael Goldstein gave a spirited defence of Bayesian statistics as a projection device (putting expectation ahead of probability, as in de Finetti and Hartigan), Andrew Golightly discussed particle filter approximations based on discretised diffusions and fighting degeneracy via bridging, and Nicky Best managed to give three talks in one (!) around Bayesian epidemiology, beginning with a Rev. Bayes meets Dr. Snow (who started spatial epidemiology with his famous cholera map). Then Christophe Andrieu presented what were new & exciting results for me, showing by Peskun and convex orderings that using more unbiased estimates of the likelihood function improves, in theory as well as in practice, the performance of the associated Exact Approximation MCMC algorithm. This was followed by Ben Calderhead, who summarised his recently arXived paper with Mark Girolami and co-authors on using Bayesian analysis to evaluate the uncertainty associated with the numerical resolution of differential equations, connecting with the older paper by Persi Diaconis on the topic (a paper I remember discussing with George Casella in an Ithaca café while we were waiting for his car to be fixed…). I wonder whether the approach could be used to handle the constant estimation paradox raised by Larry Wasserman (and discussed on the ‘Og as well)… Under the title of “the misspecified Bayesian”, Stephen Walker sketched an on-going work with Chris Holmes, work that resonated deeply with some of my current musings about the nature of Bayesian inference on intractable problems, hence giving me new perspectives on ABC validation and extension.
More precisely, he showed us a way to handle problems where only some aspect of the model is of interest and where a pseudo-model that (asymptotically) manages this aspect can be found. The paper should soon be arXived and I will certainly discuss it more at length then! Simon Wilson did a “Rev. Bayes meets Dr. Linnaeus” introduction and talked about the estimation of the number of new discoveries of (unknown) species, a problem that I find fascinating even though I find the current solutions, based on an essentially hypergeometric model, somewhat oversimplifying. Chris Yau introduced us to his current work on cancer analysis and to his way of managing the complexity of the mutation process by hierarchical models, and Peter Green ended the presentations with a survey or survol of his work on doing inference on decomposable graphs, with online exhibits.
The meeting concluded with Adrian Smith giving a personal reminiscence of the (poor) state of Bayesian statistics in the 60’s and 70’s, paying tribute to his advisor Dennis Lindley for keeping the faith against strong opposition and for ensuring the survival of the field into the next generation. (And linking once again with John Kingman.) As hopefully shown by my summary, the field is definitely alive nowadays and has accomplished much by overcoming the computational hurdles. (As shown further by our forthcoming Statistical Science vignettes, there are many cases where Bayesian analysis looks like the only available answer.) However, the new challenges raised by Big Data may well jeopardise this revival of a 250-year-old principle by moving to quick-and-dirty (and less principled) inference techniques. What really made this meeting so successful in my opinion is that a lot of the talks we heard in Errol Street over those two days were presenting progress being made towards handling the new challenges. Hence, there still is hope for Bayesian techniques in the coming century!