my DICussion
Following David Spiegelhalter's re-reading of Spiegelhalter et al. (2002) at the RSS Annual Conference a few weeks ago, and my invited discussion there, I was asked to contribute a written discussion to Series B, a request obviously impossible to refuse!
The main issue with DIC is the question of its worth for Bayesian decision analysis (since I doubt there are many proponents of DIC outside the Bayesian community). The appeal of DIC is, I presume, that it delivers a single summary per model under comparison and therefore allows for a complete ranking of those models. I however object to the worth of simplicity for simplicity's sake: models are complex (albeit less so than reality) and their usages are complex as well. To consider that model A is to be preferred over model B just because DIC(A)=1228 < DIC(B)=1237 is a mimicry of the complex mechanisms at play behind model choice, especially given the wealth of information provided by a Bayesian framework. (Non-Bayesian paradigms are more familiar with procedures based on a single estimator value.) And abstaining from accounting for the significance of the difference between DIC(A) and DIC(B) clearly makes matters worse.
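For reference, and in the notation of Spiegelhalter et al. (2002), DIC is built from the deviance D(θ) = −2 log p(y|θ) as

$$
p_D \;=\; \overline{D(\theta)} \,-\, D(\tilde{\theta}), \qquad
\mathrm{DIC} \;=\; \overline{D(\theta)} + p_D \;=\; D(\tilde{\theta}) + 2\,p_D,
$$

where $\overline{D(\theta)}$ is the posterior expectation of the deviance and $\tilde{\theta}$ a plug-in estimate such as the posterior mean. Nothing in this construction calibrates what a difference of a few units between two DIC values actually means.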
This is not even discussing the stylised setting where one model is considered “true” and where procedures are compared by their ability to recover the “truth”. David Spiegelhalter repeatedly mentioned during his talk that he was not interested in this. This stance brings another objection, though, namely that models can then only be compared on their predictive abilities, which DIC seems unable to capture. Once again, what is needed is a multi-factor and all-encompassing criterion that evaluates predictive models in terms of their recovery of some features of the phenomenon under study, or of the process being conducted. (Even stooping down to a one-dimensional loss function that is supposed to summarise the purpose of the model comparison does not produce anything close to the DIC function.)
Obviously, considering that asymptotic consistency is of no importance whatsoever (as David repeated in Newcastle) avoids some embarrassing questions, except the one about the true purpose of statistical models and procedures. How can they be compared if no model is true and if accumulating data from a given model is not meaningful? How can simulation be conducted in such a barren landscape? I find this minimalist attitude all the more difficult to accept in that models are effectively used as if they were, or could be, true at several stages of the process. It also prevents the study of the criterion under model misspecification, which would clearly be of interest.
Another point, already exposed in our 2006 Bayesian Analysis paper with Gilles Celeux, Florence Forbes, and Mike Titterington, is that there is no unique driving principle for constructing DICs. In that paper, we examined eight different and natural versions of DIC for mixture models, resulting in highly diverging values of DIC and of the effective dimension of the parameter. I believe that such a lack of focus is bound to reappear in any multimodal setting, and I fear that the answer that the (eight) versions merely correspond to different focusses on what matters in the model is too cursory and leaves the hapless practitioner without direction.
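To illustrate this non-uniqueness with a toy example (entirely mine, not taken from the 2006 paper; the “posterior draws” below are faked so that the snippet runs on its own, whereas a real analysis would take them from an MCMC run), here are two equally natural plug-in choices for a two-component Gaussian mixture, already leading to two different effective dimensions and two different DICs:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# "data": 100 points from a two-component Gaussian mixture
y = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])

# fake "posterior draws" of (weight, mu1, mu2, sigma), jittered around the truth
T = 500
w   = np.clip(rng.normal(0.5, 0.05, T), 0.01, 0.99)
mu1 = rng.normal(-2, 0.2, T)
mu2 = rng.normal(2, 0.2, T)
sig = np.abs(rng.normal(1, 0.1, T))

def loglik(w_, m1, m2, s):
    """observed-data log-likelihood of the two-component mixture"""
    dens = w_ * norm.pdf(y, m1, s) + (1 - w_) * norm.pdf(y, m2, s)
    return np.sum(np.log(dens))

# posterior expectation of the deviance D(theta) = -2 log p(y|theta)
D_bar = np.mean([-2 * loglik(w[t], mu1[t], mu2[t], sig[t]) for t in range(T)])

# plug-in 1: deviance at the posterior means of the parameters
# (ill-defined under label switching, which is part of the problem)
D_hat_param = -2 * loglik(w.mean(), mu1.mean(), mu2.mean(), sig.mean())

# plug-in 2: deviance of the posterior predictive density estimate
pred = np.mean([w[t] * norm.pdf(y, mu1[t], sig[t])
                + (1 - w[t]) * norm.pdf(y, mu2[t], sig[t]) for t in range(T)],
               axis=0)
D_hat_pred = -2 * np.sum(np.log(pred))

for name, D_hat in [("parameter plug-in", D_hat_param),
                    ("predictive plug-in", D_hat_pred)]:
    p_D = D_bar - D_hat
    print(f"{name:>20s}:  p_D = {p_D:6.2f}   DIC = {D_hat + 2 * p_D:8.2f}")
```

With genuine MCMC output, more components, and the further variants based on the complete-data likelihood, the discrepancies between such versions only grow, which was precisely the point of the 2006 paper.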
My final remark about DIC is that it shares very much the same perspective as Murray Aitkin's integrated likelihood. Both Aitkin (1991, 2009) and Spiegelhalter et al. (2002) consider a posterior distribution on the likelihood function, taken as a function of the parameter while omitting the delicate fact that it also depends on the observable and hence does not exist a priori. We wrote a detailed review of Aitkin's (2009) book, where most of the criticisms equally apply to DIC, and I will not repeat them here, except to point out that this perspective escapes the Bayesian framework (and thus requires its own justifications all the more).
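To make the parallel explicit (in my notation rather than either paper's), Aitkin's integrated likelihood amounts to averaging the likelihood against the posterior,

$$
\int L(\theta; y)\,\pi(\theta \mid y)\,\mathrm{d}\theta,
$$

while DIC averages the deviance $-2\log p(y\mid\theta)$ against the same posterior: in both cases the data y are used once to produce the posterior and a second time inside the quantity being averaged.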
September 26, 2013 at 7:03 pm
For example, it [the model] can be used for estimating a parameter of interest, such as N in the binomial or hypergeometric distribution. Prediction is extremely important in statistics; however, it is not its sole purpose.
September 26, 2013 at 8:58 pm
Estimating N in the binomial or the negative binomial requires some belief in the model, no?
September 26, 2013 at 10:42 am
Isn’t the claim “If one (like David) does not believe in models, the only perspective that can compare models is in terms of their predictive abilities” a little bit (too much) strong? It could be based on how well the models fit the data (without claiming the model is the true model), for instance.
September 26, 2013 at 12:17 pm
Yes, it is obviously strong and intended as such. However, if the model per se does not matter, ie if there is no theory to (in)validate or to falsify à la Popper, and if the (then wrong) model is not used for prediction purposes, what is the point of fitting the model to the data?
September 26, 2013 at 4:39 pm
Oops, I didn’t really explain what we do with our models in psychology and linguistics.
We are testing theories à la Popper, and usually what we do is pit competing theories against each other. One theory might predict that the parameter value is positive, and the other that it is negative. Sometimes we also want to know which factors affect the dependent variable, and even in these cases it's always about the theory.
You’re talking about fitting models in a theory-free manner. At least I have never done that (except maybe while studying statistics ;), and I don’t know anyone who does that.
I first learnt about how to use DICs for Bayesian data analysis from Gelman and Hill, so I'm kind of puzzled by your comments.
September 25, 2013 at 7:23 am
What about models that are not intended for prediction, e.g., planned experiments? Isn’t DIC useful for such settings?
September 25, 2013 at 9:18 pm
If one (like David) does not believe in models, the only perspective that can compare models is in terms of their predictive abilities, ie their capacity to reproduce the observed phenomenon.