Following the Re-Reading of Spiegelhalter et al. (2002) by David at the RSS Annual Conference a few weeks ago, and my invited discussion there, I was asked to contribute a written discussion to Series B, a request obviously impossible to refuse!
The main issue with DIC is the question of its worth for Bayesian decision analysis (since I doubt there are many proponents of DIC outside the Bayesian community). The appeal of DIC is, I presume, to deliver a single summary per model under comparison and to allow therefore for a complete ranking of those models. I however object at the worth of simplicity for simplicity’s sake: models are complex (albeit less than reality) and their usages are complex as well. To consider that model A is to be preferred upon model B just because DIC(A)=1228 < DiC(B)=1237 is a mimicry of the complex mechanisms at play behind model choice, especially given the wealth of information provided by a Bayesian framework. (Non-Bayesian paradigms are more familiar with procedures based on a single estimator value.) And to abstain from accounting for the significance of the difference between DIC(A) and DIC(B) clearly makes matters worse.
This is not even discussing the stylised setting where one model is considered as “true” and where procedures are compared by their ability to recover the “truth”. David Spiegelhalter repeatedly mentioned during his talk that he was not interested in this. This stance brings another objection, though, namely that models can only be compared against their predictive abilities, which DIC seems unable to capture. Once again, what is needed is a multi-factor and all-encompassing criterion that evaluates the predictive models in terms of their recovery of some features of the phenomenon under study. Or of the process being conducted. (Even stooping down to a one-dimensional loss function that is supposed to summarise the purpose of the model comparison does not produce anything close to the DIC function.)
Obviously, considering that asymptotic consistency is of no importance whatsoever (as repeated by David in Newcastle) avoids some embarrassing questions, except the one about the true purpose of statistical models and procedures. How can they be compared if no model is true and if accumulating data from a given model is not meaningful? How can simulation be conducted in such a barren landscape? I find it the more difficult to accept this minimalist attitude that models are truly used as if they were or could be true, at several stages in the process. It also prevents the study of the criterion under model misspecification, which would clearly be of interest.
Another point, already exposed in our 2006 Bayesian Analysis paper with Gilles Celeux, Florence Forbes, and Mike Titterington, is that there is no unique driving principle for constructing DICs. In that paper, we examined eight different and natural versions of DIC for mixture models, resulting in highly diverging values for DIC and the effective dimension of the parameter, I believe that such a lack of focus is bound to reappear in any multimodal setting and fear that the answer about (eight) different focus on what matters in the model is too cursory and lacks direction for the hapless practitioner.
My final remark about DIC is that it shares very much the same perspective as Murray Aitkin’s integrated likelihood, Both Aitkin (1991, 2009) and Spiegelhalter et al. (2002) consider a posterior distribution on the likelihood function, taken as a function of the parameter but omitting the delicate fact that it also depends on the observable and hence does not exist a priori. We wrote a detailed review of Aitkin’s (2009) book, where most of the criticisms equally apply to DIC, and I will not repeat them here, except for pointing out that it escapes the Bayesian framework (and thus requires even more its own justifications).