I am sure a lot of people have a positive impression of this paper; it was actually sent to me by my friend and colleague Judith, who clearly appreciated it.

> no statistical novelty, apart from looking at the distribution of posterior probabilities in toy examples

There are a number of quite general asymptotic results buried in the supplementary materials. So I think there is some statistical novelty here.

> It is also bizarre that the argument does not account for the complexity of each model and the resulting (Occam’s razor) penalty

Their general theory in the supplementary material does cover the case of models of different dimensionality and there is in fact an Occam’s razor penalty. It’s summarized in Table S.3.

> The notion that two models are equally wrong because they are both exactly at the same Kullback-Leibler distance from the generating process (when optimised over the parameter) is such a formal [or cartoonesque] notion that it does not make much sense.

But that’s exactly what determines which models could be selected asymptotically. It’s hardly an arbitrary choice they are making!
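To see why equal KL distance is the decisive quantity rather than a formal curiosity, here is a minimal toy sketch of my own (not the authors' actual experiment): the truth is N(0, 1) and the two candidate models are N(+1, 1) and N(-1, 1), which sit at exactly the same KL distance from the truth. The log Bayes factor then reduces to a mean-zero random walk, so the posterior model probability never concentrates; instead it is almost always essentially 0 or 1, flipping at random across datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_prob_model1(x, mu=1.0):
    """Posterior probability of N(+mu, 1) vs N(-mu, 1) under equal prior odds.

    The per-observation log likelihood ratio simplifies to 2 * mu * x_i,
    so the log Bayes factor is a zero-drift random walk when x ~ N(0, 1).
    """
    log_bf = 2.0 * mu * np.sum(x)
    return 1.0 / (1.0 + np.exp(-log_bf))

# Across replications the posterior probability does not settle near 1/2:
# it is nearly always close to 0 or close to 1, and which one varies at random.
n, reps = 1000, 200
probs = np.array(
    [posterior_prob_model1(rng.standard_normal(n)) for _ in range(reps)]
)
print(probs.min(), probs.max())  # near 0 and near 1
```

This is exactly the overconfidence-without-concentration behavior at issue: each dataset yields a posterior that is highly confident in one of two equally wrong models.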

> There is always one model that is slightly closer and eventually takes over.

The authors seem well aware of this reality, which is why they back up the asymptotic theory with simulation experiments demonstrating that the same behavior occurs non-asymptotically when multiple models have similar KL divergences (see Fig. 3).

Stepping back: in his Bayesian Analysis paper on misspecification, Peter Grünwald distinguishes “benign” from “bad” misspecification for Bayesian inference in the predictive setting. Yang and Zhu are investigating a similar phenomenon in the model selection setting. In both the Grünwald and Yang/Zhu settings, bad misspecification leads to overconfident posteriors/posterior predictives. Understanding when this can happen and how to fix it seem like worthwhile endeavors to me!

> However, it is not really designed to decide what models should be included in a reasonable set of wrong but potentially useful models (Gelman and many others appear to concede that ground to empiricism or frequentism) and hence possibly facilitate a sensitivity analysis over such a set of models.
