## Assessing models when all models are false

When I arrived home from Philadelphia, I got the news that John Geweke was giving a seminar at CREST in the early afternoon and thus biked there to attend his talk. The topic was about comparing asset return models, but what interested most was the notion of comparing models without a reference to a true model, a difficulty I have been juggling with for quite a while (at least since the goodness-of-fit paper with Judith Rousseau we never resubmitted!). And for which I still do not find a satisfactory (Bayesian) solution.

Because there is no true model, Durham and Geweke use the daily (possibly Bayesian) predictive

$p(y_t|Y^o_{t-1},X^o_{t-1})$

as their basis for model assessment and rely on a log scoring rule

$\sum_{s=1}^{t-1} \log p_s(y^o_s|Y^o_{s-1},X^o_{s-1})$

to compare models. (The ‘o’ in the superscript denotes the observed values.) As reported in the paper this is a proper (or honest) scoring rule. If n models are under competition, a weighted (model) predictive average

$\sum_{i=1}^n \,\omega_{s-1;i} p^i_s(y_s|Y^o_{s-1},X^o_{s-1})$

can be considered and the paper examines the impact of picking the optimal weight vector $(\omega_{t-1,1},\ldots,\omega_{t-1,n})$ against the log scoring rule, i.e.

$\arg\max_{\mathbf{\omega}_{t-1}} \sum_{s=1}^{t-1} \sum_{i=1}^n \,\omega_{t-1;i} \log p^i_s(y^o_s|Y^o_{s-1},X^o_{s-1})$

The weight vector at time t-1 is therefore optimising the backward sequence of predictions of the observed values till time t-1. The interesting empirical result from this study is that, even from a Bayesian perspective, the weights never degenerate, unless one of the models is correct (which is rarely the case!). Thus, even after very long series of observations, the weights of different models remain away from zero (while the Bayesian posterior probability of a single model goes to one). Even though I am not yet at the point of adopting this solution (in particular because it seems to be using the data twice, once through the posterior/predictive and once through the score), I find the approach quite intriguing and hope I can study it further. Maybe a comparison with a Bayesian non-parametric evaluation would make sense…

This site uses Akismet to reduce spam. Learn how your comment data is processed.