Deviance and posterior likelihood

An interesting paper by Murray Aitkin, Charles Liu, and Tom Chadwick is due to appear in the Annals of Applied Statistics. Its main focus is the comparison of models for a specific problem of small area estimation, but it also contains general views on Bayesian model choice that relate to our earlier paper about DIC and to our reassessment of Bayes factor approximation with Jean-Michel Marin. In connection with Aitkin’s 1991 paper, this paper considers the posterior distribution of the likelihood. For one given model, this amounts to using the likelihood twice, since the posterior expectation of the likelihood equals

\int_{\Theta} \pi(\theta|x) L(\theta|x) \text{d}\theta = \int_{\Theta} \pi(\theta) L^2(\theta|x) \text{d}\theta / m(x)
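As a quick sanity check of this identity, here is a minimal sketch in Python. It assumes a toy model that is not taken from the paper (a binomial likelihood with a Beta(a, b) prior), and the function names are made up for the illustration. It compares a Monte Carlo average of the likelihood over posterior draws with the closed form of the right-hand side, and prints the prior expectation m(x) alongside for contrast.

```python
import math
import random


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def likelihood(theta, x, n):
    """Binomial likelihood L(theta|x) for x successes out of n trials."""
    return math.comb(n, x) * theta ** x * (1.0 - theta) ** (n - x)


def marginal(x, n, a, b):
    """m(x): prior expectation of the likelihood under a Beta(a, b) prior."""
    return math.comb(n, x) * math.exp(log_beta(a + x, b + n - x) - log_beta(a, b))


def posterior_mean_likelihood_exact(x, n, a, b):
    """Closed form of int pi(theta) L^2(theta|x) dtheta / m(x) for this toy model."""
    log_ratio = log_beta(a + 2 * x, b + 2 * (n - x)) - log_beta(a + x, b + n - x)
    return math.comb(n, x) * math.exp(log_ratio)


def posterior_mean_likelihood_mc(x, n, a, b, draws=100_000, seed=1):
    """Monte Carlo average of L(theta|x) over draws from the Beta posterior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        theta = rng.betavariate(a + x, b + n - x)
        total += likelihood(theta, x, n)
    return total / draws


# Illustrative (made-up) data: 7 successes out of 10 trials, uniform prior.
x, n, a, b = 7, 10, 1.0, 1.0
print("prior expectation m(x):       ", marginal(x, n, a, b))
print("posterior expectation (exact):", posterior_mean_likelihood_exact(x, n, a, b))
print("posterior expectation (MC):   ", posterior_mean_likelihood_mc(x, n, a, b))
```

The posterior expectation can never fall below m(x), since the prior expectation of L^2 is at least the square of the prior expectation of L, which is one way of seeing the double use of the data.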

This made me first think that DIC may suffer from the same drawback since

\text{DIC} = -4 \mathbb{E}[\log L(\theta|x) | x] + 2 \log L(\hat\theta(x)|x)

hence integrates a quantity that depends on x against the posterior, but this is not the same since

\mathbb{E}[\log L(\theta|x) | x] = \int_{\Theta} \log(\pi(\theta|x)/\pi(\theta)) \pi(\theta|x)\text{d}\theta + \log(m(x))

is, up to the additive constant \log(m(x)), the Kullback-Leibler divergence between the posterior and the prior distributions (this follows from L(\theta|x) = \pi(\theta|x) m(x) / \pi(\theta)). Back to the paper, it suggests running model comparison via the distribution of the likelihood ratio values

L(\theta_i|x) \big/ L(\theta_k|x)


where the \theta_i's and \theta_k's are draws from the respective posteriors. This seems very close to Steve Scott’s (JASA, 2002) and to Peter Congdon’s (CSDA, 2006) solutions analysed in our paper, in that MC(MC) samplers are run for each model separately and the samples are gathered together to produce either the posterior expectation (in Scott’s case) or the posterior distribution (for the current paper) of

\rho_i L(\theta_i|x) \bigg/ \sum_k \rho_k L(\theta_k|x)

which do not correspond to genuine Bayesian solutions, in my opinion, because the data x is used repeatedly in this process, producing a sample from the product of the posteriors, instead of using pseudo-priors as in Carlin and Chib (Series B, 1995). In addition, I fail to see how to interpret the “posterior” distribution of this quantity, even under the condition of using pseudo-priors, as it will also reflect the joint distribution, while a coherent model choice procedure should only depend on the marginal (model-wise) distributions (which reminds me of an old criticism against Pitman nearness…)
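To make the procedure under discussion concrete, here is a minimal sketch in Python of how such a sample is produced. Everything specific in it is an assumption for illustration rather than the setting of any of the cited papers: two toy models for count data (Poisson with a Gamma(1, 1) prior, geometric with a Beta(1, 1) prior), equal prior weights \rho_k, and the hypothetical function name pooled_weights. Each model's posterior is sampled on its own and the draws are then paired, so the pairs come from the product of the two posteriors, which is exactly the feature criticised above.

```python
import math
import random


def poisson_loglik(lam, data):
    """Log-likelihood of i.i.d. Poisson(lam) counts."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in data)


def geometric_loglik(p, data):
    """Log-likelihood of i.i.d. geometric counts on {0, 1, ...}: P(y) = p (1 - p)^y."""
    return sum(math.log(p) + y * math.log(1.0 - p) for y in data)


def pooled_weights(data, rho=(0.5, 0.5), draws=5000, seed=0):
    """Sample rho_1 L(theta_1|x) / sum_k rho_k L(theta_k|x) from separate posteriors."""
    rng = random.Random(seed)
    n, s = len(data), sum(data)
    weights = []
    for _ in range(draws):
        # Model 1: Gamma(1, 1) prior -> Gamma(shape 1 + s, rate 1 + n) posterior.
        # random.gammavariate takes a scale parameter, hence 1 / rate.
        lam = rng.gammavariate(1 + s, 1.0 / (1 + n))
        # Model 2: Beta(1, 1) prior -> Beta(1 + n, 1 + s) posterior.
        p = rng.betavariate(1 + n, 1 + s)
        l1 = math.log(rho[0]) + poisson_loglik(lam, data)
        l2 = math.log(rho[1]) + geometric_loglik(p, data)
        # Normalise on the log scale to avoid underflow.
        top = max(l1, l2)
        e1, e2 = math.exp(l1 - top), math.exp(l2 - top)
        weights.append(e1 / (e1 + e2))
    return weights


counts = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]  # made-up data
w = sorted(pooled_weights(counts))
print("mean (posterior expectation):", sum(w) / len(w))
print("5% / 50% / 95% quantiles:", w[len(w) // 20], w[len(w) // 2], w[(19 * len(w)) // 20])
```

The mean of the sample corresponds to the posterior expectation and the quantiles to the posterior distribution of the ratio; both are computed from draws that have each already used the data x once.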

6 Responses to “Deviance and posterior likelihood”

  1. […] factors approximations from separate chains running each on a separate model have led to erroneous solutions. It appears however that the paper builds upon a technique fully exposed in the book written by the […]

  2. […] already commented in a post last July, the positive aspect of looking at this quantity rather than at the Bayes factor is that […]

  3. […] a prior centered at the mle and with a zero prior variance. (See also the controversy surrounding Murray Aitkin’s resolution of the improper prior difficulty by using the data twice.) Possibly related […]

  4. I have had a quick glance at Nicolae et al. (2008), which is available on Xiao-Li’s webpage. The authors indeed consider as well the posterior distributions of the likelihood ratio and of the log-likelihood ratio. I will hopefully have more time today to read more deeply into this paper. After my talk this morning (H-hour -6!).

  5. The Bayes factor, for nested models with consistent priors, is equal to the posterior expectation (with respect to the posterior distribution of the alternative model parameters) of the likelihood ratio between the models (see Nicolae, D.L., Meng, X.-L. and Kong, A. (2008) Quantifying the fraction of missing information for hypothesis testing in statistical and genetic studies (with discussion). Statistical Science 23, 287-331.)

    Objectors to the use of the posterior distribution of the likelihood ratio between nested models need to explain how this is different from the use of the mean of this distribution in the Bayes factor.

    • Thank you for making this point, Murray. I see a difference between the posterior distribution of the likelihood ratio and the use of the likelihood in the Bayes factor in that the latter is the expectation of the likelihood under the prior, not under the posterior. It is in this sense that I find the posterior likelihood ratio to be using the likelihood twice. Obviously, there is no reason for this solution not to be consistent, but it is not Bayesian stricto sensu.
