## Can we have our Bayesian cake and eat it too?

Posted in Books, pictures, Statistics, University life with tags , , , , , , on January 17, 2018 by xi'an

This paper aims at solving the Bartlett-Lindley-Jeffreys paradox, i.e., the difficulty connected with improper priors in Bayes factors. The introduction is rather lengthy since by page 9 we are still (dis-)covering the Lindley paradox, along with the introduction of a special notation for -2 times the logarithm of the Bayes factor.

“We will now resolve Lindley’s paradox in both of the above examples.”

The “resolution” of the paradox stands in stating the well-known consistency of the Bayes factor, i.e., that as the sample grows to infinity it goes to infinity (almost surely) under the null hypothesis and to zero under the alternative (almost surely again, both statements being for fixed parameters.) Hence the discrepancy between a small p-value and a Bayes factor favouring the null occurs “with vanishingly small” probability. (The authors distinguish between Bartlett’s paradox associated with a prior variance going to infinity [or a prior becoming improper] and Lindley-Jeffreys’ paradox associated with a sample size going to infinity.)

“We construct cake priors using the following ingredients”

The “cake” priors are defined as pseudo-normal distributions, pseudo in the sense that they look like multivariate Normal densities, except for the covariance matrix that also depends on the parameter, as e.g. in the Fisher information matrix. This reminds me of a recent paper of Ronald Gallant in the Journal of Financial Econometrics that I discussed. With the same feature. Except for a scale factor inversely log-proportional to the dimension of the model. Now, what I find most surprising, besides the lack of parameterisation invariance, is that these priors are not normalised. They do no integrate to one. As to whether or not they integrate, the paper keeps silent about this. This is also a criticism I addressed to Gallant’s paper, getting no satisfactory answer. This is a fundamental shortcoming of the proposed cake priors…

“Hence, the relative rates that g⁰ and g¹ diverge must be considered”

The authors further argue (p.12) that by pushing the scale factors to infinity one produces the answer the Jeffreys prior would have produced. This is not correct since the way the scale factors diverge, relative to one another, drives the numerical value of the limit! Using inversely log-proportionality in the dimension(s) of the model(s) is a correct solution, from a mathematical perspective. But only from a mathematical perspective.

“…comparing the LRT and Bayesian tests…”

Since the log-Bayes factor is the log-likelihood ratio modulo the ν log(n) BIC correction, it is not very surprising that both approaches reach close answers when the scale goes to infinity and the sample size n as well. In the end, there seems to be no reason for going that path other than making likelihood ratio and Bayes factor asymptotically coincide, which does not sound like a useful goal to me. (And so does recovering BIC in the linear model.)

“No papers in the model selection literature, to our knowledge, chose different constants for each model under consideration.”

In conclusion, the paper sets up a principled or universal way to cho<a href=”https://academic.oup.com/jfec/article-abstract/14/2/265/1751312?redirectedFrom=fulltext”></a><a href=”https://xiaose “cake” priors fighting Lindley-Jeffreys’ paradox, but the choices made therein remain arbitrary. They allow for a particular limit to be found when the scale parameter(s) get to infinity, but the limit depends on the connection created between the models, which should not share parameters if one is to be chosen. (The discussion of using improper priors and arbitrary constants is aborted, resorting to custom arguments as the above.) The paper thus unfortunately does not resolve Lindley-Jeffreys’ paradox and the vexing issue of improper priors unfit for testing.