Lindley’s paradox(es) and scores

“In the asymptotic limit, the Bayesian cannot justify the strictly positive probability of H0 as an approximation to testing the hypothesis that the parameter value is close to θ0 — which is the hypothesis of real scientific interest”

While revising my Jeffreys-Lindley paradox paper for Philosophy of Science, it was suggested (to me) that I read the forthcoming paper by Jan Sprenger on this paradox. The paper is entitled Testing a Precise Null Hypothesis: The Case of Lindley’s Paradox and it defends the thesis that the regular Bayesian approach (hence the Bayes factor used in the Jeffreys-Lindley paradox) is forced to put a prior on the (point) null hypothesis when all that really matters is the vicinity of the null. (I think Andrew would agree there, as he positively hates point null hypotheses. See also Rissanen’s perspective about the maximal precision allowed by a given sample.) Sprenger then advocates the use of the log score for comparing the full model with the point-null sub-model, i.e. the posterior expectation of the Kullback-Leibler divergence between both models:

\mathbb{E}^\pi\left[\left.\mathbb{E}_\theta\left\{\log \frac{f(X|\theta)}{f(X|\theta_0)}\right\}\right|x\right],

rejoining José Bernardo and Phil Dawid on this ground.
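For concreteness, here is a minimal numerical check of this score in the simplest conjugate setting, a N(θ,1) model with a N(0,τ²) prior on θ; the closed form exploits KL{N(θ,1), N(θ0,1)} = (θ−θ0)²/2, and the simulated data, prior scale, and sample size are my own illustrative choices, not Sprenger’s:

import numpy as np

# posterior expected KL between f(.|theta) and f(.|theta_0)
# for x_1..x_n ~ N(theta, 1) with prior theta ~ N(0, tau^2)
rng = np.random.default_rng(0)
theta0, tau = 0.0, 1.0
x = rng.normal(0.3, 1.0, size=50)        # simulated data, true mean 0.3

n, xbar = len(x), x.mean()
v = 1.0 / (n + 1.0 / tau**2)             # posterior variance of theta
m = v * n * xbar                         # posterior mean of theta

# KL{N(theta,1), N(theta0,1)} = (theta - theta0)^2 / 2, hence the
# posterior expectation has the closed form below
score = 0.5 * ((m - theta0) ** 2 + v)

# Monte Carlo check, averaging the KL over posterior draws of theta
draws = rng.normal(m, np.sqrt(v), size=100_000)
print(score, np.mean(0.5 * (draws - theta0) ** 2))   # should agree closely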

While I agree with the notion that it is impossible to distinguish a small enough departure from the null from the null (no typo!), and I also support the argument that “all models are wrong”, hence that a point null should eventually, meaning with enough data, be rejected, I find the Bayesian solution through the Bayes factor rather appealing because it uses the prior distribution to weight the alternative values of θ in order to oppose their averaged likelihood to the likelihood at θ0. (Note I did not mention Occam!) Further, while the notion of opposing a point null to the rest of the Universe may sound silly, what truly matters is the decisional setting, namely that we want to select a simpler model and use it for later purposes. It is therefore this issue that should be tested, rather than whether or not θ is exactly equal to θ0. I incidentally find it amusing that Sprenger picks the ESP experiment as his illustration, in that this is a (the?) clear-cut case where the point null “there is no such thing as ESP” makes sense. Now, it can be argued that what the statistical experiment is assessing is the ESP experiment itself, for which many objective causes (beyond ESP!) may induce a departure from the null (and from the binomial model). But then this prevents any rational analysis of the test (as is indeed the case!).

The paper thus objects to the use of Bayes factors (and of p-values) and instead proposes to compare scores in the Bernardo-Dawid spirit. As discussed earlier, this approach has several appealing features, from recovering the Kullback-Leibler divergence between models as a measure of fit, to allowing for the incorporation of improper priors (a point Andrew may disagree with), to avoiding the double use of the data. It is however incomplete in that it creates a discrepancy, or imbalance, between both models, thus making the comparison of more than two models difficult to fathom, and it does not readily incorporate the notion of nuisance parameters in the embedded model, seemingly forcing the inclusion of pseudo-priors as in the Bayesian analysis of Aitkin’s integrated likelihood.
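To make the divergence between the two answers concrete, here is a short sketch (with a sample size and a grid of prior scales of my own choosing) of the Bayes factor B01 for H0: θ = 0 against H1: θ ~ N(0,τ²) in a N(θ,1) model, where both marginal likelihoods of the sufficient statistic are available in closed form:

import numpy as np
from scipy.stats import norm

n = 10_000
xbar = 1.96 / np.sqrt(n)      # z = 1.96, i.e. borderline 5% significance

def bayes_factor_01(xbar, n, tau):
    # marginal of xbar: N(0, 1/n) under H0, N(0, tau^2 + 1/n) under H1
    m0 = norm.pdf(xbar, loc=0.0, scale=np.sqrt(1.0 / n))
    m1 = norm.pdf(xbar, loc=0.0, scale=np.sqrt(tau**2 + 1.0 / n))
    return m0 / m1

for tau in (0.1, 1.0, 10.0, 100.0):
    print(tau, bayes_factor_01(xbar, n, tau))

As τ grows, B01 grows without bound: the same data that are borderline significant for the frequentist test support the null more and more strongly for the Bayes factor, which is the Jeffreys-Lindley paradox in a nutshell.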

5 Responses to “Lindley’s paradox(es) and scores”

  1. I have to say that I do not really see a paradox here; the frequentist answer just says that the data are incompatible with the null, while the Bayesian answer states that the null is more probable than the alternative. If we want to reconcile these approaches, then the Bayesian one should come up with a more plausible alternative explanation (“higher posterior model marginal”) than the null. The paradox only shows that this is not mathematically possible with the Normal prior of the alternative hypothesis, especially if one tries to pull a “diffuse/non-informative” stunt using a limiting argument.

    If one were to take a step back, and see the point null as the limit of a proper distribution (e.g. a Normal with known mean and vanishing standard deviation) and the alternative as a Normal with unknown mean and vanishing precision, then the set-up (prior to taking limits) is that of comparing two models with Normal priors for the mean parameter (mu) of a normal model y~N(mu,sigma):

    N(0,sigmaH0) vs N(0,sigmaH1)

    In this set-up, the Bayes factor will be determined by the relative values of sigmaH0 vs sigmaH1 for fixed N. This will be true even in the limiting case, in which sigmaH0 -> 0 (so that the Normal prior of the null tends to a Dirac function) and sigmaH1 -> Infinity, yielding the Lindley paradox. So if the only thing that matters for the value of the Bayes factor is the relative value of the sigmas, and we really really want to test point null hypotheses (so that sigmaH0 = 0), then why don’t we bite the bullet and elevate the level of abstraction somewhat? Instead of trying to figure out a limiting process that takes sigmaH1 all the way to infinity without yielding a paradox, we use a hierarchical model set-up in which H0 is still point sharp, but H1 is defined by:

    mu~N(0,sigmaH1); sigmaH1~p(some hyperparameters) or even
    mu~N(theta,sigmaH1); sigmaH1~p(some hyperparameters) and theta~p(other hyperparameters)

    This would allow us (or so I think) to sidestep the diffuse priors that seem to facilitate the occurrence of the paradox in the first place, and even re-instate some “parametric” symmetry in the estimation: if H0 is really a (limit of a) Normal prior with known mean and sd, then the alternative is actually a Normal prior in which the mean and the sd are unknown and estimated from the data. (A numerical sketch of this set-up appears after the comments.)

    • Thanks for this detailed analysis. I agree with you that there is no paradoxical behaviour in this divergence between some frequentist and some Bayesian answers. I doubt however that using hyperpriors would make the difficulty with improper priors go away…

  2. Christian Hennig Says:

    If “all models are wrong” is a reason for not giving positive prior probability to a point null hypothesis, how can one defend giving a prior probability of one to the full model, which still should have measure zero in the space of whatever can happen?

    • This sounds like an impossibility, or no-action, theorem: any parametric family has zero measure in this space of whatever-can-happen. Now, if the point null hypothesis is to be interpreted as a model that could be chosen, and if chosen be used as “the” model, then what happens in this space of whatever-can-happen does not matter, even if the “true” model is not chosen. This is an operational rather than epistemological choice: the test picks the most operational model within a collection, even with the understanding that this is a measure-zero set in the space of whatever-can-happen.

      • Christian Hennig Says:

        I’m not so sure whether it is irrelevant what can happen in whatever-can-happen space, because there may be a subset of nonzero measure that can be approximated at least fairly well by the point null hypothesis. (Although such a subset normally exists in the full model minus the point null too, which is not taken into account if the point null gets positive prior probability.)

        I do realise that in your text above you were neither the strongest proponent nor the strongest opponent of nonzero prior probability for the point null; I just had this thought, which I liked, after reading it.
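As an aside on the hierarchical set-up proposed in the first comment, here is a quick numerical sketch; the half-Cauchy hyperprior on sigmaH1 is my own arbitrary choice (not the commenter’s), and the point is merely that a proper hyperprior keeps the Bayes factor finite without any limiting sigmaH1 -> Infinity argument:

import numpy as np
from scipy import integrate, stats

# H0: mu = 0 versus H1: mu ~ N(0, s^2), s ~ half-Cauchy(1),
# for x_1..x_n ~ N(mu, 1); under H1, xbar | s ~ N(0, s^2 + 1/n),
# so the marginal of xbar under H1 is a one-dimensional integral over s
n = 10_000
xbar = 1.96 / np.sqrt(n)

def m1_integrand(s):
    return (stats.norm.pdf(xbar, 0.0, np.sqrt(s**2 + 1.0 / n))
            * stats.halfcauchy.pdf(s, scale=1.0))

m1, _ = integrate.quad(m1_integrand, 0.0, np.inf)
m0 = stats.norm.pdf(xbar, 0.0, np.sqrt(1.0 / n))
print("B01 =", m0 / m1)   # finite, no limiting argument required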
