Thanks for this detailed analysis. I agree with you that there is no paradoxical behaviour in this divergence between some frequentist and some Bayesian answers. I doubt, however, that using hyperpriors would make the difficulty of using improper priors go away…

If one were to take a step back and see the point null as the limit of a proper distribution (e.g., a Normal with known mean and vanishing standard deviation) and the alternative as a Normal with unknown mean and vanishing precision, then the set-up (prior to taking limits) is that of comparing two models with Normal priors on the mean parameter (mu) of a normal model y~N(mu,sigma):

N(0,sigmaH0) vs N(0,sigmaH1)

In this set-up, the Bayes factor is determined by the relative values of sigmaH0 and sigmaH1 for fixed N. This remains true in the limiting case, in which sigmaH0 -> 0 (so that the Normal prior of the null tends to a Dirac mass) and sigmaH1 -> Infinity, yielding the Lindley paradox. So if the only thing that matters for the value of the Bayes factor is the relative size of the sigmas, and we really, really want to test point null hypotheses (so that sigmaH0 = 0), then why don’t we bite the bullet and raise the level of abstraction somewhat? Instead of trying to devise a limiting process that takes sigmaH1 all the way to infinity without yielding a paradox, we use a hierarchical model set-up in which H0 is still point sharp, but H1 is defined by:
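The dependence of the Bayes factor on sigmaH1, and the paradox in the limit, can be sketched numerically. Closed forms exist here: with y_i ~ N(mu, sigma^2) and mu ~ N(0, tau^2), the sample mean satisfies ybar ~ N(0, sigma^2/n + tau^2), so the marginal likelihoods of both hypotheses are Normal densities. The sample size and the "two standard errors from zero" data value below are illustrative assumptions, not from the discussion:

```python
import numpy as np
from scipy.stats import norm

def marginal(ybar, n, sigma, tau):
    """Marginal density of the sample mean under mu ~ N(0, tau^2):
    ybar ~ N(0, sigma^2/n + tau^2); tau = 0 recovers the point null."""
    return norm.pdf(ybar, loc=0.0, scale=np.sqrt(sigma**2 / n + tau**2))

# Illustrative data: a sample mean two standard errors from zero,
# i.e. borderline "significant" for a frequentist test.
n, sigma = 100, 1.0
ybar = 2.0 * sigma / np.sqrt(n)

# Point null as the sigmaH0 -> 0 limit of the Normal prior.
m0 = marginal(ybar, n, sigma, tau=0.0)

# As sigmaH1 grows, BF01 diverges: the Bayes factor ends up favouring
# H0 however "significant" the data look -- the Lindley paradox.
for tau in [0.5, 5.0, 50.0, 500.0]:
    bf01 = m0 / marginal(ybar, n, sigma, tau)
    print(f"sigmaH1 = {tau:6.1f}   BF01 = {bf01:8.2f}")
```

The Bayes factor grows roughly linearly in sigmaH1 here, which is exactly the sense in which only the relative values of the sigmas matter.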

mu~N(0,sigmaH1); sigmaH1~p(some hyperparameters) or even

mu~N(theta,sigmaH1); sigmaH1~p(some hyperparameters) and theta~p(other hyperparameters)

This would allow us (or so I think) to sidestep the diffuse priors that seem to facilitate the occurrence of the paradox in the first place, and even reinstate some “parametric” symmetry in the estimation:

if H0 is really a (limit of a) Normal prior with known mean and sd, then the alternative is actually a Normal prior in which the mean and the sd are unknown and are estimated from the data.
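A minimal sketch of this hierarchical alternative, under assumptions not in the comment: a proper half-Cauchy hyperprior on sigmaH1 (written tau below), integrated out numerically rather than sent to infinity, with the same illustrative data as a two-standard-error sample mean:

```python
import numpy as np
from scipy.stats import norm, halfcauchy
from scipy.integrate import quad

def marginal_given_tau(ybar, n, sigma, tau):
    """Conditional marginal of the sample mean given the prior sd tau:
    ybar | tau ~ N(0, sigma^2/n + tau^2)."""
    return norm.pdf(ybar, scale=np.sqrt(sigma**2 / n + tau**2))

def marginal_h1(ybar, n, sigma, hc_scale=1.0):
    """Hierarchical H1: integrate tau out against a proper
    half-Cauchy(hc_scale) hyperprior instead of letting tau -> Infinity."""
    integrand = lambda tau: (marginal_given_tau(ybar, n, sigma, tau)
                             * halfcauchy.pdf(tau, scale=hc_scale))
    val, _ = quad(integrand, 0.0, np.inf)
    return val

n, sigma = 100, 1.0
ybar = 2.0 * sigma / np.sqrt(n)

m0 = marginal_given_tau(ybar, n, sigma, tau=0.0)  # point-sharp H0
m1 = marginal_h1(ybar, n, sigma)                  # hierarchical H1
print(f"BF01 = {m0 / m1:.3f}")
```

Because the hyperprior is proper, the Bayes factor stays finite and no limiting process is needed; the choice of half-Cauchy and its scale is of course one assumption among many possible hyperpriors.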

I’m not so sure whether it is irrelevant what can happen in whatever-can-happen space, because there may be a subset of nonzero measure that can be approximated at least fairly well by the point null hypothesis. (Although such a subset normally exists in the full model minus the point null, too, which is not taken into account if the point null gets positive prior probability.)

I do realise that in your text above you were neither the strongest proponent nor the strongest opponent of nonzero prior probability for the point null; I just had this thought, which I liked, after reading it.

This sounds like an impossibility, or no-action, theorem: any parametric family has zero measure in this space of whatever-can-happen. Now, if the point null hypothesis is to be interpreted as a model that could be chosen and, if chosen, used as “the” model, then what happens in this space of whatever-can-happen does not matter, even if the “true” model is not chosen. This is an operational rather than epistemological choice: the test picks the most operational model within a collection, even with the understanding that this is a measure-zero set in the space of whatever-can-happen.
