the maths of Jeffreys-Lindley paradox

Cristiano Villa and Stephen Walker arXived last Friday a paper entitled On the mathematics of the Jeffreys-Lindley paradox. Following the philosophical papers of last year, by Ari Spanos, Jan Sprenger, Guillaume Rochefort-Maranda, and myself, this provides a more statistical view on the paradox. Or “paradox”… I strongly disagree with the conclusion, though, namely that a finite (prior) variance σ² should be used in the Gaussian prior, with a fallback on classical Type I and Type II errors. So, in that sense, the authors avoid the Jeffreys-Lindley paradox altogether!

The argument against considering a limiting value for the posterior probability is that it converges to 0, 1, or an intermediate value. In the first two cases the limit is useless; the intermediate case is achieved when the prior probabilities of the null and of the alternative hypotheses depend on the variance σ². While I do not want to argue in favour of my 1993 solution

\rho(\sigma) = 1\big/\left(1+\sqrt{2\pi}\,\sigma\right)

since it is ill-defined in measure-theoretic terms, I do not buy the coherence argument that, since this prior probability converges to zero when σ² goes to infinity, the posterior probability should also go to zero. In the limit, probabilistic reasoning fails since the prior under the alternative is a measure, not a probability distribution… We should thus abstain from over-interpreting improper priors. (A sin sometimes committed by Jeffreys himself in his book!)
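To make the limiting behaviour concrete, here is a minimal numerical sketch of the point-null normal setup standardly used for the paradox; the sample size, observed mean, and grid of σ values are illustrative choices of mine, not taken from the paper. With a fixed prior weight of ½ on the null, the posterior probability of H0 drifts to one as σ grows, whereas the σ-dependent weight ρ(σ) above keeps it at a stable intermediate value.

# Jeffreys-Lindley paradox, numerically (illustrative values, not from the paper):
# x_bar ~ N(theta, 1/n), H0: theta = 0, H1: theta ~ N(0, sigma^2).
import numpy as np
from scipy.stats import norm

n, x_bar = 100, 0.25               # z = 2.5, "significant" at the usual levels
tau = 1.0 / np.sqrt(n)             # sampling standard deviation of x_bar

for sigma in [1.0, 10.0, 100.0, 1e4, 1e6]:
    m0 = norm.pdf(x_bar, loc=0.0, scale=tau)                         # marginal under H0
    m1 = norm.pdf(x_bar, loc=0.0, scale=np.sqrt(sigma**2 + tau**2))  # marginal under H1
    # (a) fixed prior weight 1/2 on H0: posterior probability of H0 drifts to 1
    p0_fixed = m0 / (m0 + m1)
    # (b) sigma-dependent weight rho(sigma) = 1/(1 + sqrt(2 pi) sigma): stable limit
    rho = 1.0 / (1.0 + np.sqrt(2.0 * np.pi) * sigma)
    p0_rho = rho * m0 / (rho * m0 + (1.0 - rho) * m1)
    print(f"sigma={sigma:>9.1f}  P(H0|x, rho=1/2)={p0_fixed:.4f}  P(H0|x, rho(sigma))={p0_rho:.4f}")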

10 Responses to “the maths of Jeffreys-Lindley paradox”

  1. Dan Simpson Says:

    I just scanned up to example 1, but isn't this just an ordinary Bayesian “paradox” (i.e. a thing that's obviously user error rather than something surprising)?

    If it's what I think it is, that example is overly contrived: all you need is a likelihood f(θ) = O(exp(θ^{2+ε})) and a normal prior on θ. It has nothing to do with continuous models, repeated sampling or measure-zero events. It's just checking your tails. (The Bayesian equivalent of checking that the “maximum” of your likelihood isn't a minimum.)

    Or am I missing something?
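A quick numerical check of the tail argument above (a sketch of mine, with ε = 0.5 and a standard normal prior as arbitrary choices): when the likelihood grows like exp(|θ|^{2+ε}), its product with a Gaussian prior kernel has an integral that explodes with the integration range, so the formal posterior cannot be normalised.

# Tail check: likelihood growing like exp(|theta|^{2+eps}) against a N(0,1) prior.
# The integral of the product keeps growing with the truncation point T, i.e. the
# unnormalised posterior is not integrable.  (eps = 0.5 is an arbitrary choice.)
import numpy as np

eps = 0.5
for T in [4.0, 6.0, 8.0, 10.0, 12.0]:
    theta = np.linspace(-T, T, 400001)
    integrand = np.exp(np.abs(theta) ** (2.0 + eps) - theta ** 2 / 2.0)
    mass = np.sum(integrand) * (theta[1] - theta[0])   # crude Riemann sum
    print(f"integral over [-{T:.0f}, {T:.0f}] ~ {mass:.3e}")   # explodes as T grows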

  2. Proper priors prevent p____-poor performance.

    • There are also examples where proper priors produce “poor” performance, namely, improper posteriors (e.g. for point observations).

      • Proper priors cannot produce improper posteriors!

      • They actually can produce improper posteriors if you use continuous models with samples that contain repeated observations (I know the argument “that's a zero-probability event”, but any sample has probability zero under a continuous model and we still use them). If I recall correctly, you wrote a comment on a paper that presents such examples:

        http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.377.5786&rep=rep1&type=pdf

        The idea behind this issue is that continuous models are approximations, but, as with any approximation, there are conditions behind them.

      • Ah! This is an old debate I have had with Mark Steel about one of his Valencia meeting papers. The measure-zero argument does not apply in this case: if you observe x=3.141396… say, we do have P(X=3.1413962…)=0, but a priori you cannot exclude this specific value as a realisation of X, while a priori I can exclude that X will take the value x=0 or the value x=π… Hence you can state from the beginning, a priori, i.e., before looking at the data, that X1=X2 is impossible. If repeated values occur in your sample, your probability model should account for this possibility by having point masses along the diagonal. Otherwise, your model is inadequate.

      • I agree with your points but, as discussed in the paper, the reason for the impropriety of the posterior is that the likelihood function is unbounded (even with proper priors). There are many examples, not involving repeated observations, where the likelihood is unbounded, and one has to be careful in these cases as well, I think. William Meeker recently published an interesting paper in The American Statistician with examples of continuous models with unbounded likelihoods.

      • ThanX, Javier, I will take a look at this paper. However, unbounded likelihoods cannot cause impropriety of the posterior if the prior is proper. To be continued…

      • Not always, I agree, only sometimes :). Thanks for the discussion.

      • There is more to it than just saying that the likelihood is unbounded. It is true that if the likelihood is in L^\infty, then any absolutely continuous proper prior will lead to a proper posterior. Beyond that, things get a bit more subtle.

        If the likelihood is L^1 (i.e. not essentially bounded but integrable), then any bounded absolutely continuous prior is ok.

        In between these, you can have unbounded priors and unbounded likelihoods as long as their unboundedness is complementary (i.e. an L^p likelihood and an L^q prior, where 1/p+1/q=1).

        If you want to have atomic priors, then you need your likelihood to be very very nice. (i.e. have, for a d-dimensional state, more than d L^1 derivatives).

        The tl;dr version: the weirder your likelihood, the nicer the prior needs to be. (This only relates to boundedness in very limited ways.)
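Two small addenda to the thread above, both sketches of mine rather than anything taken from the paper or the comments. First, a toy case showing that an unbounded likelihood combined with a proper prior can indeed produce an improper posterior, and that a slightly different proper prior repairs it: observe x = 0 exactly under a N(0, σ²) model with unknown σ, so the likelihood 1/(√(2π)σ) blows up as σ → 0, and whether the posterior is proper depends on how much prior mass sits near σ = 0.

# Toy illustration (hypothetical model, not the example from the linked papers):
# x = 0 observed from N(0, sigma^2), so L(sigma) = 1/(sqrt(2*pi)*sigma) is unbounded.
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon, gamma

lik = lambda s: 1.0 / (np.sqrt(2.0 * np.pi) * s)

def post_mass(prior_pdf, eps):
    # normalising "constant" of likelihood * prior, restricted to sigma >= eps
    f = lambda s: lik(s) * prior_pdf(s)
    return quad(f, eps, 1.0)[0] + quad(f, 1.0, np.inf)[0]

for eps in [1e-2, 1e-4, 1e-6, 1e-8]:
    v_exp = post_mass(expon.pdf, eps)                      # Exp(1) prior: density ~ 1 near 0
    v_gam = post_mass(lambda s: gamma.pdf(s, a=2.0), eps)  # Gamma(2,1) prior: density ~ sigma near 0
    print(f"eps={eps:.0e}:  Exp(1) prior -> {v_exp:7.3f} (grows without bound),  "
          f"Gamma(2,1) prior -> {v_gam:.4f} (stable)")

Second, the complementarity condition quoted in the last comment is essentially Hölder's inequality applied to the normalising constant of the posterior,

\int f(x\mid\theta)\,\pi(\theta)\,\text{d}\theta \;\leq\; \|f(x\mid\cdot)\|_{p}\,\|\pi\|_{q} \;<\;\infty \qquad \text{when } \tfrac{1}{p}+\tfrac{1}{q}=1,

so an L^p likelihood paired with an L^q prior guarantees a finite normalising constant, hence a proper posterior.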
