Pierre Druilhet arXived a note a few days ago about the Flatland paradox (due to Stone, 1976) and his arguments against the flat prior. The paradox in this highly artificial setting is as follows: consider a sequence θ of N independent draws from {a,b,1/a,1/b} such that

1. N and θ are unknown;
2. whenever a draw is immediately followed by its inverse, both the draw and the inverse are removed from θ;
3. the successor x of θ is observed, meaning an extra draw is made and the above rule applied.

Then the frequentist probability that x is longer than θ given θ is at least 3/4 (at least because θ could be empty, in which case x is always longer), while the posterior probability that x is longer than θ given x is 1/4 under the flat prior over θ. The paradox is that 3/4 and 1/4 clash. Not so much of a paradox, though, because the flat prior is improper and there is thus no joint probability distribution over (x,θ).
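The frequentist side of the clash can be checked numerically. The following is a minimal Python sketch, assuming that the removal rule is applied sequentially (each new draw cancels against the last remaining letter when the two are inverses); the encoding of letters as (symbol, exponent) pairs and all function names are my own illustrative choices, not taken from the note.

```python
import random

# Hypothetical encoding: ("a", 1) stands for a, ("a", -1) for 1/a, and so on.
LETTERS = [("a", 1), ("b", 1), ("a", -1), ("b", -1)]

def append_draw(path, letter):
    """Append one draw to a reduced path, cancelling it against the
    last letter when the two are inverses of one another."""
    if path and path[-1][0] == letter[0] and path[-1][1] == -letter[1]:
        return path[:-1]
    return path + [letter]

def simulate_theta(n_draws, rng):
    """Build theta from n_draws uniform draws with sequential cancellation."""
    path = []
    for _ in range(n_draws):
        path = append_draw(path, rng.choice(LETTERS))
    return path

def prob_x_longer_given_theta(theta):
    """Exact probability, over the extra draw, that x is longer than theta."""
    longer = sum(len(append_draw(theta, letter)) > len(theta) for letter in LETTERS)
    return longer / len(LETTERS)

def frequency_x_longer(n_draws, n_reps, seed=0):
    """Monte Carlo frequency of the event 'x is longer than theta'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        theta = simulate_theta(n_draws, rng)
        x = append_draw(theta, rng.choice(LETTERS))
        hits += len(x) > len(theta)
    return hits / n_reps
```

Calling `prob_x_longer_given_theta` on an empty path returns 1, and on any non-empty reduced path returns 3/4, since exactly one of the four possible extra draws cancels the last letter. The Monte Carlo frequency from `frequency_x_longer` should therefore sit at or slightly above 3/4, whatever the value of N.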

The paradox was actually discussed at length in Larry Wasserman’s now defunct Normal Variate, from which I borrowed Larry’s graphical representation of the four possible values of θ given the (green) endpoint of x. Larry uses the Flatland paradox hammer to drive another nail into the coffin he contemplates for improper priors. And all things Bayes. Pierre (like others before him) argues against the flat prior on θ and shows that a flat prior on the length of θ recovers 3/4 as the posterior probability that x is longer than θ.

As I was reading the paper in the métro yesterday morning, I became less and less satisfied with the whole analysis of the problem in that I could not perceive θ as a parameter of the model. While this may sound like a pedantic distinction, θ is a latent variable (or a random effect) associated with x in a model where the only unknown parameter is N, the total number of draws used to produce θ and x. The distributions of both θ and x are entirely determined by N. (In that sense, the Flatland paradox can be seen as a marginalisation paradox in that an improper prior on N cannot be interpreted as projecting a prior on θ.) Given N, the distribution of x of length l(x) is then 1/4^N times the number of ways of picking (N-l(x)) annihilation steps among N. Using a prior on N like 1/N, which is improper, then leads to favour the shortest path as well. (After discussing the issue with Pierre Druilhet, I realised he had a similar perspective on the issue. Except that he puts a flat prior on the length l(x).) Looking a wee bit further for references, I also found that Bruce Hill had adopted the same perspective of a prior on N.
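To see how N alone drives the distribution of the path length, here is a small Python sketch of the exact length distribution after a given number of draws. It assumes the same sequential cancellation rule, under which the length behaves as a Markov chain: an empty path always grows, while a non-empty path grows with probability 3/4 and shrinks with probability 1/4. The function name is hypothetical, and the sketch does not try to reproduce the combinatorial count mentioned above.

```python
from fractions import Fraction

def length_distribution(n_draws):
    """Exact distribution of the reduced path length after n_draws
    uniform draws from {a, b, 1/a, 1/b}, as a dict {length: probability}."""
    dist = {0: Fraction(1)}
    for _ in range(n_draws):
        new = {}
        for length, p in dist.items():
            if length == 0:
                # An empty path cannot shrink: any draw makes it longer.
                new[1] = new.get(1, 0) + p
            else:
                # Three draws out of four extend the path, one cancels.
                new[length + 1] = new.get(length + 1, 0) + p * Fraction(3, 4)
                new[length - 1] = new.get(length - 1, 0) + p * Fraction(1, 4)
        dist = new
    return dist
```

For instance, `length_distribution(2)` puts mass 3/4 on length 2 and 1/4 on length 0. Evaluating the function over a range of values of N gives the likelihood of an observed length as a function of N, which is the ingredient needed before weighting by a prior on N.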