Natesh Pillai and Xiao-Li Meng just arXived a short paper that solves the Cauchy conjecture of Drton and Xiao [which I mentioned last year at JSM], namely that, when considering two independent normal vectors X and Y with a generic variance matrix S, a weighted average of the ratios X/Y remains Cauchy(0,1), just as in the iid S=I case. Even when the weights are random. The fascinating side of this now resolved (!) conjecture is that the correlation between the terms does not seem to matter. Pushing the correlation to one [assuming it is meaningful, which requires a suspension of disbelief!, since there is no standard correlation for Cauchy variates] leads to a paradox: all terms are equal and yet… it works: we recover a single term, which again is Cauchy(0,1). All that remains to prove is thus that the average stays Cauchy(0,1) between those two extremes, a weird kind of intermediate value theorem!
Actually, Natesh and XL further prove an inverse χ² theorem: the inverse of the normal vector, renormalised into a quadratic form, is an inverse χ² no matter what its covariance matrix is. The proof of this amazing theorem relies on a spherical representation of the bivariate Gaussian (also underlying the Box-Muller algorithm). The angles are then jointly distributed as
and from there follows the argument that conditional on the differences between the θ’s, all ratios are Cauchy distributed. Hence the conclusion!
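As a reminder of why a polar representation produces Cauchy ratios, here is the textbook argument in the simplest case of two independent standard normals. This is only a sketch of the mechanism, not the joint density of the angles used in the paper, and the symbols R and θ below are the usual polar coordinates rather than notation taken from the paper:

```latex
\[
X = R\cos\theta, \qquad Y = R\sin\theta, \qquad
\theta \sim \mathcal{U}(0, 2\pi) \ \text{independent of } R,
\]
\[
\frac{X}{Y} = \cot\theta \sim \text{Cauchy}(0,1),
\]
```

since the radius cancels in the ratio and the cotangent of a uniform angle is standard Cauchy.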
A question that stems from reading this version of the paper is whether this property extends to other forms of non-independent Cauchy variates. Somewhat connected to my recent post about generating correlated variates from arbitrary distributions, applying the inverse cdf transform to a Gaussian copula suggests this is possibly the case. The following code is meaningless in that the empirical correlation has no connection with a “true” correlation, but the experiment nonetheless seems of interest…
> ro=.999999;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2)
> cor(x[,1]/x[,2],y[,1]/y[,2])
[1] -0.1351967
> ro=.99999999;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2)
> cor(x[,1]/x[,2],y[,1]/y[,2])
[1] 0.8622714
> ro=1-1e-5;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2)
> z=qcauchy(pnorm(as.vector(x)));w=qcauchy(pnorm(as.vector(y)))
> cor(x=z,y=w)
[1] 0.9999732
> ks.test((z+w)/2,"pcauchy")
        One-sample Kolmogorov-Smirnov test
data:  (z + w)/2
D = 0.0068, p-value = 0.3203
alternative hypothesis: two-sided
> ro=1-1e-3;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2)
> z=qcauchy(pnorm(as.vector(x)));w=qcauchy(pnorm(as.vector(y)))
> cor(x=z,y=w)
[1] 0.9920858
> ks.test((z+w)/2,"pcauchy")
        One-sample Kolmogorov-Smirnov test
data:  (z + w)/2
D = 0.0036, p-value = 0.9574
alternative hypothesis: two-sided
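For comparison with the copula experiment, the original statement (a weighted average of ratios of correlated normals) can also be checked by simulation. The following is only a sketch under my own choices: the dimension m, the matrix A used to build a generic covariance S, and the fixed weights w are all illustrative assumptions, not quantities taken from the paper.

```r
# Monte Carlo sketch: weighted average of ratios X_j/Y_j of correlated normals.
# m, A and w are arbitrary illustrative choices.
set.seed(1)
n <- 1e4                             # number of replications
m <- 3                               # dimension of the normal vectors
A <- matrix(rnorm(m * m), m)         # arbitrary factor
S <- A %*% t(A)                      # a generic covariance matrix S = A A^T
R <- chol(S)                         # upper triangular, t(R) %*% R = S
x <- matrix(rnorm(n * m), n) %*% R   # each row is a N(0, S) vector
y <- matrix(rnorm(n * m), n) %*% R   # independent copy with the same S
w <- c(0.5, 0.3, 0.2)                # fixed weights summing to one
z <- as.vector((x / y) %*% w)        # weighted average of the m ratios
ks.test(z, "pcauchy")                # compare with the Cauchy(0,1) cdf
```

A large p-value here is of course no proof, only the absence of evidence against the Cauchy(0,1) fit for this particular S and w.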
There are several goals in the paper, the last one being the most important one.
The first one is to insist that considering θ as a parameter is not appropriate. We are in complete agreement on that point, but I prefer considering l(θ) as the parameter rather than N, mainly because it is much simpler. Knowing N, the law of l(θ) is given by the law of a random walk with 0 as a reflecting boundary (Jaynes explores this link in his book). So for a given prior on N, we can derive a prior on l(θ). Since the random process that generates N is completely unknown, except that N is probably large, the true law of l(θ) is completely unknown as well, so we may consider l(θ) instead.
The second one is to state explicitly that a flat prior on θ implies an exponentially increasing prior on l(θ). As an anecdote, Stone, in 1972, warned against this kind of prior for Gaussian models. Another interesting anecdote is that he cited the novel by Abbott, “Flatland: A Romance of Many Dimensions”, which describes a world where the dimension is changed. This is exactly the case in the FP, since θ has to be seen in two dimensions rather than in one dimension.
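To make the exponential growth explicit, assume the setting is Stone's example, where θ is a reduced word in two letters and their inverses and l(θ) is its length. This is my reading, since the model is not restated here. A reduced word of length n has four choices for its first letter and three for each later one (any letter except the inverse of the previous one), so a flat prior on θ induces

```latex
\[
\#\{\theta : l(\theta) = n\} = 4 \cdot 3^{\,n-1} \quad (n \ge 1),
\qquad\text{hence}\qquad
\pi\bigl(l(\theta) = n\bigr) \propto 4 \cdot 3^{\,n-1}.
\]
```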
The third one is to make a distinction between randomness of the parameter and prior distribution, each one having its own rule. This point is extensively discussed in Section 2.3.
– In the intuitive reasoning, the probability of no annihilation involves the true joint distribution on (θ, x) and therefore the true unknown distribution of θ.
– In the Bayesian reasoning, the posterior probability of no annihilation is derived from the prior distribution, which is improper. The underlying idea is that a prior distribution does not obey probability rules but belongs to a projective space of measures. This is especially true if the prior does not represent accurate knowledge. In that case, there is no discontinuity between proper and improper priors and therefore the impropriety of the distribution is not a key point. In that context, the joint and marginal distributions are irrelevant, not because the prior is improper, but because it is a prior and not a true law. If the prior were the true probability law of θ, then the flat distribution could not be considered as a limit of probability distributions.
For most applications, the distinction between prior and probability law is not necessary and even pedantic, but it may appear essential in some situations. For example, in the Jeffreys-Lindley paradox, we may note that the construction of the prior is not compatible with the projective space structure.