## Mea Culpa

Posted in Statistics with tags , , , , , , , , , , , on April 10, 2020 by xi'an

[A quote from Jaynes about improper priors that I had missed in his book, Probability Theory.]

For many years, the present writer was caught in this error just as badly as anybody else, because Bayesian calculations with improper priors continued to give just the reasonable and clearly correct results that common sense demanded. So warnings about improper priors went unheeded; just that psychological phenomenon. Finally, it was the marginalization paradox that forced recognition that we had only been lucky in our choice of problems. If we wish to consider an improper prior, the only correct way of doing it is to approach it as a well-defined limit of a sequence of proper priors. If the correct limiting procedure should yield an improper posterior pdf for some parameter α, then probability theory is telling us that the prior information and data are too meager to permit any inferences about α. Then the only remedy is to seek more data or more prior information; probability theory does not guarantee in advance that it will lead us to a useful answer to every conceivable question.Generally, the posterior pdf is better behaved than the prior because of the extra information in the likelihood function, and the correct limiting procedure yields a useful posterior pdf that is analytically simpler than any from a proper prior. The most universally useful results of Bayesian analysis obtained in the past are of this type, because they tended to be rather simple problems, in which the data were indeed so much more informative than the prior information that an improper prior gave a reasonable approximation – good enough for all practical purposes – to the strictly correct results (the two results agreed typically to six or more significant figures).

In the future, however, we cannot expect this to continue because the field is turning to more complex problems in which the prior information is essential and the solution is found by computer. In these cases it would be quite wrong to think of passing to an improper prior. That would lead usually to computer crashes; and, even if a crash is avoided, the conclusions would still be, almost always, quantitatively wrong. But, since likelihood functions are bounded, the analytical solution with proper priors is always guaranteed to converge properly to finite results; therefore it is always possible to write a computer program in such a way (avoid underflow, etc.) that it cannot crash when given proper priors. So, even if the criticisms of improper priors on grounds of marginalization were unjustified,it remains true that in the future we shall be concerned necessarily with proper priors.

## statistics with improper posteriors [or not]

Posted in Statistics with tags , , , , , , on March 6, 2019 by xi'an

Last December, Gunnar Taraldsen, Jarle Tufto, and Bo H. Lindqvist arXived a paper on using priors that lead to improper posteriors and [trying to] getting away with it! The central concept in their approach is Rényi’s generalisation of Kolmogorov’s version to define conditional probability distributions from infinite mass measures by conditioning on finite mass measurable sets. A position adopted by Dennis Lindley in his 1964 book .And already discussed in a few ‘Og’s posts. While the theory thus developed indeed allows for the manipulation of improper posteriors, I have difficulties with the inferential aspects of the construct, since one cannot condition on an arbitrary finite measurable set without prior information. Things get a wee bit more outwardly when considering “data” with infinite mass, in Section 4.2, since they cannot be properly normalised (although I find the example of the degenerate multivariate Gaussian distribution puzzling as it is not a matter of improperness, since the degenerate Gaussian has a well-defined density against the right dominating measure).  The paper also discusses marginalisation paradoxes, by acknowledging that marginalisation is no longer feasible with improper quantities. And the Jeffreys-Lindley paradox, with a resolution that uses the sum of the Dirac mass at the null, δ⁰, and of the Lebesgue measure on the real line, λ, as the dominating measure. This indeed solves the issue of the arbitrary constant in the Bayes factor, since it is “the same” on the null hypothesis and elsewhere, but I do not buy the argument, as I see no reason to favour δ⁰+λ over 3.141516 δ⁰+λ or δ⁰+1.61718 λ… (This section 4.5 also illustrates that the choice of the sequence of conditioning sets has an impact on the limiting measure, in the Rényi sense.) In conclusion, after reading the paper, I remain uncertain as to how to exploit this generalisation from an inferential (Bayesian?) viewpoint, since improper posteriors do not clearly lead to well-defined inferential procedures…

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , on February 8, 2019 by xi'an

As a reading suggestion for my (last) OxWaSP Bayesian course at Oxford, I included the classic 1973 Marginalisation paradoxes by Phil Dawid, Mervyn Stone [whom I met when visiting UCL in 1992 since he was sharing an office with my friend Costas Goutis], and Jim Zidek. Paper that also appears in my (recent) slides as an exercise. And has been discussed many times on this  ‘Og.

Reading the paper in the train to Oxford was quite pleasant, with a few discoveries like an interesting pike at Fraser’s structural (crypto-fiducial?!) distributions that “do not need Bayesian improper priors to fall into the same paradoxes”. And a most fascinating if surprising inclusion of the Box-Müller random generator in an argument, something of a precursor to perfect sampling (?). And a clear declaration that (right-Haar) invariant priors are at the source of the resolution of the paradox. With a much less clear notion of “un-Bayesian priors” as those leading to a paradox. Especially when the authors exhibit a red herring where the paradox cannot disappear, no matter what the prior is. Rich discussion (with none of the current 400 word length constraint), including the suggestion of neutral points, namely those that do identify a posterior, whatever that means. Funny conclusion, as well:

“In Stone and Dawid’s Biometrika paper, B1 promised never to use improper priors again. That resolution was short-lived and let us hope that these two blinkered Bayesians will find a way out of their present confusion and make another comeback.” D.J. Bartholomew (LSE)

and another

“An eminent Oxford statistician with decidedly mathematical inclinations once remarked to me that he was in favour of Bayesian theory because it made statisticians learn about Haar measure.” A.D. McLaren (Glasgow)

and yet another

“The fundamentals of statistical inference lie beneath a sea of mathematics and scientific opinion that is polluted with red herrings, not all spawned by Bayesians of course.” G.N. Wilkinson (Rothamsted Station)

Lindley’s discussion is more serious if not unkind. Dennis Lindley essentially follows the lead of the authors to conclude that “improper priors must go”. To the point of retracting what was written in his book! Although concluding about the consequences for standard statistics, since they allow for admissible procedures that are associated with improper priors. If the later must go, the former must go as well!!! (A bit of sophistry involved in this argument…) Efron’s point is more constructive in this regard since he recalls the dangers of using proper priors with huge variance. And the little hope one can hold about having a prior that is uninformative in every dimension. (A point much more blatantly expressed by Dickey mocking “magic unique prior distributions”.) And Dempster points out even more clearly that the fundamental difficulty with these paradoxes is that the prior marginal does not exist. Don Fraser may be the most brutal discussant of all, stating that the paradoxes are not new and that “the conclusions are erroneous or unfounded”. Also complaining about Lindley’s review of his book [suggesting prior integration could save the day] in Biometrika, where he was not allowed a rejoinder. It reflects on the then intense opposition between Bayesians and fiducialist Fisherians. (Funny enough, given the place of these marginalisation paradoxes in his book, I was mistakenly convinced that Jaynes was one of the discussants of this historical paper. He is mentioned in the reply by the authors.)

## a new paradigm for improper priors

Posted in Books, pictures, Statistics, Travel with tags , , , , , , , , on November 6, 2017 by xi'an

Gunnar Taraldsen and co-authors have arXived a short note on using improper priors from a new perspective. Generalising an earlier 2016 paper in JSPI on the same topic. Which both relate to a concept introduced by Rényi (who himself attributes the idea to Kolmogorov). Namely that random variables measures are to be associated with arbitrary measures [not necessarily σ-finite measures, the later defining σ-finite random variables], rather than those with total mass one. Which allows for an alternate notion of conditional probability in the case of σ-finite random variables, with the perk that this conditional probability distribution is itself of mass 1 (a.e.).  Which we know happens when moving from prior to proper posterior.

I remain puzzled by the 2016 paper though as I do not follow the meaning of a random variable associated with an infinite mass probability measure. If the point is limited to construct posterior probability distributions associated with improper priors, there is little value in doing so. The argument in the 2016 paper is however that one can then define a conditional distribution in marginalisation paradoxes à la Stone, Dawid and Zidek (1973) where the marginal does not exist. Solving with this formalism the said marginalisation paradoxes as conditional distributions are only defined for σ-finite random variables. Which gives a fairly different conclusion from either Stone, Dawid and Zidek (1973) [with whom I agree, namely that there is no paradox because there is no “joint” distribution] or Jaynes (1973) [with whom I less agree!, in that the use of an invariant measure to make the discrepancy go away is not a particularly strong argument in favour of this measure]. The 2016 paper also draws an interesting connection with the study by Jim Hobert and George Casella (in Jim’s thesis) of [null recurrent or transient] Gibbs samplers with no joint [proper] distribution. Which in some situations can produce proper subchains, a phenomenon later exhibited by Alan Gelfand and Sujit Sahu (and Xiao-Li Meng as well if I correctly remember!). But I see no advantage in following this formalism, as it does not impact whether the chain is transient or null recurrent, or anything connected with its implementation. Plus a link to the approximation of improper priors by sequences of proper ones by Bioche and Druihlet I discussed a while ago.

## covariant priors, Jeffreys and paradoxes

Posted in Books, Statistics, University life with tags , , , , , , , , , , , on February 9, 2016 by xi'an

“If no information is available, π(α|M) must not deliver information about α.”

In a recent arXival apparently submitted to Bayesian Analysis, Giovanni Mana and Carlo Palmisano discuss of the choice of priors in metrology. Which reminded me of this meeting I attended at the Bureau des Poids et Mesures in Sèvres where similar debates took place, albeit being led by ferocious anti-Bayesians! Their reference prior appears to be the Jeffreys prior, because of its reparameterisation invariance.

“The relevance of the Jeffreys rule in metrology and in expressing uncertainties in measurements resides in the metric invariance.”

This, along with a second order approximation to the Kullback-Leibler divergence, is indeed one reason for advocating the use of a Jeffreys prior. I at first found it surprising that the (usually improper) prior is used in a marginal likelihood, as it cannot be normalised. A source of much debate [and of our alternative proposal].

“To make a meaningful posterior distribution and uncertainty assessment, the prior density must be covariant; that is, the prior distributions of different parameterizations must be obtained by transformations of variables. Furthermore, it is necessary that the prior densities are proper.”

The above quote is quite interesting both in that the notion of covariant is used rather than invariant or equivariant. And in that properness is indicated as a requirement. (Even more surprising is the noun associated with covariant, since it clashes with the usual notion of covariance!) They conclude that the marginal associated with an improper prior is null because the normalising constant of the prior is infinite.

“…the posterior probability of a selected model must not be null; therefore, improper priors are not allowed.”

Maybe not so surprisingly given this stance on improper priors, the authors cover a collection of “paradoxes” in their final and longest section: most of which makes little sense to me. First, they point out that the reference priors of Berger, Bernardo and Sun (2015) are not invariant, but this should not come as a surprise given that they focus on parameters of interest versus nuisance parameters. The second issue pointed out by the authors is that under Jeffreys’ prior, the posterior distribution of a given normal mean for n observations is a t with n degrees of freedom while it is a t with n-1 degrees of freedom from a frequentist perspective. This is not such a paradox since both distributions work in different spaces. Further, unless I am confused, this is one of the marginalisation paradoxes, which more straightforward explanation is that marginalisation is not meaningful for improper priors. A third paradox relates to a contingency table with a large number of cells, in that the posterior mean of a cell probability goes as the number of cells goes to infinity. (In this case, Jeffreys’ prior is proper.) Again not much of a bummer, there is simply not enough information in the data when faced with a infinite number of parameters. Paradox #4 is the Stein paradox, when estimating the squared norm of a normal mean. Jeffreys’ prior then leads to a constant bias that increases with the dimension of the vector. Definitely a bad point for Jeffreys’ prior, except that there is no Bayes estimator in such a case, the Bayes risk being infinite. Using a renormalised loss function solves the issue, rather than introducing as in the paper uniform priors on intervals, which require hyperpriors without being particularly compelling. The fifth paradox is the Neyman-Scott problem, with again the Jeffreys prior the culprit since the estimator of the variance is inconsistent. By a multiplicative factor of 2. Another stone in Jeffreys’ garden [of forking paths!]. The authors consider that the prior gives zero weight to any interval not containing zero, as if it was a proper probability distribution. And “solve” the problem by avoid zero altogether, which requires of course to specify a lower bound on the variance. And then introducing another (improper) Jeffreys prior on that bound… The last and final paradox mentioned in this paper is one of the marginalisation paradoxes, with a bizarre explanation that since the mean and variance μ and σ are not independent a posteriori, “the information delivered by x̄ should not be neglected”.