why noninformative priors?

Answering a question around this theme on StackExchange, I wrote the following reply:

The debate about non-informative priors has been going on for ages, at least since the end of the 19th century with criticisms by Bertrand and de Morgan about the lack of invariance of Laplace’s uniform priors (the same criticism reported by Stéphane Laurent in the above comments). This lack of invariance sounded like a death stroke for the Bayesian approach and, while some Bayesians were desperately trying to cling to specific distributions, using less-than-formal arguments, others had a wider vision of a larger picture where priors could be used in situations where there was hardly any prior information, beyond the shape of the likelihood itself. (This was even before Abraham Wald established his admissibility and complete class results about Bayes procedures. And at about the same time as E.J.G. Pitman gave an “objective” derivation of the best invariant estimator as a Bayes estimator against the corresponding Haar measure…)

This vision is best represented by Jeffreys’ distributions, where the information matrix of the sampling model, $I(\theta)$, is turned into a prior distribution

$\pi(\theta) \propto |I(\theta)|^{1/2}$

which is most often improper, i.e. does not integrate to a finite value. The label “non-informative” associated with Jeffreys’ priors is rather unfortunate, as they represent an input from the statistician, hence are informative about something! Similarly, “objective” has an authoritative weight I dislike… I thus prefer the label “reference prior”, used for instance by José Bernardo.
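As a small illustration of the construction (not from the post itself, and assuming the standard Bernoulli model as the example), the Fisher information of a Bernoulli($\theta$) likelihood is $I(\theta) = 1/\theta(1-\theta)$, so Jeffreys’ recipe yields $\pi(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2}$, which in this particular case happens to be proper, namely the Beta(1/2, 1/2) distribution with normalising constant $B(1/2,1/2) = \pi$:

```python
import numpy as np

# Jeffreys prior for the Bernoulli(theta) model (illustrative sketch).
# Fisher information: I(theta) = 1 / (theta * (1 - theta)),
# so pi(theta) ∝ I(theta)^{1/2} = theta^{-1/2} * (1 - theta)^{-1/2},
# i.e. the Beta(1/2, 1/2) distribution, proper in this case.

def fisher_info_bernoulli(theta):
    return 1.0 / (theta * (1.0 - theta))

def jeffreys_unnormalised(theta):
    return np.sqrt(fisher_info_bernoulli(theta))

# Crude numerical check of the normalising constant over (0, 1):
# the exact value is B(1/2, 1/2) = pi, so the prior integrates
# to a finite value here (unlike, say, the flat prior on the real line).
theta = np.linspace(1e-6, 1.0 - 1e-6, 200001)
Z = np.sum(jeffreys_unnormalised(theta)) * (theta[1] - theta[0])
print(Z)  # close to pi
```

For a location parameter, by contrast, the same recipe returns a flat prior on the whole real line, hence an improper distribution, which is the typical situation alluded to above.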

Those priors indeed give a reference against which one can compute either the reference estimator/test/prediction or one’s own estimator/test/prediction, using a different prior motivated by subjective and objective items of information. To answer directly the question, “why not use only informative priors?”, there is actually no answer: a prior distribution is a choice made by the statistician, neither a state of Nature nor a hidden variable. In other words, there is no “best prior” that one “should use”; it is in the nature of statistical inference that there is no “best answer”.

Hence my defence of the noninformative/reference choice! It provides the same range of inferential tools as other priors, but gives answers that are inspired only by the shape of the likelihood function, rather than induced by some opinion about the range of the unknown parameters.

7 Responses to “why noninformative priors?”


2. R. Cox starts out assuming we have degrees of belief in any claims, and further, that they follow some questionable relationships. Voila! You get out just precisely what you put in!

3. Presumably Aris Spanos would not be convinced by that reasoning. Or indeed by calls for pluralism in these foundational issues.

• Presumably. I am not trying to convince anyone, though. Or to convert anyone. Simply stating the reasons why I think it is a coherent perspective.

4. Why use any prior at all*? I think Fisher is persuasive: “If the justification for any particular form of [prior for θ] is merely that it makes no difference whether the form is right or wrong, we may well ask what the expression is doing in our reasoning at all”. http://errorstatistics.com/2012/02/17/two-new-properties-of-mathematical-likelihood/
*In making inferences about hypotheses without a physical/frequentist prior distribution.

• I give a collection of reasons for using non-informative priors in The Bayesian Choice (Section 3.5). My favourite is Wald’s complete class theorems, namely that all admissible statistical procedures are Bayes procedures or limits of Bayes procedures.

• jaynesian Says:

Well, that is what I thought too for a long time, but what convinced me is Cox’s theorem. Have a look at E.T. Jaynes’s book Probability Theory: The Logic of Science.
