## why noninformative priors?

*Answering a question around this theme on StackExchange, I wrote the following reply:*

**T**he debate about non-informative priors has been going on for ages, at least since the end of the 19th century with criticisms by Bertrand and de Morgan about the lack of invariance of Laplace’s uniform priors (the same criticism reported by Stéphane Laurent in the above comments). This lack of invariance sounded like a death stroke for the Bayesian approach and, while some Bayesians were desperately trying to cling to specific distributions, using less-than-formal arguments, others had a wider vision of a larger picture where priors could be used in situations where there was hardly any prior information, beyond the shape of the likelihood itself. (This was even before Abraham Wald established his admissibility and complete class results about Bayes procedures. And at about the same time as E.J.G. Pitman gave an “objective” derivation of the best invariant estimator as a Bayes estimator against the corresponding Haar measure…)

**T**his vision is best represented by Jeffreys’ distributions, where the Fisher information matrix of the sampling model, $I(\theta)$, is turned into a prior distribution

$$\pi(\theta) \propto \left|\,I(\theta)\,\right|^{1/2},$$

which is most often improper, i.e. does not integrate to a finite value. The label “non-informative” associated with Jeffreys’ priors is rather unfortunate, as they represent an input from the statistician, hence are informative about something! Similarly, “objective” has an authoritative weight I dislike… I thus prefer the label “reference prior”, used for instance by José Bernardo.
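As a concrete illustration (a hypothetical numerical sketch, not taken from the post): for a Bernoulli($p$) model the Fisher information is $I(p) = 1/(p(1-p))$, so the Jeffreys prior kernel is $1/\sqrt{p(1-p)}$, the Beta(1/2, 1/2) density up to normalization. This is one of the cases where the Jeffreys prior happens to be proper, which a crude quadrature can verify:

```python
import math

# Jeffreys prior kernel for a Bernoulli(p) model:
# I(p) = 1/(p(1-p)), so pi(p) is proportional to 1/sqrt(p(1-p)).
def jeffreys_bernoulli(p):
    return 1.0 / math.sqrt(p * (1.0 - p))

# Midpoint rule on (0, 1); the midpoints avoid the integrable
# endpoint singularities at p = 0 and p = 1.
n = 1_000_000
total = sum(jeffreys_bernoulli((i + 0.5) / n) for i in range(n)) / n

print(total)     # close to 3.14159..., the Beta(1/2, 1/2) normalizer
print(math.pi)
```

By contrast, the Jeffreys prior for the mean of a normal model with known variance is flat over the whole real line, and the analogous integral diverges: the improper case mentioned above.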

**T**hose priors indeed give a reference against which one can compute either the reference estimator/test/prediction or one’s own estimator/test/prediction using a different prior motivated by subjective and objective items of information. To answer the question directly, “*why not use only informative priors?*”, there is actually no answer: a prior distribution is a choice made by the statistician, neither a state of Nature nor a hidden variable. In other words, there is no “best prior” that one “should use”; it is the nature of statistical inference that there is no “best answer”.

**H**ence my defence of the noninformative/reference choice! It provides the same range of inferential tools as other priors, but gives answers that are inspired only by the shape of the likelihood function, rather than induced by some opinion about the range of the unknown parameters.


May 10, 2012 at 6:50 pm

R. Cox starts out assuming we have degrees of belief in any claims, and further, that they follow some questionable relationships. Voila! You get out just precisely what you put in!

May 9, 2012 at 9:58 am

Presumably Aris Spanos would not be convinced by that reasoning. Or indeed by calls for pluralism in these foundational issues.

May 9, 2012 at 10:01 am

Presumably. I am not trying to convince anyone, though. Or to convert anyone. Simply stating the reasons why I think it is a coherent perspective.

May 9, 2012 at 3:07 am

Why use any prior at all*? I think Fisher is persuasive: “If the justification for any particular form of [prior for θ] is merely that it makes no difference whether the form is right or wrong, we may well ask what the expression is doing in our reasoning at all”. http://errorstatistics.com/2012/02/17/two-new-properties-of-mathematical-likelihood/

*In making inferences about hypotheses without a physical/frequentist prior distribution.

May 9, 2012 at 6:29 am

I give a collection of reasons for using non-informative priors in The Bayesian Choice (Section 3.5). My favourite is Wald’s complete class theorems, namely that all admissible statistical procedures are Bayes procedures or limits of Bayes procedures.

May 9, 2012 at 9:24 am

Well, that is what I thought too for a long time, but what convinced me is Cox’s theorem. Have a look at E.T. Jaynes’s book *Probability Theory: The Logic of Science*.