## priors without likelihoods are like sloths without…

“The idea of building priors that generate reasonable data may seem like an unusual idea…”

**A**ndrew, Dan, and Michael arXived an opinion piece last week entitled “The prior can generally only be understood in the context of the likelihood”, which connects to the earlier Read Paper of Gelman and Hennig I discussed last year. I cannot state strong disagreement with the positions taken in this piece, actually, in that I do not think prior distributions ever occur as *a given* but are rather chosen as a reference measure to probabilise the parameter space and eventually prioritise some regions over others. If anything, I find myself even further along the prior-agnosticism gradation. (Of course, this lack of disagreement applies to the likelihood understood as a function of both the data and the parameter, rather than of the parameter only, conditional on the data. Priors cannot depend on the data without incurring disastrous consequences!)

“…it contradicts the conceptual principle that the prior distribution should convey only information that is available before the data have been collected.”

The first example is somewhat disappointing in that it revolves, as so many Bayesian textbooks (since Laplace!) do, around the [sex ratio] Binomial probability parameter and concludes on the strong or long-lasting impact of the Uniform prior. I do not see much of a contradiction between the use of a Uniform prior and the collection of prior information, if only because there is no standardised way to transfer prior information into prior construction. And more fundamentally because a parameter rarely makes sense by itself, alone, without a model that relates it to potential data, as for instance in a regression model. Moreover, following my epiphany of last semester about the relativity of the prior, I see no damage in the prior being relevant, as I only attach a *relative* meaning to statements based on the posterior. Rather than trying to limit the impact of a prior, we should build assessment tools to measure this impact, for instance by prior predictive simulations. And this is where I come to quite agree with the authors.
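Such a prior predictive simulation is easy to sketch for the Binomial sex-ratio example above; the snippet below is my own toy illustration (the sample size and thresholds are arbitrary assumptions, not taken from the paper), showing the kind of data a Uniform prior actually generates:

```python
import numpy as np

rng = np.random.default_rng(42)

# Prior predictive simulation for the Binomial sex-ratio example:
# draw p from the prior, then simulate data given p, and inspect
# the distribution of datasets the prior favours.
n_births = 1000      # hypothetical sample size
n_sims = 10_000

# Uniform(0, 1) prior on the probability p of a female birth
p = rng.uniform(0.0, 1.0, size=n_sims)
# simulated counts of female births, one per prior draw
counts = rng.binomial(n_births, p)

# A Uniform prior treats a sex ratio of 0.05 as a priori as
# plausible as 0.51; the prior predictive spread makes that visible.
extreme = np.mean((counts / n_births < 0.4) | (counts / n_births > 0.6))
print("range of simulated counts:", counts.min(), "-", counts.max())
print("fraction of 'extreme' sex ratios:", extreme)
```

Most of the prior predictive mass falls on sex ratios no demographer would entertain, which is exactly the sort of impact assessment the authors advocate, made in data space rather than parameter space.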

“…non-identifiabilities, and near non-identifiabilities, of complex models can lead to unexpected amounts of weight being given to certain aspects of the prior.”

Another rather straightforward remark is that non-identifiable models see the impact of a prior remain as the sample size grows. And I still see no issue with this fact in a relative approach. When the authors mention (p.7) that purely mathematical priors perform more poorly than weakly informative priors, it is hard to see what they mean by this “performance”.

“…judge a prior by examining the data generating processes it favors and disfavors.”

Besides those points, I completely agree with them about the fundamental relevance of the prior as a generative process, only when the likelihood becomes available. And simulatable. (This point is found in many references, including our response to the American Statistician paper *Hidden dangers of specifying noninformative priors*, with Kaniav Kamary. With the same illustration on a logistic regression.) I also agree with their criticism of the marginal likelihood and Bayes factors as being so strongly impacted by the choice of a prior, if treated as absolute quantities. I also, if more reluctantly and somewhat heretically, see a point in using the posterior predictive for assessing whether a prior is relevant for the data at hand. At least at a conceptual level. I am however less certain about how to handle improper priors based on their recommendations. In conclusion, it would be great to see one [or more] of the authors at O-Bayes 2017 in Austin, as I am sure it would spark nice discussions there! (And by the way I have no prior idea on how to conclude the comparison in the title!)

September 11, 2017 at 10:30 pm

Thanks for your comments Christian!

Some responses in no particular order:

– I will be at O’Bayes, though I don’t think I’m presenting anything (this would make a very bad poster and I wasn’t asked to talk). But I’m sure people will notice I’m there.

– How to handle improper priors based on our recommendations: don’t use improper priors. They are completely incompatible with the idea of generative modelling.

– I really like the idea of using the posterior predictive to check your prior. There is probably a tighter link that can be made with the prior-data conflict literature. But even if you don’t like it, one way to see it is that the posterior predictive is the best that you can do with all of the information (data + model) at hand. If in that case you still can’t predict new data (or pseudo-new data), then something is wrong with your model [NB: Not necessarily the prior!].

– I don’t completely agree with you about priors being unable to depend on the data without disastrous consequences. I think it’s all about working out how to mitigate the problems. It also ignores the problem that often the *likelihood* is constructed with knowledge of the data (which should be just as bad). I think there’s a need to formalise this type of process and work out how to do it safely and specifically what you can and cannot do. For instance, you obviously can’t use BvM-type arguments if you have a data-dependent prior. The other paper we wrote (nominally about visualisation) has a bit of this in it, as does the associated blog post [and its many comments](http://andrewgelman.com/2017/09/07/touch-want-feel-data/).

– I like your idea of relativity, but it really just moves the problem to a different place. Your posterior is interpreted relative to your prior, which is interpreted relative to reality. As you say, this really gives a much better use of the prior predictive than just computing marginal likelihoods. In the other paper (https://arxiv.org/abs/1709.01449) we argue briefly that you can use the prior predictive to get a notion of how informative your generative model is. This means you can talk about generative models being “weakly informative”.

September 11, 2017 at 10:36 pm

Dan@OB17: I am sure as well people will notice!!! See you in Austin.

September 11, 2017 at 11:46 pm

I’m less okay with using the data to inform the priors than Dan, although I am more and more begrudgingly accepting its practical reality. I mean, who doesn’t standardize the covariates before building a regression and specifying priors? That’s using the data twice in a way that largely has more positive effects than negative ones. What we can agree on is that we definitely need to understand it better.

Also, I wonder what could be learned about prior/likelihood tensions by comparing the prior predictive and posterior predictive — is there a reason why that would be more interpretable than looking at prior/posterior comparisons?
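As a toy illustration of such a comparison (my own sketch, not from the thread; the observed counts and replication size are invented for the example), the conjugate Beta-Binomial model lets one put the prior predictive and posterior predictive for the same future experiment side by side, on the data scale:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical observed data: y_obs female births out of n_obs
n_obs, y_obs = 100, 48
n_new = 100          # size of a replicated dataset
n_sims = 10_000

# Uniform prior = Beta(1, 1); the conjugate posterior for a
# Binomial likelihood is Beta(1 + y_obs, 1 + n_obs - y_obs)
p_prior = rng.beta(1, 1, size=n_sims)
p_post = rng.beta(1 + y_obs, 1 + n_obs - y_obs, size=n_sims)

# Predictive draws: simulate a new dataset for each parameter draw
prior_pred = rng.binomial(n_new, p_prior)
post_pred = rng.binomial(n_new, p_post)

# The shrinkage in spread, measured in data space, indicates how
# much weight the likelihood carried relative to the prior.
print("prior predictive sd:    ", prior_pred.std())
print("posterior predictive sd:", post_pred.std())
```

One possible answer to the interpretability question: both predictives live on the scale of observable data, so the contrast between them can be judged by the same subject-matter standards as the data themselves, which is harder to do for prior/posterior comparisons on an abstract parameter space.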

September 12, 2017 at 9:28 am

When using the data twice (agreeing that this notion can be utterly confusing!), there must be a garde-fou [safeguard] of sorts against over-fitting. Which forces us to seek it outside the standard B theory…

September 12, 2017 at 7:03 pm

I totally agree that you need to go outside pure Bayes theory for this. But I think it will be a refinement of the Bayesian argument rather than a new universe.

September 12, 2017 at 9:32 am

Ah, improper priors, “unique objet de mon ressentiment” [sole object of my resentment] for some! But I am still waiting for The Big One, the B argument that would Bayxit them from the scene…

September 12, 2017 at 7:04 pm

Whereas I’ve just never seen a convincing argument for them!