## objectivity in prior distributions for the multinomial model

**T**oday, Danilo Alvares, visiting from the Universitat de València, gave a talk at CREST about choosing a prior for the Multinomial distribution, comparing different Dirichlet priors. In a sense this is a hopeless task: first, because there is no reason to pick a particular prior unless one adopts a very specific and a-Bayesian criterion to discriminate between priors; second, because the multinomial is a weird distribution, hardly a distribution at all in that it results from grouping observations into classes, often based on the observations themselves. Should that construction be included within the choice of the prior? But there lurks the danger of ending up with a data-dependent prior. My other remark on this problem is that, among the token priors, Perks' prior, which uses 1/k as its hyper-parameter [where k is the number of categories], is rather difficult to justify compared with 1/k² or 1/k³, except to some extent for aggregation consistency. And Laplace's prior gets highly concentrated as the number of categories grows.
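That concentration effect can be seen in a quick simulation, a sketch assuming NumPy, where the values of k and the sample size are arbitrary illustrative choices: draw probability vectors from the symmetric Dirichlet with Laplace's α = 1 and with Perks' α = 1/k, and compare the spread of their entropies as k grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropies(p):
    # row-wise Shannon entropy in nats, with 0 log 0 taken as 0
    q = np.where(p > 0, p, 1.0)  # log(1) = 0, so zero entries drop out
    return -(p * np.log(q)).sum(axis=1)

def entropy_stats(alpha, k, n=5000):
    # mean and sd of H(p) for p ~ Dirichlet(alpha, ..., alpha) of dimension k
    h = entropies(rng.dirichlet(np.full(k, alpha), size=n))
    return h.mean(), h.std()

for k in (3, 30, 300):
    m1, s1 = entropy_stats(1.0, k)      # Laplace: alpha = 1
    mk, sk = entropy_stats(1.0 / k, k)  # Perks: alpha = 1/k
    print(f"k={k:3d}  Laplace H: {m1:.2f} ± {s1:.2f}   "
          f"Perks H: {mk:.2f} ± {sk:.2f}   log k = {np.log(k):.2f}")
```

Under Laplace's prior the sampled entropies pile up just below log k with a shrinking standard deviation as k increases, whereas the Perks draws stay far more dispersed.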

March 17, 2016 at 9:23 pm

The Perks prior on the Dirichlet parameter is unique in that it maximizes the variance of the entropy of distributions sampled from that Dirichlet. This was shown by Nemenman, Shafee, and Bialek in 2001, http://arxiv.org/abs/physics/0108025.
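This variance property can be checked numerically with a small Monte Carlo scan, a sketch assuming NumPy, where k = 20 and the grid of α values are arbitrary illustrative choices: estimate the variance of the entropy of draws from symmetric Dirichlets at several concentrations.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 20  # illustrative number of categories

def entropy_variance(alpha, n=20000):
    # Monte Carlo variance of H(p) for p ~ Dirichlet(alpha, ..., alpha)
    p = rng.dirichlet(np.full(k, alpha), size=n)
    q = np.where(p > 0, p, 1.0)  # avoid log(0); 0 log 0 counts as 0
    h = -(p * np.log(q)).sum(axis=1)
    return h.var()

for alpha in (1 / k**2, 1 / (2 * k), 1 / k, 2 / k, 1.0):
    print(f"alpha = {alpha:8.5f}   Var[H] ≈ {entropy_variance(alpha):.3f}")
```

Consistent with the claim, the estimated variance is largest near α = 1/k and falls off toward both extremes: very small α pushes the draws toward near-degenerate vectors with entropy close to zero, while α = 1 concentrates the entropy near log k.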

In the family of symmetric Dirichlet mixture priors there are interesting non-informative choices, such as the NSB prior. See also http://www.nowozin.net/sebastian/blog/estimating-discrete-entropy-part-3.html for some consequences of these prior choices in the context of the entropy estimation problem.