Dirichlet process mixture inconsistency
Judith Rousseau pointed out to me this NIPS paper by Jeff Miller and Matthew Harrison on the possible inconsistency of Dirichlet process mixture priors for estimating the (true) number of components in a (true) mixture model: the resulting posterior on the number of components does not concentrate on the right number of components. This is not the case when setting a prior on the unknown number of components of a mixture, where consistency does occur. (The inconsistency results established in the paper are actually focussed on iid Gaussian observations, for which the estimated number of Gaussian components is almost never equal to 1.) In a more recent arXiv paper, they also show that a Dirichlet prior on the weights combined with a prior on the number of components can still produce the same features as Dirichlet process mixture priors. Even the stick-breaking representation! (A paper I already reviewed last spring.)
February 16, 2016 at 12:09 am
The inconsistency seems to be fairly well known amongst heavy Dirichlet process users; anecdotally, when Andy Roth gave his talk in Oxford a few weeks ago (on DPs for genetic analysis), he acknowledged this possible criticism himself before I even had the chance to trollishly point it out!
Possibly the Miller & Harrison paper is responsible for this awareness …
February 16, 2016 at 9:42 pm
I am actually not surprised at all, as this is a different model with a priori an infinite number of components, so there is little reason to believe the number of clusters can converge to the "true" value, whatever that means… When Kate Lee and I worked on a computational construct for the Bayes factor on the number of components, I was very much surprised to be asked by a referee to draw a comparison with a Dirichlet mixture approach…
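To make the "a priori an infinite number of components" point concrete, here is a minimal sketch of the prior only (not the posterior inconsistency result of Miller and Harrison). Under the Chinese restaurant process with concentration α, the i-th observation (counting from 0) opens a new cluster with probability α/(α+i), independently of the earlier seating. The prior number of occupied clusters K_n is therefore a sum of independent Bernoulli variables, with mean Σ α/(α+i), which grows like α log n without bound. The function names below are illustrative choices, not taken from either paper.

```python
import random


def expected_clusters(n: int, alpha: float) -> float:
    """Exact prior mean of the number of occupied clusters after n draws
    from a Chinese restaurant process with concentration alpha."""
    # E[K_n] = sum_{i=0}^{n-1} alpha / (alpha + i)
    return sum(alpha / (alpha + i) for i in range(n))


def sample_clusters(n: int, alpha: float, rng: random.Random) -> int:
    """Draw K_n once: observation i (0-based) opens a new cluster with
    probability alpha / (alpha + i), independently of the past."""
    return sum(rng.random() < alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = 1.0
    for n in (10, 100, 1000, 10000):
        draws = [sample_clusters(n, alpha, rng) for _ in range(200)]
        mc_mean = sum(draws) / len(draws)
        # exact mean vs Monte Carlo mean: both keep increasing with n
        print(n, round(expected_clusters(n, alpha), 2), round(mc_mean, 2))
```

With α = 1 the exact mean is the harmonic number H_n, so the prior keeps adding clusters as the sample grows rather than settling on a fixed number.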