Here is a short discussion I wrote yesterday with Judith Rousseau of a paper by Peter Müller and Riten Mitra to appear in Bayesian Analysis.
“We congratulate the authors for this very pleasant overview of the type of problems that are currently tackled by Bayesian nonparametric inference and for demonstrating how prolific this field has become. We do share the authors viewpoint that many Bayesian nonparametric models allow for more flexible modelling than parametric models and thus capture finer details of the data. BNP can be a good alternative to complex parametric models in the sense that the computations are not necessarily more difficult in Bayesian nonparametric models. However we would like to mitigate the enthusiasm of the authors since, although we believe that Bayesian nonparametric has proved extremely useful and interesting, we think they oversell the “nonparametric side of the Force”! Our main point is that by definition, Bayesian nonparametric is based on prior probabilities that live on infinite dimensional spaces and thus are never completely swamped by the data. It is therefore crucial to understand which (or why!) aspects of the model are strongly influenced by the prior and how.
As an illustration, when looking at Example 1 with the censored zeroth cell, our reaction is that this is a problem with no proper solution, because it is lacking too much information. In other words, unless some parametric structure of the model is known, in which case the zeroth cell is related with the other cells, we see no way to infer about the size of this cell. The outcome produced by the authors is therefore unconvincing to us in that it seems to only reflect upon the prior modelling (α,G*) and not upon the information contained in the data. Now, this prior modelling may be to some extent justified based on side information about the medical phenomenon under study, however its impact on the resulting inference is palatable.
Recently (and even less recently) a few theoretical results have pointed out this very issue. E.g., Diaconis and Freedman (1986) showed that some priors could surprisingly lead to inconsistent posteriors, even though it was later shown that many priors lead to consistent posteriors and often even to optimal asymptotic frequentist estimators, see for instance van der Vaart and van Zanten (2009) and Kruijer et al. (2010). The worry about Bayesian nonparametrics truly appeared when considering (1) asymptotic frequentist properties of semi-parametric procedures; and (2) interpretation of inferential aspects of Bayesian nonparametric procedures. It was shown in various instances that some nonparametric priors which behaved very nicely for the estimation of the whole parameter could have disturbingly suboptimal behaviour for some specific functionals of interest, see for instance Arbel et al. (2013) and Rivoirard and Rousseau (2012). We do not claim here that asymptotics is the answer to everything however bad asymptotic behaviour shows that something wrong is going on and this helps understanding the impact of the prior. These disturbing bad results are an illustration that in these infinite dimensional models the impact of the prior modelling is difficult to evaluate and that although the prior looks very flexible it can in fact be highly informative and/or restrictive for some aspects of the parameter. It would thus be wrong to conclude that every aspect of the parameter is well-recovered because some are. It has been a well-known fact for Bayesian parametric models, leading to extensive research on reference and other types of objective priors. It is even more crucial in the nonparametric world. No (nonparametric) prior can be suited for every inferential aspect and it is important to understand which aspects of the parameter are well-recovered and which ones are not.
We also concur with the authors that Dirichlet mixture priors provide natural clustering mechanisms, but one may question the “natural” label as the resulting clustering is quite unstructured, growing in the number of clusters as the number of observations increases and not incorporating any prior constraint on the “definition” of a cluster, except the one implicit and well-hidden behind the non-parametric prior. In short, it is delicate to assess what is eventually estimated by this clustering methods.
These remarks are not to be taken criticisms of the overall Bayesian nonparametric approach, just the contrary. We simply emphasize (or recall) that there is no such thing as a free lunch and that we need to post the price to pay for potential customers. In these models, this is far from easy and just as far from being completed.”
Arbel, J., Gayraud, G., and Rousseau, J. (2013). Bayesian adaptive optimal estimation using a sieve prior. Scandinavian Journal of Statistics, to appear.
Diaconis, P. and Freedman, D. (1986). On the consistency of Bayes estimates. Ann. Statist., 14:1-26.
Kruijer, W., Rousseau, J., and van der Vaart, A. (2010). Adaptive Bayesian density estimation with location-scale mixtures. Electron. J. Stat., 4:1225-1257.
Rivoirard, V. and Rousseau, J. (2012). On the Bernstein Von Mises theorem for linear functionals of the density. Ann. Statist., 40:1489-1523.
van der Vaart, A. and van Zanten, J. H. (2009). Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth. Ann. Statist., 37:2655-2675.