## Approximate reasoning on Bayesian nonparametrics

Posted in Books, Statistics, University life on July 7, 2015 by xi'an

[Here is a call for a special issue on Bayesian nonparametrics, edited by Alessio Benavoli, Antonio Lijoi and Antonietta Mira, for an Elsevier journal I had never heard of previously:]

The International Journal of Approximate Reasoning is pleased to announce a special issue on “Bayesian Nonparametrics”. The submission deadline is *December 1st*, 2015.

The aim of this Special Issue is twofold. First, it is to give a broad overview of the most popular models used in BNP and their application in Artificial Intelligence, by means of tutorial papers. Second, the Special Issue will focus on theoretical advances and challenging applications of BNP with special emphasis on the following aspects:

• Methodological and theoretical developments of BNP
• Treatment of imprecision and uncertainty with/in BNP methods
• Formal applications of BNP methods to novel applied problems
• New computational and simulation tools for BNP inference.

## mixture models with a prior on the number of components

Posted in Books, Statistics, University life on March 6, 2015 by xi'an

“From a Bayesian perspective, perhaps the most natural approach is to treat the number of components like any other unknown parameter and put a prior on it.”

Another mixture paper on arXiv! Indeed, Jeffrey Miller and Matthew Harrison recently arXived a paper on estimating the number of components in a mixture model, comparing the parametric with the non-parametric Dirichlet prior approaches, since priors can be chosen towards agreement between those. This is an obviously interesting issue, as they are often opposed in modelling debates. The above graph shows a crystal clear agreement between finite component mixture modelling and Dirichlet process modelling. The same happens for classification. However, Dirichlet process priors do not return an estimate of the number of components, which may be considered a drawback if one considers this is an identifiable quantity in a mixture model… But the paper stresses that the number of estimated clusters under the Dirichlet process modelling tends to be larger than the number of components in the finite case. Hence the Dirichlet process mixture modelling is not consistent in that respect, producing parasite extra clusters…
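To give a feel for how freely a Dirichlet process creates clusters, here is a minimal Python sketch of the prior alone, not of the posterior inconsistency result in the paper. It uses the standard sequential (Chinese restaurant) description of the partition induced by a Dirichlet process with concentration α, under which observation i+1 opens a new cluster with probability α/(α+i). The function names `crp_table_sizes` and `expected_clusters` are hypothetical, chosen for this illustration.

```python
import random

def crp_table_sizes(n, alpha, rng):
    """Simulate cluster sizes for n observations under a Chinese restaurant process."""
    sizes = []
    for i in range(n):
        # Total mass is alpha (new cluster) plus i (observations already seated).
        u = rng.random() * (alpha + i)
        if u < alpha:
            sizes.append(1)  # open a new cluster
        else:
            u -= alpha
            for j, s in enumerate(sizes):
                if u < s:
                    sizes[j] += 1  # join cluster j with probability s / (alpha + i)
                    break
                u -= s
            else:
                sizes[-1] += 1  # guard against floating-point round-off
    return sizes

def expected_clusters(n, alpha):
    """Exact prior expectation of the number of clusters among n observations."""
    return sum(alpha / (alpha + i) for i in range(n))

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000):
        sizes = crp_table_sizes(n, alpha=1.0, rng=rng)
        print(n, len(sizes), round(expected_clusters(n, 1.0), 2))
```

The expected number of clusters keeps growing with n, roughly like α log n, which is the prior-level counterpart of the extra clusters mentioned above.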

In the parametric modelling, the authors assume the same scale is used in all Dirichlet priors, that is, for all values of k, the number of components. This means an incoherence when marginalising from k to (k-p) components. A mild incoherence, in fact, as the parameters of the different models do not have to share the same priors. And, as shown by Proposition 3.3 in the paper, this does not prevent coherence in the marginal distribution of the latent variables. The authors also draw a comparison between the distribution of the partition in the finite mixture case and the Chinese restaurant process associated with the partition in the infinite case. A further analogy is that the finite case allows for a stick breaking representation. A noteworthy difference between both modellings lies in the size of the partitions

$\mathbb{P}(s_1,\ldots,s_k)\propto\prod_{j=1}^k s_j^{\gamma-1}\quad\text{versus}\quad\mathbb{P}(s_1,\ldots,s_k)\propto\prod_{j=1}^k s_j^{-1}$

in the finite (homogeneous partitions) and infinite (extreme partitions) cases.
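As a rough numerical check of this contrast, the following Python sketch enumerates all ordered cluster sizes (s_1,…,s_k) summing to n and computes their exact probabilities given k clusters. It assumes the standard exchangeable partition weights, namely Γ(s+γ)/(Γ(γ) s!) per cluster of size s for a finite mixture with symmetric Dirichlet(γ,…,γ) weights, and (s-1)!/s! = 1/s per cluster for the Dirichlet process. This is an illustration under those assumptions, not code from the paper, and the function names are hypothetical.

```python
from math import exp, lgamma, log

def compositions(n, k):
    """Yield all ordered tuples of k positive integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def log_weight_finite(s, gamma):
    """Log weight of a cluster of size s in the finite case: Gamma(s+gamma)/(Gamma(gamma) s!)."""
    return lgamma(s + gamma) - lgamma(gamma) - lgamma(s + 1)

def log_weight_dp(s):
    """Log weight of a cluster of size s in the Dirichlet process case: (s-1)!/s! = 1/s."""
    return -log(s)

def size_distribution(n, k, log_weight):
    """Exact distribution of the ordered sizes (s_1,...,s_k), given k clusters."""
    comps = list(compositions(n, k))
    logw = [sum(log_weight(s) for s in c) for c in comps]
    m = max(logw)  # subtract the maximum for numerical stability
    w = [exp(l - m) for l in logw]
    total = sum(w)
    return {c: wi / total for c, wi in zip(comps, w)}

if __name__ == "__main__":
    finite = size_distribution(10, 2, lambda s: log_weight_finite(s, 1.0))
    dp = size_distribution(10, 2, log_weight_dp)
    for c in sorted(finite):
        print(c, round(finite[c], 4), round(dp[c], 4))
```

With n=10, k=2 and γ=1, every split is equally likely in the finite case, while the Dirichlet process gives the extreme split (1,9) a probability 25/9 times that of the balanced split (5,5).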

An interesting entry into the connections between “regular” mixture modelling and Dirichlet mixture models. Maybe not ultimately surprising given the past studies by Peter Green and Sylvia Richardson of both approaches (1997 in Series B and 2001 in JASA).

## mini Bayesian nonparametrics in Paris

Posted in pictures, Statistics, University life on September 10, 2013 by xi'an

Today, I attended a “miniworkshop” on Bayesian nonparametrics in Paris (Université René Descartes, now located in an intensely renovated area near the Grands Moulins de Paris), in connection with one of the ANR research grants that support my research, BANHDITS in the present case. Reflecting incidentally that it was the third Monday in a row that I was at a meeting listening to talks (after Hong Kong and Newcastle)… The talks were as follows:

9h30 – 10h15 : Dominique Bontemps/Sébastien Gadat
Bayesian point of view on the Shape Invariant Model
10h15 – 11h : Pierpaolo De Blasi
Posterior consistency of nonparametric location-scale mixtures for multivariate density estimation
11h30 – 12h15 : Jean-Bernard Salomond
General posterior contraction rate Theorem in inverse problems.
12h15 – 13h : Eduard Belitser
On lower bounds for posterior consistency (I)
14h30 – 15h15 : Eduard Belitser
On lower bounds for posterior consistency (II)
15h15 – 16h : Judith Rousseau
Posterior concentration rates for empirical Bayes approaches
16h – 16h45 : Elisabeth Gassiat
Nonparametric HMM models

While most talks were focussing on contraction and consistency rates, hence far from my current interests, both talks by Judith and Elisabeth held more appeal to me. Judith gave conditions for an empirical Bayes nonparametric modelling to be consistent, with examples taken from Peter Green’s mixtures of Dirichlet, and Elisabeth concluded with a very generic result on the consistent estimation of a finite hidden Markov model. (Incidentally, the same BANHDITS grant will also support the satellite meeting on Bayesian non-parametrics at MCMSki IV on Jan. 09.)

## Bayes 250th versus Bayes 2.5.0.

Posted in Books, Statistics, Travel, University life on July 20, 2013 by xi'an

More than a year ago, Michael Sørensen (2013 EMS Chair) and Fabrizio Ruggeri (then ISBA President) kindly invited me to deliver the memorial lecture on Thomas Bayes at the 2013 European Meeting of Statisticians, which takes place in Budapest today and the following week. I gladly accepted, although with some worries at having to cover a much wider range of the field than my own research topic. And then set to work on the slides in the past week, borrowing from my most “historical” lectures on Jeffreys and Keynes, my reply to Spanos, as well as getting a little help from my nonparametric friends (yes, I do have nonparametric friends!). Here is the result, providing a partial (meaning both incomplete and biased) vision of the field.

Since my talk is on Thursday, and because the talk is sponsored by ISBA, hence representing its members, please feel free to comment and suggest changes or additions as I can still incorporate them into the slides… (Warning, I purposefully kept some slides out to preserve the most surprising entry for the talk on Thursday!)

## Bayesian brittleness

Posted in Statistics on May 3, 2013 by xi'an

Here is the abstract of a recently arXived paper that attracted my attention:

Although it is known that Bayesian estimators may be inconsistent if the model is misspecified, it is also a popular belief that a “good” or “close” enough model should have good convergence properties. This paper shows that, contrary to popular belief, there is no such thing as a “close enough” model in Bayesian inference in the following sense: we derive optimal lower and upper bounds on posterior values obtained from models that exactly capture an arbitrarily large number of finite-dimensional marginals of the data-generating distribution and/or that are arbitrarily close to the data-generating distribution in the Prokhorov or total variation metrics; these bounds show that such models may still make the largest possible prediction error after conditioning on an arbitrarily large number of sample data. Therefore, under model misspecification, and without stronger assumptions than (arbitrary) closeness in Prokhorov or total variation metrics, Bayesian inference offers no better guarantee of accuracy than arbitrarily picking a value between the essential infimum and supremum of the quantity of interest. In particular, an unscrupulous practitioner could slightly perturb a given prior and model to achieve any desired posterior conclusions.

The paper is both too long and too theoretical for me to get into it deeply enough. The main point however is that, given the space of all possible measures, the set of (parametric) Bayes inferences constitutes a tiny finite-dimensional subset that may lie far, far away from the true model. I do not find the result unreasonable, far from it!, but the fact that Bayesian (and other) inferences may be inconsistent for most misspecified models is not such a major issue in my opinion. (Witness my post on the Robins-Wasserman paradox.) I am not so much convinced either about this “popular belief that a “good” or “close” enough model should have good convergence properties”, as it is intuitively reasonable that the immensity of the space of all models can induce non-convergent behaviours. The statistical question is rather what can be done about it. Does it matter that the model is misspecified? If it does, is there any meaning in estimating parameters without a model? For a finite sample size, should we at all bother that the model is not “right” or “close enough” if discrepancies cannot be detected at this precision level? I think the answer to all those questions is negative and that we should proceed with our imperfect models and imperfect inference as long as our imperfect simulation tools do not exhibit strong divergences.

## Bayesian non-parametrics

Posted in Statistics on April 8, 2013 by xi'an

Here is a short discussion I wrote yesterday with Judith Rousseau of a paper by Peter Müller and Riten Mitra to appear in Bayesian Analysis.

“We congratulate the authors for this very pleasant overview of the type of problems that are currently tackled by Bayesian nonparametric inference and for demonstrating how prolific this field has become. We do share the authors’ viewpoint that many Bayesian nonparametric models allow for more flexible modelling than parametric models and thus capture finer details of the data. BNP can be a good alternative to complex parametric models in the sense that the computations are not necessarily more difficult in Bayesian nonparametric models. However, we would like to mitigate the enthusiasm of the authors since, although we believe that Bayesian nonparametrics has proved extremely useful and interesting, we think they oversell the “nonparametric side of the Force”! Our main point is that, by definition, Bayesian nonparametrics is based on prior probabilities that live on infinite dimensional spaces and thus are never completely swamped by the data. It is therefore crucial to understand which (or why!) aspects of the model are strongly influenced by the prior and how.

As an illustration, when looking at Example 1 with the censored zeroth cell, our reaction is that this is a problem with no proper solution, because it is lacking too much information. In other words, unless some parametric structure of the model is known, in which case the zeroth cell is related with the other cells, we see no way to infer about the size of this cell. The outcome produced by the authors is therefore unconvincing to us in that it seems to only reflect upon the prior modelling (α,G*) and not upon the information contained in the data. Now, this prior modelling may be to some extent justified based on side information about the medical phenomenon under study; however, its impact on the resulting inference is palpable.

Recently (and even less recently) a few theoretical results have pointed out this very issue. E.g., Diaconis and Freedman (1986) showed that some priors could surprisingly lead to inconsistent posteriors, even though it was later shown that many priors lead to consistent posteriors and often even to optimal asymptotic frequentist estimators, see for instance van der Vaart and van Zanten (2009) and Kruijer et al. (2010). The worry about Bayesian nonparametrics truly appeared when considering (1) asymptotic frequentist properties of semi-parametric procedures; and (2) interpretation of inferential aspects of Bayesian nonparametric procedures. It was shown in various instances that some nonparametric priors which behaved very nicely for the estimation of the whole parameter could have disturbingly suboptimal behaviour for some specific functionals of interest, see for instance Arbel et al. (2013) and Rivoirard and Rousseau (2012). We do not claim here that asymptotics is the answer to everything; however, bad asymptotic behaviour shows that something wrong is going on and this helps in understanding the impact of the prior. These disturbingly bad results are an illustration that in these infinite dimensional models the impact of the prior modelling is difficult to evaluate and that, although the prior looks very flexible, it can in fact be highly informative and/or restrictive for some aspects of the parameter. It would thus be wrong to conclude that every aspect of the parameter is well-recovered because some are. This has been a well-known fact for Bayesian parametric models, leading to extensive research on reference and other types of objective priors. It is even more crucial in the nonparametric world. No (nonparametric) prior can be suited for every inferential aspect and it is important to understand which aspects of the parameter are well-recovered and which ones are not.

We also concur with the authors that Dirichlet mixture priors provide natural clustering mechanisms, but one may question the “natural” label as the resulting clustering is quite unstructured, growing in the number of clusters as the number of observations increases and not incorporating any prior constraint on the “definition” of a cluster, except the one implicit and well-hidden behind the non-parametric prior. In short, it is delicate to assess what is eventually estimated by these clustering methods.

These remarks are not to be taken as criticisms of the overall Bayesian nonparametric approach, just the contrary. We simply emphasize (or recall) that there is no such thing as a free lunch and that we need to post the price to pay for potential customers. In these models, this is far from easy and just as far from being completed.”

References

• Arbel, J., Gayraud, G., and Rousseau, J. (2013). Bayesian adaptive optimal estimation using a sieve prior. Scandinavian Journal of Statistics, to appear.

• Diaconis, P. and Freedman, D. (1986). On the consistency of Bayes estimates. Ann. Statist., 14:1-26.

• Kruijer, W., Rousseau, J., and van der Vaart, A. (2010). Adaptive Bayesian density estimation with location-scale mixtures. Electron. J. Stat., 4:1225-1257.

• Rivoirard, V. and Rousseau, J. (2012). Bernstein-von Mises theorem for linear functionals of the density. Ann. Statist., 40:1489-1523.

• van der Vaart, A. and van Zanten, J. H. (2009). Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth. Ann. Statist., 37:2655-2675.