Following [time-wise] the AISTATS 2016 meeting, a machine learning school is organised in Cádiz (as is the tradition for AISTATS meetings in Europe, i.e., in even years). With an impressive [if downright scary] poster! There is no strong statistics component in the programme, apart from a course by Tamara Broderick on non-parametric Bayes, but the list of speakers is impressive and the ten-day school is worth recommending to all interested students. (I remember giving a short course at MLSS 2004 on Berder Island in Brittany, with the immediate reward of running the Auray-Vannes half-marathon that year…) The deadline for applications is March 25, 2016.
As announced at the 60th ISI World Meeting in Rio de Janeiro, my friend, co-author, and former PhD student Judith Rousseau got the first Ethel Newbold Prize! Congrats, Judith! And well-deserved! The prize is awarded by the Bernoulli Society on the following basis:
The Ethel Newbold Prize is to be awarded biannually to an outstanding statistical scientist for a body of work that represents excellence in research in mathematical statistics, and/or excellence in research that links developments in a substantive field to new advances in statistics. In any year in which the award is due, the prize will not be awarded unless the set of all nominations includes candidates from both genders.
and is funded by Wiley. I very much support this (inclusive) approach of “recognizing the importance of women in statistics”, without creating a prize restricted to women nominees (and hence exclusive). Thanks to the members of the Program Committee of the Bernoulli Society for setting up this prize, and to Nancy Reid in particular.
Ethel Newbold was a British statistician who worked during WWI in the Ministry of Munitions and then became a member of the newly created Medical Research Council, working on medical and industrial studies. She was the first woman to receive the Guy Medal in Silver, in 1928. Just to stress that much remains to be done towards gender balance, the second and, so far, last woman to get a Guy Medal in Silver is Sylvia Richardson, in 2009… (In addition, Valerie Isham, Nicky Best, and Fiona Steele got a Guy Medal in Bronze, out of the 71 awarded so far, while no woman has ever got a Guy Medal in Gold.) Funny coincidences: Ethel May Newbold was educated at Tunbridge Wells, the place where Bayes was a minister, while Sylvia is now head of the Medical Research Council biostatistics unit in Cambridge.
[Here is a call for a special issue on Bayesian nonparametrics, edited by Alessio Benavoli, Antonio Lijoi and Antonietta Mira, for an Elsevier journal I had never heard of previously:]
The International Journal of Approximate Reasoning is pleased to announce a special issue on “Bayesian Nonparametrics”. The submission deadline is *December 1st*, 2015.
The aim of this Special Issue is twofold. First, it is to give a broad overview of the most popular models used in BNP and their application in Artificial Intelligence, by means of tutorial papers. Second, the Special Issue will focus on theoretical advances and challenging applications of BNP with special emphasis on the following aspects:
- Methodological and theoretical developments of BNP
- Treatment of imprecision and uncertainty with/in BNP methods
- Formal applications of BNP methods to novel applied problems
- New computational and simulation tools for BNP inference.
“From a Bayesian perspective, perhaps the most natural approach is to treat the number of components like any other unknown parameter and put a prior on it.”
Another mixture paper on arXiv! Indeed, Jeffrey Miller and Matthew Harrison recently arXived a paper on estimating the number of components in a mixture model, comparing the parametric with the non-parametric Dirichlet prior approaches, since priors can be chosen to bring the two into agreement. This is an obviously interesting issue, as the two approaches are often opposed in modelling debates. The above graph shows a crystal-clear agreement between finite-component mixture modelling and Dirichlet process modelling. The same happens for classification. However, Dirichlet process priors do not return an estimate of the number of components, which may be considered a drawback if one considers this an identifiable quantity in a mixture model… But the paper stresses that the number of estimated clusters under Dirichlet process modelling tends to be larger than the number of components in the finite case, hence that Dirichlet process mixture modelling is not consistent in that respect, producing parasitic extra clusters…
In the parametric modelling, the authors assume the same scale is used in all Dirichlet priors, that is, for all values of k, the number of components. This means an incoherence when marginalising from k to (k-p) components. A mild incoherence, in fact, as the parameters of the different models do not have to share the same priors. And, as shown by Proposition 3.3 in the paper, this does not prevent coherence in the marginal distribution of the latent variables. The authors also draw a comparison between the distribution of the partition in the finite mixture case and the Chinese restaurant process associated with the partition in the infinite case. A further analogy is that the finite case also allows for a stick-breaking representation. A noteworthy difference between both modellings is the size of the partitions: homogeneous partitions in the finite case versus extreme partitions in the infinite case.
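Just to visualise the extra-cluster phenomenon, here is a quick sketch of my own (not taken from the paper): simulating the Chinese restaurant process shows that the expected number of occupied tables grows roughly like α log n with the sample size, whereas a finite mixture keeps a bounded number of components. The function name and the choice α=1 are mine, purely for illustration.

```python
import math
import random

def crp_num_clusters(n, alpha, rng):
    """Simulate a Chinese restaurant process with concentration alpha
    and return the number of occupied tables after n customers."""
    counts = []  # number of customers at each table
    for i in range(n):
        u = rng.random() * (i + alpha)
        acc = 0.0
        # join an existing table with probability proportional to its size...
        for t, c in enumerate(counts):
            acc += c
            if u < acc:
                counts[t] += 1
                break
        else:
            # ...or open a new table with probability alpha / (i + alpha)
            counts.append(1)
    return len(counts)

rng = random.Random(42)
alpha = 1.0
for n in (100, 1000, 10000):
    avg = sum(crp_num_clusters(n, alpha, rng) for _ in range(20)) / 20
    # the theoretical mean E[K_n] ~ alpha * log(n) keeps growing with n,
    # unlike the fixed number of components of a finite mixture
    print(n, round(avg, 1), round(alpha * math.log(n), 1))
```

So even if the data came from, say, a three-component mixture, the Dirichlet process posterior keeps opening new (small) tables as n grows, which is the inconsistency discussed above.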
An interesting entry into the connections between “regular” mixture modelling and Dirichlet mixture models. Maybe not ultimately surprising given the past studies by Peter Green and Sylvia Richardson of both approaches (1997 in Series B and 2001 in JASA).
Today, I attended a “miniworkshop” on Bayesian nonparametrics in Paris (Université René Descartes, now located in an intensely renovated area near the Grands Moulins de Paris), in connection with one of the ANR research grants that support my research, BANHDITS in the present case. Reflecting incidentally that it was the third Monday in a row that I was at a meeting listening to talks (after Hong Kong and Newcastle)… The talks were as follows
9h30 – 10h15 : Dominique Bontemps/Sébastien Gadat
Bayesian point of view on the Shape Invariant Model
10h15 – 11h : Pierpaolo De Blasi
Posterior consistency of nonparametric location-scale mixtures for multivariate density estimation
11h30 – 12h15 : Jean-Bernard Salomond
General posterior contraction rate theorem in inverse problems
12h15 – 13h : Eduard Belitser
On lower bounds for posterior consistency (I)
14h30 – 15h15 : Eduard Belitser
On lower bounds for posterior consistency (II)
15h15 – 16h : Judith Rousseau
Posterior concentration rates for empirical Bayes approaches
16h – 16h45 : Elisabeth Gassiat
Nonparametric HMM models
While most talks were focussing on contraction and consistency rates, hence far from my current interests, the talks by Judith and Elisabeth held more appeal for me. Judith gave conditions for an empirical Bayes nonparametric modelling to be consistent, with examples taken from Peter Green’s mixtures of Dirichlet processes, and Elisabeth concluded with a very generic result on the consistent estimation of a finite hidden Markov model. (Incidentally, the same BANHDITS grant will also support the satellite meeting on Bayesian non-parametrics at MCMSki IV on Jan. 09.)
More than a year ago, Michael Sørensen (2013 EMS Chair) and Fabrizio Ruggeri (then ISBA President) kindly invited me to deliver the memorial lecture on Thomas Bayes at the 2013 European Meeting of Statisticians, which takes place in Budapest today and over the following week. I gladly accepted, although with some worries at having to cover a much wider range of the field than my own research topics. I then set to work on the slides in the past week, borrowing from my most “historical” lectures on Jeffreys and Keynes, from my reply to Spanos, as well as getting a little help from my nonparametric friends (yes, I do have nonparametric friends!). Here is the result, providing a partial (meaning both incomplete and biased) vision of the field.
Since my talk is on Thursday, and because the talk is sponsored by ISBA, hence representing its members, please feel free to comment and suggest changes or additions as I can still incorporate them into the slides… (Warning, I purposefully kept some slides out to preserve the most surprising entry for the talk on Thursday!)
Here is the abstract of a recently arXived paper that attracted my attention:
Although it is known that Bayesian estimators may be inconsistent if the model is misspecified, it is also a popular belief that a “good” or “close” enough model should have good convergence properties. This paper shows that, contrary to popular belief, there is no such thing as a “close enough” model in Bayesian inference in the following sense: we derive optimal lower and upper bounds on posterior values obtained from models that exactly capture an arbitrarily large number of finite-dimensional marginals of the data-generating distribution and/or that are arbitrarily close to the data-generating distribution in the Prokhorov or total variation metrics; these bounds show that such models may still make the largest possible prediction error after conditioning on an arbitrarily large number of sample data. Therefore, under model misspecification, and without stronger assumptions than (arbitrary) closeness in Prokhorov or total variation metrics, Bayesian inference offers no better guarantee of accuracy than arbitrarily picking a value between the essential infimum and supremum of the quantity of interest. In particular, an unscrupulous practitioner could slightly perturb a given prior and model to achieve any desired posterior conclusions.
The paper is both too long and too theoretical for me to get deeply enough into it. The main point however is that, given the space of all possible measures, the set of (parametric) Bayes inferences constitutes a tiny finite-dimensional set that may lie far, far away from the true model. I do not find the result unreasonable, far from it!, but the fact that Bayesian (and other) inferences may be inconsistent for most misspecified models is not such a major issue in my opinion. (Witness my post on the Robins-Wasserman paradox.) Neither am I so convinced about this “popular belief that a ‘good’ or ‘close’ enough model should have good convergence properties”, as it is intuitively reasonable that the immensity of the space of all models can induce non-convergent behaviours. The statistical question is rather what can be done about it. Does it matter that the model is misspecified? If it does, is there any meaning in estimating parameters without a model? For a finite sample size, should we at all bother that the model is not “right” or “close enough” if discrepancies cannot be detected at this precision level? I think the answer to all these questions is negative and that we should proceed with our imperfect models and imperfect inference as long as our imperfect simulation tools do not exhibit strong divergences.