Archive for Bayesian non-parametrics

Imperial postdoc in Bayesian nonparametrics

Posted in pictures, R on April 27, 2018 by xi'an

Here is another announcement for a post-doctoral position in London (UK) to work with Sarah Filippi, in the Department of Mathematics at Imperial College London. (More details on the site or in this document. Hopefully, the salary is sufficient for staying in London, if not in South Kensington!)

The post holder will work on developing a novel Bayesian Non-Parametric Test for Conditional Independence. This is at the core of modern causal discovery, itself of paramount importance throughout the sciences and in Machine Learning. As part of this project, the post holder will derive a Bayesian non-parametric testing procedure for conditional independence, scalable to high-dimensional conditioning variables. To ensure maximum impact and allow experimenters in different fields to easily apply this new methodology, the post holder will then create an open-source software package available on the R statistical programming platform. Doing so, the post holder will investigate applying this approach to real-world data from our established partners who have a track record of informing national and international bodies such as Public Health England and the World Health Organisation.

Nonparametric hierarchical Bayesian quantiles

Posted in Books, Statistics, University life on June 9, 2016 by xi'an

Luke Bornn, Neil Shephard and Reza Solgi have recently arXived a research report on non-parametric Bayesian quantiles. This work relates to their earlier paper that combines Bayesian inference with moment estimators, in that the quantiles do not entirely define the distribution of the data, which then needs to be completed by Bayesian means. But contrary to this previous paper, it does not require MCMC simulation for distributions defined on a variety such as, e.g., a curve.

Here a quantile is defined as minimising an asymmetric absolute risk, i.e., an expected loss. It is therefore a deterministic function of the model parameters for a parametric model and a functional of the model otherwise. And connected to a moment if not a moment per se. In the case of a model with a discrete support, the unconstrained model is parameterised by the probability vector θ and β=t(θ). However, the authors study the opposite approach, namely to set a prior on β, p(β), and then complement this prior with a conditional prior on θ, p(θ|β), the joint prior p(β)p(θ|β) being also the marginal p(θ) because of the deterministic relation. However, I am getting slightly lost in the motivation for the derivation of the conditional when the authors pick an arbitrary prior on θ and use it to derive a conditional on β which, along with an arbitrary (“scientific”) prior on β defines a new prior on θ. This works out in the discrete case because β has a finite support. But it is unclear (to me) why it should work in the continuous case [not covered in the paper].
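To make the deterministic map β=t(θ) concrete in the discrete-support case, here is a minimal Python sketch that computes the τ-quantile as the minimiser of the expected asymmetric absolute loss. The support points, the probability vector and the function names are made up for illustration and are not taken from the paper.

```python
import numpy as np

def check_loss(u, tau):
    """Asymmetric absolute loss: tau*u when u >= 0, (tau - 1)*u when u < 0."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def quantile_from_theta(support, theta, tau):
    """beta = t(theta): the support point minimising the expected check loss."""
    support = np.asarray(support, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # Expected loss of each candidate b under the discrete distribution theta
    risks = np.array([np.sum(theta * check_loss(support - b, tau))
                      for b in support])
    return support[np.argmin(risks)]

# Hypothetical discrete model: four support points with probabilities theta
support = np.array([0.0, 1.0, 2.0, 3.0])
theta = np.array([0.1, 0.2, 0.3, 0.4])

beta_median = quantile_from_theta(support, theta, tau=0.5)
beta_upper = quantile_from_theta(support, theta, tau=0.9)
print(beta_median, beta_upper)
```

Since β only takes values in the (finite) support, any prior p(β) is a finite probability vector over those candidate values, which is why the construction goes through so easily in the discrete case.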

Getting back to the central idea of defining first the distribution on the quantile β, a further motivation is provided in the hierarchical extension of Section 3, where the same quantile distribution is shared by all individuals (e.g., cricket players) in the population, while the underlying distributions for the individuals are otherwise disconnected and unconstrained. (Obviously, a part of the cricket example went far above my head. But one may always idly wonder why all players should share the same distribution. And about what would happen when imposing no quantile constraint but picking instead a direct hierarchical modelling on the θ’s.) This common distribution on β can then be modelled by a Dirichlet hyperprior.

The paper also contains a section on estimating the entire quantile function, which is a wee paradox in that this function is again a deterministic transform of the original parameter θ, whereas the authors instead use pointwise estimation, i.e., for each level τ. I find the exercise furthermore paradoxical in that the hierarchical modelling with a common distribution on the quantile β(τ) is repeated separately for each τ, while the entire parameter should share a common distribution, given the equivalence between the quantile function and the entire parameter θ.

MLSS 2016: machine learning summer school in Cádiz [deadline]

Posted in Kids, pictures, Running, Statistics, Travel, University life on March 11, 2016 by xi'an

Following [time-wise] the AISTATS 2016 meeting, a machine learning school is organised in Cádiz (as is the tradition for AISTATS meetings in Europe, i.e., in even years). With an impressive [if downright scary] poster! There is no strong statistics component in the programme, apart from a course by Tamara Broderick on non-parametric Bayes, but the list of speakers is impressive and the ten-day school is worth recommending to all interested students. (I remember giving a short course at MLSS 2004 on Berder Island in Brittany, with the immediate reward of running the Auray-Vannes half-marathon that year…) The deadline for applications is March 25, 2016.

Judith Rousseau gets Bernoulli Society Ethel Newbold Prize

Posted in Books, Kids, Statistics, University life on July 31, 2015 by xi'an

As announced at the 60th ISI World Meeting in Rio de Janeiro, my friend, co-author, and former PhD student Judith Rousseau got the first Ethel Newbold Prize! Congrats, Judith! And well-deserved! The prize is awarded by the Bernoulli Society on the following basis

The Ethel Newbold Prize is to be awarded biannually to an outstanding statistical scientist for a body of work that represents excellence in research in mathematical statistics, and/or excellence in research that links developments in a substantive field to new advances in statistics. In any year in which the award is due, the prize will not be awarded unless the set of all nominations includes candidates from both genders.

and is funded by Wiley. I very much support this (inclusive) approach of “recognizing the importance of women in statistics”, without creating a prize restricted to women nominees (and hence exclusive). Thanks to the members of the Program Committee of the Bernoulli Society for setting up that prize and to Nancy Reid in particular.

Ethel Newbold was a British statistician who worked during WWI in the Ministry of Munitions and then became a member of the newly created Medical Research Council, working on medical and industrial studies. She was the first woman to receive the Guy Medal in Silver in 1928. Just to stress that much remains to be done towards gender balance, the second and last woman to get a Guy Medal in Silver is Sylvia Richardson, in 2009… (In addition, Valerie Isham, Nicky Best, and Fiona Steele got a Guy Medal in Bronze, out of the 71 so far awarded, while no woman ever got a Guy Medal in Gold.) Funny occurrences of coincidence: Ethel May Newbold was educated at Tunbridge Wells, the place where Bayes was a minister, while Sylvia is now head of the Medical Research Council biostatistics unit in Cambridge.

Approximate reasoning on Bayesian nonparametrics

Posted in Books, Statistics, University life on July 7, 2015 by xi'an

[Here is a call for a special issue on Bayesian nonparametrics, edited by Alessio Benavoli, Antonio Lijoi and Antonietta Mira, for an Elsevier journal I had never heard of previously:]

The International Journal of Approximate Reasoning is pleased to announce a special issue on “Bayesian Nonparametrics”. The submission deadline is *December 1st*, 2015.

The aim of this Special Issue is twofold. First, it is to give a broad overview of the most popular models used in BNP and their application in Artificial Intelligence, by means of tutorial papers. Second, the Special Issue will focus on theoretical advances and challenging applications of BNP with special emphasis on the following aspects:

  • Methodological and theoretical developments of BNP
  • Treatment of imprecision and uncertainty with/in BNP methods
  • Formal applications of BNP methods to novel applied problems
  • New computational and simulation tools for BNP inference.

mixture models with a prior on the number of components

Posted in Books, Statistics, University life on March 6, 2015 by xi'an


“From a Bayesian perspective, perhaps the most natural approach is to treat the number of components like any other unknown parameter and put a prior on it.”

Another mixture paper on arXiv! Indeed, Jeffrey Miller and Matthew Harrison recently arXived a paper on estimating the number of components in a mixture model, comparing the parametric with the non-parametric Dirichlet prior approaches, since priors can be chosen towards agreement between those. This is an obviously interesting issue, as they are often opposed in modelling debates. The above graph shows a crystal clear agreement between finite component mixture modelling and Dirichlet process modelling. The same happens for classification. However, Dirichlet process priors do not return an estimate of the number of components, which may be considered a drawback if one considers this is an identifiable quantity in a mixture model… But the paper stresses that the number of estimated clusters under the Dirichlet process modelling tends to be larger than the number of components in the finite case. Hence the Dirichlet process mixture modelling is not consistent in that respect, producing parasitic extra clusters…
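As a rough illustration of how many clusters a Dirichlet process prior entertains a priori, here is a minimal Python sketch simulating the Chinese restaurant process, i.e., the partition distribution induced by the Dirichlet process. The function names, the concentration α=1 and the sample size are assumptions chosen for illustration, not values from the paper, and the sketch only shows the prior growth of the number of clusters (roughly α log n), not the posterior inconsistency result itself.

```python
import numpy as np

def crp_num_clusters(n, alpha, rng):
    """Seat n customers by the Chinese restaurant process; return the number of tables."""
    counts = []  # counts[j] = current number of customers at table j
    for i in range(n):
        # Customer i+1 joins table j w.p. counts[j]/(alpha+i), a new table w.p. alpha/(alpha+i)
        probs = np.array(counts + [alpha], dtype=float) / (alpha + i)
        j = rng.choice(len(probs), p=probs)
        if j == len(counts):
            counts.append(1)
        else:
            counts[j] += 1
    return len(counts)

def expected_num_clusters(n, alpha):
    """Exact prior expectation of the number of tables: sum_i alpha/(alpha+i-1)."""
    return sum(alpha / (alpha + i) for i in range(n))

rng = np.random.default_rng(0)
n, alpha = 100, 1.0  # illustrative values only
sims = [crp_num_clusters(n, alpha, rng) for _ in range(50)]
print(np.mean(sims), expected_num_clusters(n, alpha))
```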

In the parametric modelling, the authors assume the same scale is used in all Dirichlet priors, that is, for all values of k, the number of components. Which means an incoherence when marginalising from k to (k-p) components. Mild incoherence, in fact, as the parameters of the different models do not have to share the same priors. And, as shown by Proposition 3.3 in the paper, this does not prevent coherence in the marginal distribution of the latent variables. The authors also draw a comparison between the distribution of the partition in the finite mixture case and the Chinese restaurant process associated with the partition in the infinite case. A further analogy is that the finite case allows for a stick-breaking representation. A noteworthy difference between the two modellings concerns the sizes of the partitions

\mathbb{P}(s_1,\ldots,s_k)\propto\prod_{j=1}^k s_j^{\gamma-1}\quad\text{versus}\quad\mathbb{P}(s_1,\ldots,s_k)\propto\prod_{j=1}^k s_j^{-1}

in the finite (homogeneous partitions) and infinite (extreme partitions) cases.
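A small numerical sketch of this contrast for two clusters, in Python. The exact weights used here are assumed rather than quoted from the paper: ∏ Γ(s_j+γ)/(Γ(γ) s_j!) for a finite mixture with symmetric Dirichlet(γ) weights, and ∏ 1/s_j for the Dirichlet process. The values n=12 and γ=2 are arbitrary (γ>1 is what makes the finite case favour balanced sizes).

```python
from math import lgamma, exp

def finite_weight(sizes, gamma):
    """Unnormalised weight prod_j Gamma(s_j + gamma) / (Gamma(gamma) * s_j!)."""
    return exp(sum(lgamma(s + gamma) - lgamma(gamma) - lgamma(s + 1)
                   for s in sizes))

def dp_weight(sizes):
    """Unnormalised weight prod_j (s_j - 1)! / s_j! = prod_j 1 / s_j."""
    w = 1.0
    for s in sizes:
        w /= s
    return w

def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]

n = 12  # illustrative sample size
splits = [(s, n - s) for s in range(1, n)]  # ordered sizes of two clusters
finite = normalise([finite_weight(s, gamma=2.0) for s in splits])
dp = normalise([dp_weight(s) for s in splits])

for s, pf, pd in zip(splits, finite, dp):
    print(s, round(pf, 3), round(pd, 3))
```

Under these assumptions the finite-mixture weights peak at the balanced split, while the Dirichlet process weights are largest for the most lopsided splits.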

An interesting entry into the connections between “regular” mixture modelling and Dirichlet mixture models. Maybe not ultimately surprising given the past studies by Peter Green and Sylvia Richardson of both approaches (1997 in Series B and 2001 in JASA).

mini Bayesian nonparametrics in Paris

Posted in pictures, Statistics, University life on September 10, 2013 by xi'an

Today, I attended a “miniworkshop” on Bayesian nonparametrics in Paris (Université René Descartes, now located in an intensely renovated area near the Grands Moulins de Paris), in connection with one of the ANR research grants that support my research, BANHDITS in the present case. Reflecting incidentally that it was the third Monday in a row that I was at a meeting listening to talks (after Hong Kong and Newcastle)… The talks were as follows

9h30 – 10h15 : Dominique Bontemps/Sébastien Gadat
Bayesian point of view on the Shape Invariant Model
10h15 – 11h : Pierpaolo De Blasi
Posterior consistency of nonparametric location-scale mixtures for multivariate density estimation
11h30 – 12h15 : Jean-Bernard Salomond
General posterior contraction rate Theorem in inverse problems.
12h15 – 13h : Eduard Belitser
On lower bounds for posterior consistency (I)
14h30 – 15h15 : Eduard Belitser
On lower bounds for posterior consistency (II)
15h15 – 16h : Judith Rousseau
Posterior concentration rates for empirical Bayes approaches
16h – 16h45 : Elisabeth Gassiat
Nonparametric HMM models

While most talks were focussing on contraction and consistency rates, hence far from my current interests, both talks by Judith and Elisabeth held more appeal to me. Judith gave conditions for an empirical Bayes nonparametric modelling to be consistent, with examples taken from Peter Green’s mixtures of Dirichlet, and Elisabeth concluded with a very generic result on the consistent estimation of a finite hidden Markov model. (Incidentally, the same BANHDITS grant will also support the satellite meeting on Bayesian non-parametrics at MCMSki IV on Jan. 09.)