Archive for mixtures of distributions

an elegant book [review]

Posted in Books, Statistics, University life on December 28, 2020 by xi'an

“Handbook of Mixture Analysis is an elegant book on the mixture models. It covers not only statistical foundations but also extensions and applications of mixture models. The book consists of 19 chapters (each chapter is an independent paper), and collectively, these chapters weave into an elegant web of mixture models” Yen-Chi Chen (U. Washington)

stratified MCMC

Posted in Books, pictures, Statistics on December 3, 2020 by xi'an

When working last week with a student, we came across [the slides of a talk at ICERM by Brian van Koten about] a stratified MCMC method whose core idea is to solve an eigenvector equation z'=z'F associated with the masses of "partition" functions Ψ evaluated at the target. (The arXived paper has also been available since 2017, but I did not check it in more detail.) Note that the "partition" functions need to overlap for the matrix not to be diagonal (actually, the only case that does not work is when these functions are true indicator functions). As in other forms of stratified sampling, the practical difficulty lies in picking the functions Ψ so that the evaluation of the entries of the matrix F is not overly impacted by the Monte Carlo error. If too much time is spent estimating these terms, there is no clear gain in switching to stratified sampling, which may be why it is not particularly developed in the MCMC literature….
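To fix ideas, here is a minimal sketch of the eigenvector step in Python, assuming each stratum has already been sampled by an MCMC run biased towards its Ψ; the function names and data layout are my own, not taken from the talk or the arXived paper.

```python
import numpy as np

def overlap_matrix(stratum_samples, psis):
    """Estimate F, where F[i, j] averages psi_j(x) / sum_k psi_k(x) over the
    MCMC samples x drawn in stratum i (i.e. biased towards psi_i)."""
    K = len(psis)
    F = np.zeros((K, K))
    for i, xs in enumerate(stratum_samples):
        w = np.array([[psi(x) for psi in psis] for x in xs])
        w /= w.sum(axis=1, keepdims=True)   # normalise each sample's psi values
        F[i] = w.mean(axis=0)               # Monte Carlo estimate of row i of F
    return F

def stratum_masses(F):
    """Solve z' = z'F, i.e. extract the left eigenvector of F for the
    eigenvalue closest to one, normalised to sum to one."""
    vals, vecs = np.linalg.eig(F.T)
    z = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return z / z.sum()
```

The resulting masses z are then used to reweight within-stratum averages into estimates under the full target, which is where the choice of the Ψ's and the Monte Carlo error on the entries of F come into play.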

As an interesting aside, the illustration in this talk comes from the Mexican stamp thickness data I also used in my earlier mixture papers, concerning the 1872 Hidalgo issue that was printed on different qualities of paper. This makes the number k of components somewhat uncertain, although k=3 is sometimes used as a default. Hence a parameter and simulation space of dimension 8, even though the method is used toward approximating the marginal posteriors on the weights λ₁ and λ₂.

MHC2020

Posted in pictures, Statistics, Travel, University life on October 15, 2019 by xi'an

There is a conference on mixtures (M) and hidden Markov models (H) and clustering (C) taking place in Orsay on June 17-19, next year. Registration is free if compulsory. With about twenty confirmed speakers. (Irrelevant as the following remark is, this is the opportunity to recall the conference on mixtures I organised in Aussois 25 years before! Which website is amazingly still alive at Duke, thanks to Mike West, my co-organiser along with Kathryn Roeder and Gilles Celeux. When checking the abstracts, I found only two presenters common to both conferences, Christophe Biernacki and Jiahua Chen. And alas several names of departed friends.)

from here to infinity

Posted in Books, Statistics, Travel on September 30, 2019 by xi'an

“Introducing a sparsity prior avoids overfitting the number of clusters not only for finite mixtures, but also (somewhat unexpectedly) for Dirichlet process mixtures which are known to overfit the number of clusters.”

On my way back from Clermont-Ferrand, in an old train that reminded me of my previous ride on that line, which took place in… 1975!, I read a fairly interesting paper published in Advances in Data Analysis and Classification by [my Viennese friends] Sylvia Frühwirth-Schnatter and Gertrud Malsiner-Walli, where they describe how sparse finite mixtures and Dirichlet process mixtures can achieve similar results when clustering a given dataset, provided the hyperparameters in both approaches are calibrated accordingly. In both cases these hyperparameters (the scale of the Dirichlet process mixture versus the scale of the Dirichlet prior on the weights) are endowed with Gamma priors, both depending on the number of components in the finite mixture. Another interesting feature of the paper is to witness how close the related MCMC algorithms are when exploiting the stick-breaking representation of the Dirichlet process mixture, with the label switching difficulties resolved via a point process representation and k-means clustering in the parameter space. [The title of the paper is inspired by Ian Stewart's book.]
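As a rough illustration of that last post-processing step, here is a minimal sketch in Python of relabelling MCMC draws by k-means clustering in the parameter space; the data layout and the function name are my own assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def relabel_by_kmeans(draws, k):
    """Pool the component-specific parameters simulated over the MCMC run
    (viewed as a point process in parameter space), cluster the pooled points
    into k groups, and realign components across iterations accordingly.

    draws: array of shape (n_iter, k, d), e.g. the simulated component means.
    Returns an array of the same shape with label switching (mostly) undone."""
    n_iter, k_draw, d = draws.shape
    km = KMeans(n_clusters=k, n_init=10).fit(draws.reshape(-1, d))
    labels = km.labels_.reshape(n_iter, k_draw)
    relabelled = draws.copy()
    for t in range(n_iter):
        if len(set(labels[t])) == k:            # a clean one-to-one assignment
            relabelled[t, labels[t]] = draws[t]
    return relabelled
```

Iterations where two components fall into the same k-means cluster are left untouched in this sketch; a full implementation would handle them more carefully.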

the most probable cluster

Posted in Books, Statistics on July 11, 2019 by xi'an

In the last issue of Bayesian Analysis, Lukasz Rajkowski studies the most likely (MAP) clustering associated with the Dirichlet process mixture model, reminding me that most Bayesian estimates of the number of clusters are not consistent (when the sample size grows to infinity). I am always puzzled by this problem, as estimating the number of clusters sounds like an ill-posed problem, since it grows with the number of observations, by definition of the Dirichlet process. For instance, the current paper establishes that the number of clusters intersecting a given compact set remains bounded. (The setup is that of a Normal Dirichlet process mixture with a constant and known covariance matrix.)

Since the posterior probability of a given partition of {1,2,…,n} can be (formally) computed, the MAP estimate can be (formally) derived. I inserted "formally" in the previous sentence as the derivation of the exact MAP is an NP-hard problem in the number n of observations. As an aside, I have trouble with the author's argument that the convex hulls of the clusters should be disjoint: I do not see why they should be when the mixture components are overlapping. (More generally, I fail to relate to notions like "bad clusters" or "overestimation of the number of clusters" or a "sensible choice" of the covariance matrix.) More globally, I am somewhat perplexed by the purpose of the paper and the relevance of the MAP estimate, even putting aside my generic criticisms of the MAP approach. No uncertainty is attached to the estimator, which thus appears as a form of penalised likelihood strategy rather than a genuinely Bayesian (Analysis) solution.
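For the record, the (formal) posterior in question combines the exchangeable partition probability function of the Dirichlet process with the marginal likelihood of each cluster. In the Normal setting with known covariance Σ and base measure G₀, a standard rendering (my notation, not necessarily the paper's) is

```latex
p(\mathcal{C}\mid x_{1:n}) \;\propto\; \alpha^{K}\,\prod_{k=1}^{K}(|c_k|-1)!\;
  \int \prod_{i\in c_k} \mathcal{N}(x_i\mid \mu,\Sigma)\, G_0(\mathrm{d}\mu)
```

for a partition 𝒞={c₁,…,c_K} of {1,…,n} and concentration parameter α, and maximising this expression over all partitions is the combinatorial task that makes the exact MAP derivation NP-hard in n.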

The first example in the paper uses data from a Uniform over (-1,1), concluding that the MAP partition is "misleading" since it produces more than one cluster. I find this statement flabbergasting, as the generative model is not the estimated model. To wit, the case of an exponential Exp(1) sample, for which a maximum of the target function cannot be reached with a finite number of samples. Which brings me back full-circle to my general unease about clustering, in that much more seems to be assumed about this notion than what the statistical model delivers.