Archive for clustering

ravencry [book review]

Posted in Books, Kids, Travel on November 2, 2019 by xi'an

After enjoying Ed McDonald’s Blackwing this summer, I ordered the second volume, Ravencry, which I read in a couple of days between Warwick and Edinburgh.

“Valya had marked all of the impact sites, then numbered them according to the night they had struck. The first night was more widely distributed, the second slightly more clustered. As the nights passed, the clusters drew together with fewer and fewer outliers.”

Since this is a sequel, the fantasy universe in which the story takes place has not changed much, but it gains in consistency and depth, especially the wastelands created by the wizard controlling the central character. The characters are mostly the same (with the same limited ethics for the surviving ones!), albeit with unexpected twists (no spoiler!). The perils of a second volume, namely the sudden occurrence of a completely new and obviously deadly threat to the entire world, are mostly avoided by connecting quite closely with the first volume, and even the arch-exploited theme of a new religious cult fits rather nicely into the new plot. Despite the urgency of the menace (as usual) to their world, the core characters do not do much in the first part of the book, engaged in a kind of detective work that is rather unusual for fantasy books, but the second part sees a lot of both action and explanation, which is why it became a page-turner for me. And while there are far fewer allusions to magical mathematics in this volume, a John Snow moment occurs near the above quote.

MHC2020

Posted in pictures, Statistics, Travel, University life on October 15, 2019 by xi'an

There is a conference on mixtures (M), hidden Markov models (H), and clustering (C) taking place in Orsay on June 17-19 next year. Registration is free, if compulsory, and there are about twenty confirmed speakers. (Irrelevant as the following remark is, this is the opportunity to recall the conference on mixtures I organised in Aussois 25 years earlier! Its website is amazingly still alive at Duke, thanks to Mike West, my co-organiser along with Kathryn Roeder and Gilles Celeux. When checking the abstracts, I found only two presenters common to both conferences, Christophe Biernacki and Jiahua Chen. And alas several names of departed friends.)

from here to infinity

Posted in Books, Statistics, Travel on September 30, 2019 by xi'an

“Introducing a sparsity prior avoids overfitting the number of clusters not only for finite mixtures, but also (somewhat unexpectedly) for Dirichlet process mixtures which are known to overfit the number of clusters.”

On my way back from Clermont-Ferrand, in an old train that reminded me of my previous ride on that line, which took place in… 1975!, I read a fairly interesting paper published in Advances in Data Analysis and Classification by [my Viennese friends] Sylvia Frühwirth-Schnatter and Gertraud Malsiner-Walli, where they describe how sparse finite mixtures and Dirichlet process mixtures can achieve similar results when clustering a given dataset, provided the hyperparameters in both approaches are calibrated accordingly. In both cases these hyperparameters (the scale, or concentration, of the Dirichlet process versus the scale of the Dirichlet prior on the weights) are endowed with Gamma priors, both depending on the number of components in the finite mixture. Another interesting feature of the paper is to witness how close the related MCMC algorithms are when exploiting the stick-breaking representation of the Dirichlet process mixture, with a resolution of the label switching difficulties via a point process representation and k-means clustering in the parameter space. [The title of the paper is inspired by Ian Stewart's book.]
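
To illustrate the calibration point, here is a minimal simulation sketch (mine, with purely illustrative hyperparameter values and function names, not the calibration of the paper) comparing the prior distribution of the number of occupied clusters under a Dirichlet process with Gamma-distributed concentration and under a sparse finite mixture with a Gamma prior on the symmetric Dirichlet weight parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, draws = 100, 10, 2000  # sample size, finite-mixture components, prior draws

def crp_clusters(alpha, n):
    # number of occupied clusters after seating n items in a Chinese restaurant process
    counts = []
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        j = rng.choice(len(probs), p=probs / probs.sum())
        if j == len(counts):
            counts.append(1.0)
        else:
            counts[j] += 1.0
    return len(counts)

def sfm_clusters(e0, n, K):
    # occupied components of a finite mixture with symmetric Dirichlet(e0,…,e0) weights
    w = rng.dirichlet(np.full(K, e0))
    z = rng.choice(K, size=n, p=w)
    return len(np.unique(z))

# illustrative Gamma hyperpriors on the DP concentration and on the sparse Dirichlet parameter
dp = [crp_clusters(rng.gamma(2.0, 0.5), n) for _ in range(draws)]
sfm = [sfm_clusters(rng.gamma(1.0, 0.05), n, K) for _ in range(draws)]
print("prior mean number of clusters (DP vs sparse finite mixture):",
      np.mean(dp), np.mean(sfm))
```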

latent nested nonparametric priors

Posted in Books, Statistics on September 23, 2019 by xi'an

A paper on an extended type of non-parametric priors by Camerlenghi et al. [all good friends!] is about to appear in Bayesian Analysis, with a discussion open for contributions (until October 15). While a fairly theoretical piece of work, it validates a Bayesian approach to the non-parametric clustering of separate populations with, broadly speaking, common clusters. More formally, it constructs a new family of models that allows for a partial or complete equality between two probability measures, without forcing full identity when the associated samples do share some common observations. Indeed, the more traditional structures prohibit one or the other: the Dirichlet process (DP) prohibits two probability measure realisations from being equal or even partly equal; the hierarchical DP (HDP) already allows for common atoms across measure realisations, but prohibits complete identity between two realised distributions; and the nested DP offers one extra level of randomness, but its infinity of DP realisations prohibits any common atomic support short of completely identical support (and hence distribution).

The current paper imagines two realisations of random measures, each written as the sum of a common random measure and of one of two separate, almost independent random measures: (14) is the core formula of the paper that allows for partial or total equality. An extension to settings with more than two samples seems complicated, if only because of the number of common measures one would have to introduce, from the totally common measure to measures shared by only a subset of the samples, except in the simplified framework where a single, universally common measure is adopted (with enough justification). The randomness of the model is handled via different completely random measures, involving something like four levels of hierarchy in the Bayesian model.
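
In my own notation, a schematic rendering of that additive structure (not the paper's exact (14)) reads something like

\[
\tilde\mu_\ell = \mu_S + \mu_\ell, \qquad \tilde p_\ell = \frac{\tilde\mu_\ell}{\tilde\mu_\ell(\mathbb{X})}, \qquad \ell=1,2,
\]

with completely random measures μ_S, μ₁, μ₂, the common component μ_S inducing shared atoms between the two populations, and total equality between the two distributions corresponding to the degenerate case where the idiosyncratic components play no role.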

Since the example is somewhat central to the paper, the case of two two-component Normal mixtures with a common component (but with different mixture weights) is handled by the approach, although it seems that it was already covered by HDP. Having exactly the same component (i.e., with the very same weight) is not, but this may be less interesting in real-life applications. Note that alternative (and easily constructed) parametric constructs are already available in this specific case, involving a limited prior input and a lighter computational burden, although the Gibbs sampler behind the proposed model proves extremely simple on paper. (One may wonder at the robustness of the sampler once the case of identical distributions is visited.)

Due to the combinatorial explosion associated with a higher number of observed samples, despite obvious practical situations, one may wonder at any feasible (and possibly sequential) extension that would further keep a coherence under marginalisation (in the number of samples). And also whether or not multiple testing could be coherently envisioned in this setting, for instance when handling all hospitals in the UK. Another consistency question covers the Bayes factor used to assess whether or not the two distributions behind the samples are identical. (One may wonder at the importance of the question, hopefully applied to more relevant datasets than the Iris data!)

the most probable cluster

Posted in Books, Statistics on July 11, 2019 by xi'an

In the last issue of Bayesian Analysis, Lukasz Rajkowski studies the most likely (MAP) cluster associated with the Dirichlet process mixture model, reminding me that most Bayesian estimates of the number of clusters are not consistent (as the sample size grows to infinity). I am always puzzled by this problem, as estimating the number of clusters sounds like an ill-posed problem, since that number grows with the number of observations, by definition of the Dirichlet process. For instance, the current paper establishes that the number of clusters intersecting a given compact set remains bounded. (The setup is one of a Normal Dirichlet process mixture with constant and known covariance matrix.)

Since the posterior probability of a given partition of {1,2,…,n} can be (formally) computed, the MAP estimate can be (formally) derived. I inserted formally in the previous sentence as the derivation of the exact MAP is an NP-hard problem in the number n of observations. As an aside, I have trouble with the author's argument that the convex hulls of the clusters should be disjoint: I do not see why they should be when the mixture components overlap. (More generally, I fail to relate to notions like “bad clusters” or “overestimation of the number of clusters” or a “sensible choice” of the covariance matrix.) More globally, I am somewhat perplexed by the purpose of the paper and the relevance of the MAP estimate, even putting aside my generic criticisms of the MAP approach. No uncertainty is attached to the estimator, which thus appears as a form of penalised likelihood strategy rather than a genuinely Bayesian (Analysis) solution.
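
For completeness, in this Normal Dirichlet process mixture with known covariance Σ, the partition posterior being maximised takes the familiar form (my notation)

\[
\pi(\mathcal{C}\mid y_{1:n}) \;\propto\; \alpha^{|\mathcal{C}|} \prod_{c\in\mathcal{C}} (|c|-1)!\, \int \prod_{i\in c} \varphi_\Sigma(y_i-\theta)\, G_0(\mathrm{d}\theta),
\]

where α is the concentration parameter, G₀ the base measure, and φ_Σ the Normal density with covariance Σ; the MAP is the partition of {1,…,n} maximising this product, whence the combinatorial difficulty.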

The first example in the paper uses data from a Uniform over (-1,1), concluding that the MAP partition is “misleading” since it produces more than one cluster. I find this statement flabbergasting, as the generative model is not the estimated model. To wit, the case of an exponential Exp(1) sample, for which the target function cannot reach its maximum with a finite number of samples. Which brings me back full-circle to my general unease about clustering, in that much more seems to be assumed about this notion than what the statistical model delivers.

O’Bayes 19/1 [snapshots]

Posted in Books, pictures, Statistics, University life on June 30, 2019 by xi'an

Although the tutorials of O'Bayes 2019 yesterday were poorly attended, albeit being great entries into objective Bayesian model choice, recent advances in MCMC methodology, and the multiple layers of BART, for which I can only blame myself for sticking the beginning of O'Bayes too closely to the end of BNP, as only the most dedicated could manage the commute from Oxford to Coventry to reach Warwick in time, the first day of talks was well attended, despite weekend commitments, conference fatigue, and perfect summer weather! Here are some snapshots from my bench (and apologies for not covering better the more theoretical talks I had trouble following, due to an early and intense morning swimming lesson! Like Steve Walker's utility-based derivation of priors that generalise maximum entropy priors, although being entirely independent from the model does not sound to me like such a desirable feature… And Natalia Bochkina's Bernstein-von Mises theorem for a location-scale semi-parametric model, including a clever construct of a mixture of two Dirichlet priors to achieve proper convergence.)

Jim Berger started the day with a talk on imprecise probabilities, involving the society for imprecise probability, which I discovered while reading Keynes' book, with a neat resolution of the Jeffreys-Lindley paradox when re-expressing the null as an imprecise null: the posterior probability of the null no longer converges to one, with a limit depending on the prior modelling, if a prior on the bias is involved as well. Chris discussed the talk and mentioned a recent work with Edwin Fong on reinterpreting the marginal likelihood as exhaustive X validation, summing over all possible subsets of the data [using log marginal predictive].

Håvard Rue did a follow-up talk from his Valencià O'Bayes 2015 talk on PC-priors, with a pretty hilarious introduction on his difficulties with constructing priors and counselling students about their Bayesian modelling, and a list of principles and desiderata to define a reference prior. However, I somewhat disagree with his argument that the Kullback-Leibler distance from the simpler (base) model cannot be scaled, as it is essentially a log-likelihood. And it feels like multivariate parameters need some sort of separability to define distance(s) to the base model, since the distance somewhat summarises the whole departure from the simpler model. (Håvard also joined my achievement of putting an ostrich in a slide!) In his discussion, Robin Ryder made a very pragmatic recap of the difficulties with constructing priors, and pointed out a natural link with ABC (which brings us back to Don Rubin's motivation for introducing the algorithm as a formal thought experiment).
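
Coming back to PC-priors, a quick reminder of the generic construction (in my notation, following the published PC-prior recipe rather than the talk's slides): the departure from the base model ξ₀ is measured through the distance

\[
d(\xi) = \sqrt{2\,\mathrm{KL}\big(f(\cdot\mid\xi)\,\big\|\,f(\cdot\mid\xi_0)\big)},
\]

on which an exponential prior π(d) = λ exp(−λd) is placed, the rate λ being elicited through a tail statement of the form P(Q(ξ) > U) = α for an interpretable transform Q of the parameter, and the prior on ξ then follows by a change of variables.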

Sara Wade gave the final talk of the day, about her work on Bayesian cluster analysis, whose discussion in Bayesian Analysis I alas missed. Cluster estimation, as mentioned frequently on this blog, is a rather frustrating challenge despite the simple formulation of the problem. (And I will not mention Larry's tequila analogy!) The current approach is based on loss functions directly addressing the clustering aspect, integrating out the parameters, which produces the interesting notion of neighbourhoods of partitions and hence credible balls in the space of partitions. It still remains unclear to me that cluster estimation is at all achievable, since the partition space explodes with the sample size and hence makes the most probable cluster more and more unlikely in that space. Somewhat paradoxically, the paper concludes that estimating the clustering produces a more reliable estimator of the number of clusters than looking at the marginal distribution of this number. In her discussion, Clara Grazian also pointed out the ambivalent use of clustering, where the intended meaning somehow diverges from the meaning induced by the mixture model.
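
As a toy illustration of the loss-based point-estimation idea, here is a minimal sketch of mine (not the authors' code), using the variation of information between partitions as the loss and restricting the search to sampled partitions, a common shortcut:

```python
import numpy as np

def variation_of_information(z1, z2):
    # variation of information between two partitions given as label vectors
    n = len(z1)
    vi = 0.0
    for a in np.unique(z1):
        for b in np.unique(z2):
            na = np.sum(z1 == a)
            nb = np.sum(z2 == b)
            nab = np.sum((z1 == a) & (z2 == b))
            if nab > 0:
                vi += (nab / n) * (np.log(na / nab) + np.log(nb / nab))
    return vi

def min_expected_vi(partition_draws):
    # among posterior draws of the partition, return the one minimising
    # the (Monte Carlo) expected loss to all other draws
    losses = [np.mean([variation_of_information(z, w) for w in partition_draws])
              for z in partition_draws]
    return partition_draws[int(np.argmin(losses))]

# toy usage with three fake posterior draws of a partition of five items
draws = [np.array([0, 0, 1, 1, 1]), np.array([0, 0, 0, 1, 1]), np.array([0, 0, 1, 1, 2])]
print(min_expected_vi(draws))
```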

a book and two chapters on mixtures

Posted in Books, Statistics, University life on January 8, 2019 by xi'an

The Handbook of Mixture Analysis is now out! After a few years of planning, contacts, meetings, discussions about notations, interactions with authors, further interactions with late authors, repeated editing towards homogenisation, and a final professional edit last summer, this collection of nineteen chapters involved thirty-five contributors. I am grateful to all participants in this piece of work, especially to Sylvia Frühwirth-Schnatter for being a driving force in the project and for achieving a much higher degree of homogeneity in the book than I expected. I would also like to thank Rob Calver and Lara Spieker of CRC Press for their boundless patience through the many missed deadlines and their overall support.

Two chapters which I co-authored are now available as arXived documents:

5. Gilles Celeux, Kaniav Kamary, Gertraud Malsiner-Walli, Jean-Michel Marin, and Christian P. Robert, Computational Solutions for Bayesian Inference in Mixture Models
7. Gilles Celeux, Sylvia Frühwirth-Schnatter, and Christian P. Robert, Model Selection for Mixture Models – Perspectives and Strategies

along with other chapters

1. Peter Green, Introduction to Finite Mixtures
8. Bettina Grün, Model-based Clustering
12. Isobel Claire Gormley and Sylvia Frühwirth-Schnatter, Mixtures of Experts Models
13. Sylvia Kaufmann, Hidden Markov Models in Time Series, with Applications in Economics
14. Elisabeth Gassiat, Mixtures of Nonparametric Components and Hidden Markov Models
19. Michael A. Kuhn and Eric D. Feigelson, Applications in Astronomy