## Finite mixture models do not reliably learn the number of components

Posted in Books, Statistics, University life on October 15, 2022 by xi'an

When preparing my talk for Padova, I found that Diana Cai, Trevor Campbell, and Tamara Broderick wrote this ICML / PMLR paper last year on the impossible estimation of the number of components in a mixture.

“A natural check on a Bayesian mixture analysis is to establish that the Bayesian posterior on the number of components increasingly concentrates near the truth as the number of data points becomes arbitrarily large.” Cai, Campbell & Broderick (2021)

Which seems to contradict [my formerly-Glaswegian friend] Agostino Nobile, who showed in his thesis that the posterior on the number of components does concentrate at the true number of components, provided the prior contains that number in its support. As well as numerous papers on the consistency of the Bayes factor, including the one against an infinite mixture alternative, as we discussed in our recent paper with Adrien and Judith. And it reminded me of the rebuke I got in 2001 from the late David MacKay when mentioning that I did not believe in estimating the number of components, both because of the impact of the prior modelling and because of the tendency of the data to push for more clusters as the sample size increased. (This was a most lively workshop Mike Titterington and I organised at ICMS in Edinburgh, where Radford Neal also delivered an impromptu talk to argue against using the Galaxy dataset as a benchmark!)

“In principle, the Bayes factor for the MFM versus the DPM could be used as an empirical criterion for choosing between the two models, and in fact, it is quite easy to compute an approximation to the Bayes factor using importance sampling” Miller & Harrison (2018)

This is however a point made in Miller & Harrison (2018), namely that the estimation of k logically goes south if the data is not from the assumed mixture model. In this paper, Cai et al. demonstrate that the posterior diverges, even when the prior depends on the sample size. Or even on the sample, as in empirical Bayes solutions.

## inferring the number of components [remotely]

Posted in Statistics on October 14, 2022 by xi'an

## master project?

Posted in Books, Kids, Statistics, University life on July 25, 2022 by xi'an

A potential master project for my students next year inspired by an X validated question: given a Gaussian mixture density

$f(x)\propto\sum_{i=1}^m \omega_i \sigma^{-1}\,\exp\{-(x-\mu_i)^2/2\sigma^2\}$

with m known, the weights summing up to one, and the (prior) information that all means are within (-C,C), derive the parameters of this mixture from a sufficiently large number of evaluations of f. Pay attention to the numerical issues associated with the resolution.  In a second stage, envision this problem from an exponential spline fitting perspective and optimise the approach if feasible.
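One possible reading of the second stage, given here as a sketch and assuming σ is known (which the statement above does not specify): expanding the square in each exponent and multiplying f by a Gaussian factor turns the problem into fitting a sum of m exponentials. The symbols g and a_i are introduced only for this illustration.

$g(x) = f(x)\,\exp\{x^2/2\sigma^2\} \propto \sum_{i=1}^m a_i\,\exp\{\mu_i x/\sigma^2\}, \qquad a_i = \omega_i \sigma^{-1}\,\exp\{-\mu_i^2/2\sigma^2\}$

Fitting g on a grid of evaluations would then return the rates μ_i/σ², all within (-C/σ², C/σ²), and the coefficients a_i, from which the weights follow as

$\omega_i = a_i\,\exp\{\mu_i^2/2\sigma^2\} \Big/ \sum_{j=1}^m a_j\,\exp\{\mu_j^2/2\sigma^2\}$

the normalisation taking care of the unknown proportionality constant. Both the factor exp{x²/2σ²}, which explodes for large |x|, and the notoriously ill-conditioned nature of exponential fitting when the rates are close are instances of the numerical issues mentioned above.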

## mixtures of sums vs. sum of mixtures

Posted in Statistics on April 13, 2022 by xi'an

A (mildly) interesting question on X validated last night, namely the distribution of a sum of n iid variables distributed from a mixture of exponentials. The rather obvious answer is a mixture of (n+1) distributions, each of them corresponding to a sum of two Gamma variates (but for the extreme cases). But the more interesting component for my personal consumption is that the distribution of this sum of two Gammas with different scales writes up as a signed mixture of Gammas, which comes as a handy (if artificial) illustration for a paper we are completing with Julien Stoehr.
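As a quick sanity check of the (n+1)-component representation, here is a minimal Python sketch, assuming a two-component mixture p Exp(λ₁) + (1-p) Exp(λ₂) (the question as quoted does not fix the number of components). The function names and the numerical values of n, p and the rates are arbitrary choices made for illustration. It draws the sum directly, then through a Binomial(n, p) count k followed by a Gamma(k, λ₁) plus a Gamma(n-k, λ₂), and compares both Monte Carlo means with the exact one.

```python
import math
import random


def mixture_weights(n, p):
    """Binomial weights of the (n+1) components indexed by k = 0, ..., n."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def sample_sum_direct(n, p, rate1, rate2, rng):
    """Sum of n iid draws from p*Exp(rate1) + (1-p)*Exp(rate2)."""
    total = 0.0
    for _ in range(n):
        rate = rate1 if rng.random() < p else rate2
        total += rng.expovariate(rate)
    return total


def sample_sum_via_gammas(n, p, rate1, rate2, rng):
    """Draw k ~ Binomial(n, p), then Gamma(k, rate1) + Gamma(n-k, rate2)."""
    k = sum(rng.random() < p for _ in range(n))
    total = 0.0
    # extreme cases k = 0 or k = n: only one Gamma variate is present
    if k > 0:
        total += rng.gammavariate(k, 1.0 / rate1)  # second argument is a scale
    if n - k > 0:
        total += rng.gammavariate(n - k, 1.0 / rate2)
    return total


# arbitrary illustrative values
n, p, rate1, rate2 = 5, 0.3, 1.0, 3.0
n_sim = 20000
rng = random.Random(2022)

direct_mean = sum(sample_sum_direct(n, p, rate1, rate2, rng) for _ in range(n_sim)) / n_sim
gamma_mean = sum(sample_sum_via_gammas(n, p, rate1, rate2, rng) for _ in range(n_sim)) / n_sim
exact_mean = n * (p / rate1 + (1 - p) / rate2)

print(mixture_weights(n, p))
print(direct_mean, gamma_mean, exact_mean)
```

Matching means are of course only a necessary check; comparing empirical cdfs of the two samples would be a more stringent one.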

## identifying mixtures

Posted in Books, pictures, Statistics on February 27, 2022 by xi'an

I had not read this 2017 discussion of Bayesian mixture estimation by Michael Betancourt before I found it mentioned in a recent paper. Where he re-explores the issue of identifiability and label switching in finite mixture models. Calling, somewhat abusively, degenerate those mixtures where all components share the same family, e.g., mixtures of Gaussians. Illustrated by Stan code and output. This is rather traditional material, in that the non-identifiability of mixture components has been discussed in many papers and at least as many solutions proposed to overcome the difficulties of exploring the posterior distribution. Including our 2000 JASA paper with Gilles Celeux and Merrilee Hurn. With my favourite approach being the label-free representations as a point process in the parameter space (following an idea of Peter Green) or as a collection of clusters in the latent variable space. I am much less convinced by ordering constraints: while they formally differentiate and therefore identify the individual components of a mixture, they partition the parameter space with no regard for the geometry of the posterior distribution. With in turn potential consequences on MCMC explorations of this fragmented surface, which creates barriers for simulated Markov chains. Plus further difficulties with inferior but attracting modes in identifiable situations.
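For readers unfamiliar with the label switching issue, a minimal Python sketch (not taken from Betancourt's case study, with toy data and parameter values made up for illustration) shows that the likelihood of a Gaussian mixture is unchanged under any relabelling of its components, hence that a three-component posterior with an exchangeable prior carries 3! = 6 symmetric copies of each mode.

```python
import math
from itertools import permutations


def normal_pdf(x, mu, sigma):
    """Density of a N(mu, sigma^2) variate at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_loglik(data, weights, means, sigmas):
    """Log-likelihood of an iid sample under a finite Gaussian mixture."""
    total = 0.0
    for x in data:
        density = sum(
            w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)
        )
        total += math.log(density)
    return total


# toy sample and parameter values, chosen arbitrarily
data = [-1.2, 0.3, 0.8, 2.5, 3.1]
weights, means, sigmas = [0.2, 0.5, 0.3], [-1.0, 0.5, 3.0], [1.0, 0.7, 1.2]

# evaluate the log-likelihood under every relabelling of the three components
logliks = []
for perm in permutations(range(3)):
    logliks.append(
        mixture_loglik(
            data,
            [weights[i] for i in perm],
            [means[i] for i in perm],
            [sigmas[i] for i in perm],
        )
    )

# the six values agree up to floating-point rounding
print(max(logliks) - min(logliks))
```

An ordering constraint such as μ₁ < μ₂ < μ₃ keeps a single one of these six copies, which is exactly the partition of the parameter space criticised above when it cuts through a region of high posterior mass.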