## Finite mixture models do not reliably learn the number of components

Posted in Books, Statistics, University life on October 15, 2022 by xi'an

When preparing my talk for Padova, I found that Diana Cai, Trevor Campbell, and Tamara Broderick wrote this ICML / PMLR paper last year on the impossibility of estimating the number of components in a mixture.

“A natural check on a Bayesian mixture analysis is to establish that the Bayesian posterior on the number of components increasingly concentrates near the truth as the number of data points becomes arbitrarily large.” Cai, Campbell & Broderick (2021)

Which seems to contradict [my formerly-Glaswegian friend] Agostino Nobile, who showed in his thesis that the posterior on the number of components does concentrate at the true number of components, provided the prior contains that number in its support. As well as numerous papers on the consistency of the Bayes factor, including the one against an infinite mixture alternative, as we discussed in our recent paper with Adrien and Judith. And reminded me of the rebuke I got in 2001 from the late David MacKay when mentioning that I did not believe in estimating the number of components, both because of the impact of the prior modelling and because of the tendency of the data to push for more clusters as the sample size increases. (This was a most lively workshop Mike Titterington and I organised at ICMS in Edinburgh, where Radford Neal also delivered an impromptu talk to argue against using the Galaxy dataset as a benchmark!)

“In principle, the Bayes factor for the MFM versus the DPM could be used as an empirical criterion for choosing between the two models, and in fact, it is quite easy to compute an approximation to the Bayes factor using importance sampling” Miller & Harrison (2018)

This is however a point already made in Miller & Harrison (2018): the estimation of k logically goes south if the data is not generated from the assumed mixture model. In this paper, Cai et al. demonstrate that the posterior on the number of components diverges, even when the prior depends on the sample size, or even on the sample itself, as in empirical Bayes solutions.

## off to Padova??? [for its 800th anniversary]

Posted in pictures, Statistics, Travel, University life on September 21, 2022 by xi'an


## the Hyvärinen score is back

Posted in pictures, Statistics, Travel on November 21, 2017 by xi'an

Stéphane Shao, Pierre Jacob and co-authors from Harvard have just posted on arXiv a new paper on Bayesian model comparison using the Hyvärinen score

$\mathcal{H}(y, p) = 2\Delta_y \log p(y) + ||\nabla_y \log p(y)||^2$

which thus uses the Laplacian as a natural and normalisation-free penalisation for the score test. (Score that I first met in Padova, a few weeks before moving from X to IX.) Which brings a decision-theoretic alternative to the Bayes factor and which delivers a coherent answer when using improper priors. Thus a very appealing proposal in my (biased) opinion! The paper is mostly computational in that it proposes SMC and SMC² solutions to handle the estimation of the Hyvärinen score for models with tractable likelihoods and tractable completed likelihoods, respectively. (Reminding me that Pierre worked on SMC² algorithms quite early during his Ph.D. thesis.)
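For a univariate Gaussian, the score has a simple closed form, which makes for a quick sanity check. A minimal sketch (my own toy example, not from the paper):

```python
import math

def hyvarinen_score_gaussian(y, mu, sigma):
    """Hyvarinen score H(y, p) = 2 * Laplacian_y log p(y) + (d/dy log p(y))^2
    for the univariate Gaussian N(mu, sigma^2), where
    d/dy log p(y) = -(y - mu) / sigma^2 and Laplacian_y log p(y) = -1 / sigma^2."""
    grad = -(y - mu) / sigma**2
    lap = -1.0 / sigma**2
    return 2 * lap + grad**2

# the score only involves derivatives of log p(y) in y, so any constant
# (in particular an unknown normalising constant) drops out -- which is
# why improper priors cause no difficulty
print(hyvarinen_score_gaussian(1.5, 0.0, 1.0))  # 2*(-1) + 1.5^2 = 0.25
```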

A most interesting remark in the paper is to recall that the Hyvärinen score associated with a generic model on a series must be the prequential (predictive) version

$\mathcal{H}_T (M) = \sum_{t=1}^T \mathcal{H}(y_t; p_M(dy_t|y_{1:(t-1)}))$

rather than the version on the joint marginal density of the whole series. (Followed by a remark within the remark that the logarithmic scoring rule does not make this distinction. And I had to write down the cascading representation

$\log p(y_{1:T})=\sum_{t=1}^T \log p(y_t|y_{1:t-1})$

to convince myself that this unnatural decomposition, where the posterior on θ varies from term to term, holds true!) The prequential version is the one required for consistency reasons.
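The cascading representation can be checked numerically on a conjugate toy model (my own choice for illustration, not an example from the paper): for yₜ ~ N(θ, σ²) with θ ~ N(m₀, v₀), each one-step predictive is Gaussian, and the sum of log predictives recovers the log joint marginal computed directly.

```python
import math
import numpy as np

def log_predictives(y, m0=0.0, v0=4.0, sigma2=1.0):
    """Prequential side: sum_t log p(y_t | y_1:t-1) for the conjugate model
    y_t ~ N(theta, sigma2), theta ~ N(m0, v0). Under the current posterior
    N(m, v), the one-step predictive is N(m, v + sigma2)."""
    m, v = m0, v0
    total = 0.0
    for yt in y:
        s2 = v + sigma2  # predictive variance
        total += -0.5 * (math.log(2 * math.pi * s2) + (yt - m)**2 / s2)
        # conjugate posterior update after observing yt
        v_new = 1.0 / (1.0 / v + 1.0 / sigma2)
        m = v_new * (m / v + yt / sigma2)
        v = v_new
    return total

def log_joint_marginal(y, m0=0.0, v0=4.0, sigma2=1.0):
    """Direct side: marginally, y ~ N(m0 * 1, sigma2 * I + v0 * 1 1^T)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    cov = sigma2 * np.eye(T) + v0 * np.ones((T, T))
    diff = y - m0
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (T * math.log(2 * math.pi) + logdet
                   + diff @ np.linalg.solve(cov, diff))

y = [0.3, -1.2, 0.8, 2.1]
print(log_predictives(y), log_joint_marginal(y))  # the two values agree
```

The posterior on θ indeed differs at each term of the sum (it is updated after every observation), yet the product of predictives equals the joint marginal, exactly as the cascading identity states.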

This prequential decomposition is however a plus in terms of computation when resorting to sequential Monte Carlo, since each time step produces an evaluation of the associated marginal. In the case of state space models, another decomposition proposed by the authors, based on measurement densities and partial conditional expectations of the latent states, allows for another (SMC²) approximation. The paper also establishes that for non-nested models, the Hyvärinen score as a model selection tool asymptotically selects the model closest to the data generating process, in terms of the divergence induced by the score, even for state-space models, under some technical assumptions. From this asymptotic perspective, the paper exhibits an example where the Bayes factor and the Hyvärinen factor disagree, even asymptotically in the number of observations, about which mis-specified model to select. And last but not least the authors propose and assess a discrete alternative relying on finite differences instead of derivatives, which remains a proper scoring rule.
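As a rough numerical illustration of replacing derivatives by differences (a sketch of the general idea only, not the authors' discrete construction, which is defined for genuinely discrete observations), one can approximate the gradient and Laplacian of log p by central differences and compare to the exact Gaussian value:

```python
import math

def log_gauss(y, mu=0.0, sigma2=1.0):
    """Log density of N(mu, sigma2) at y."""
    return -0.5 * (math.log(2 * math.pi * sigma2) + (y - mu)**2 / sigma2)

def hyvarinen_fd(logp, y, h=1e-4):
    """Finite-difference stand-in for H(y, p) = 2 d2/dy2 log p + (d/dy log p)^2,
    with central differences in place of exact derivatives."""
    lp_plus, lp, lp_minus = logp(y + h), logp(y), logp(y - h)
    grad = (lp_plus - lp_minus) / (2 * h)        # first derivative
    lap = (lp_plus - 2 * lp + lp_minus) / h**2   # second derivative
    return 2 * lap + grad**2

# exact value for N(0,1) at y = 1.5 is 2*(-1) + 1.5^2 = 0.25
print(hyvarinen_fd(log_gauss, 1.5))  # close to 0.25
```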

I am quite excited by this work (call me biased!) and I hope it can spur follow-up work establishing it as a viable alternative to Bayes factors, if only for being more robust to the [unspecified] impact of the prior tails. As in the above picture, where some realisations of the SMC² output and of the sequential decision process see the wrong model remain almost acceptable for quite a long while…

## O-Bayes15 [registration & call for papers]

Posted in Kids, pictures, Statistics, Travel, University life on January 5, 2015 by xi'an

Both registration and call for papers have now been posted on the webpage of the 11th International Workshop on Objective Bayes Methodology, aka O-Bayes 15, which will take place in Valencia next June 1-5. The spectrum of the conference is quite wide, as reflected by the range of speakers. In addition, this conference is dedicated to our friend Susie Bayarri, to celebrate her life and contributions to Bayesian Statistics. And in continuation of the morning jog in memory of George Casella organised by Laura Ventura in Padova, there will be a morning jog for Susie. So register for the meeting and bring your running shoes!