## Finite mixture models do not reliably learn the number of components

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , on October 15, 2022 by xi'an

When preparing my talk for Padova, I found that Diana Cai, Trevor Campbell, and Tamara Broderick wrote this ICML / PLMR paper last year on the impossible estimation of the number of components in a mixture.

“A natural check on a Bayesian mixture analysis is to establish that the Bayesian posterior on the number of components increasingly concentrates near the truth as the number of data points becomes arbitrarily large.” Cai, Campbell & Broderick (2021)

Which seems to contradict [my formerly-Glaswegian friend] Agostino Nobile  who showed in his thesis that the posterior on the number of components does concentrate at the true number of components, provided the prior contains that number in its support. As well as numerous papers on the consistency of the Bayes factor, including the one against an infinite mixture alternative, as we discussed in our recent paper with Adrien and Judith. And reminded me of the rebuke I got in 2001 from the late David McKay when mentioning that I did not believe in estimating the number of components, both because of the impact of the prior modelling and of the tendency of the data to push for more clusters as the sample size increased. (This was a most lively workshop Mike Titterington and I organised at ICMS in Edinburgh, where Radford Neal also delivered an impromptu talk to argue against using the Galaxy dataset as a benchmark!)

“In principle, the Bayes factor for the MFM versus the DPM could be used as an empirical criterion for choosing between the two models, and in fact, it is quite easy to compute an approximation to the Bayes factor using importance sampling” Miller & Harrison (2018)

This is however a point made in Miller & Harrison (2018) that the estimation of k logically goes south if the data is not from the assumed mixture model. In this paper, Cai et al. demonstrate that the posterior diverges, even when it depends on the sample size. Or even the sample as in empirical Bayes solutions.

## 21w5107 [½day 3]

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , on December 2, 2021 by xi'an

Day [or half-day] three started without firecrackers and with David Rossell (formerly Warwick) presenting an empirical Bayes approach to generalised linear model choice with a high degree of confounding, using approximate Laplace approximations. With considerable improvements in the experimental RMSE. Making feeling sorry there was no apparent fully (and objective?) Bayesian alternative! (Two more papers on my reading list that I should have read way earlier!) Then Veronika Rockova discussed her work on approximate Metropolis-Hastings by classification. (With only a slight overlap with her One World ABC seminar.) Making me once more think of Geyer’s n⁰564 technical report, namely the estimation of a marginal likelihood by a logistic discrimination representation. Her ABC resolution replaces the tolerance step by an exponential of minus the estimated Kullback-Leibler divergence between the data density and the density associated with the current value of the parameter. (I wonder if there is a residual multiplicative constant there… Presumably not. Great idea!) The classification step need be run at every iteration, which could be sped up by subsampling.

On the always fascinating theme of loss based posteriors, à la Bissiri et al., Jack Jewson (formerly Warwick) exposed his work generalised Bayesian and improper models (from Birmingham!). Using data to decide between model and loss, which sounds highly unorthodox! First difficulty is that losses are unscaled. Or even not integrable after an exponential transform. Hence the notion of improper models. As in the case of robust Tukey’s loss, which is bounded by an arbitrary κ. Immediately I wonder if the fact that the pseudo-likelihood does not integrate is important beyond the (obvious) absence of a normalising constant. And the fact that this is not a generative model. And the answer came a few slides later with the use of the Hyvärinen score. Rather than the likelihood score. Which can itself be turned into a H-posterior, very cool indeed! Although I wonder at the feasibility of finding an [objective] prior on κ.

Rajesh Ranganath completed the morning session with a talk on [the difficulty of] connecting Bayesian models and complex prediction models. Using instead a game theoretic approach with Brier scores under censoring. While there was a connection with Veronika’s use of a discriminator as a likelihood approximation, I had trouble catching the overall message…

## empirically Bayesian [wISBApedia]

Posted in Statistics with tags , , , , , , , on August 9, 2021 by xi'an

Last week I was pointed out a puzzling entry in the “empirical Bayes” Wikipedia page. The introduction section indeed contains a description of an iterative simulation method that involves an hyperprior $$p(η)$$even though the empirical Bayes perspective does not involve an hyperprior.

While the entry is vague and lacks formulae

These suggest an iterative scheme, qualitatively similar in structure to a Gibbs sampler, to evolve successively improved approximations to $$p(θ$$$$∣$$$$y)$$ and $$p(η∣y)$$. First, calculate an initial approximation to $$p(θ∣y)$$ ignoring the $$η$$ dependence completely; then calculate an approximation to $$p(η$$$$|$$$$y)$$ based upon the initial approximate distribution of $$p(θ$$$$∣$$$$y)$$; then use this $$p(η$$$$∣$$$$y)$$ to update the approximation for $$p(θ$$$$∣$$$$y)$$; then update $$p(η$$$$∣$$$$y)$$; and so on.

it sounds essentially equivalent to a Gibbs sampler, possibly a multiple try Gibbs sampler (unless the author had another notion in mind, alas impossible to guess since no reference is included).

Beyond this specific case, where I think the entire paragraph should be erased from the “empirical Bayes” Wikipedia page, I discussed the general problem of some poor Bayesian entries in Wikipedia with Robin Ryder, who came with the neat idea of running (collective) Wikipedia editing labs at ISBA conferences. If we could further give an ISBA label to these entries, as a certificate of “Bayesian orthodoxy” (!), it would be terrific!

## bootstrap in Nature

Posted in Statistics with tags , , , , , , , , , , on December 29, 2018 by xi'an

A news item in the latest issue of Nature I received about Brad Efron winning the “Nobel Prize of Statistics” this year. The bootstrap is certainly an invention worth the recognition, not to mention Efron’s contribution to empirical Bayes analysis,, even though I remain overall reserved about the very notion of a Nobel prize in any field… With an appropriate XXL quote, who called the bootstrap method the ‘best statistical pain reliever ever produced’!

## a Bayesian interpretation of FDRs?

Posted in Statistics with tags , , , , , , , , , , on April 12, 2018 by xi'an

This week, I happened to re-read John Storey’ 2003 “The positive discovery rate: a Bayesian interpretation and the q-value”, because I wanted to check a connection with our testing by mixture [still in limbo] paper. I however failed to find what I was looking for because I could not find any Bayesian flavour in the paper apart from an FRD expressed as a “posterior probability” of the null, in the sense that the setting was one of opposing two simple hypotheses. When there is an unknown parameter common to the multiple hypotheses being tested, a prior distribution on the parameter makes these multiple hypotheses connected. What makes the connection puzzling is the assumption that the observed statistics defining the significance region are independent (Theorem 1). And it seems to depend on the choice of the significance region, which should be induced by the Bayesian modelling, not the opposite. (This alternative explanation does not help either, maybe because it is on baseball… Or maybe because the sentence “If a player’s [posterior mean] is above .3, it’s more likely than not that their true average is as well” does not seem to appear naturally from a Bayesian formulation.) [Disclaimer: I am not hinting at anything wrong or objectionable in Storey’s paper, just being puzzled by the Bayesian tag!]