Archive for finite mixtures

mixtures of sums vs. sum of mixtures

Posted in Statistics on April 13, 2022 by xi'an

A (mildly) interesting question on X validated last night, namely the distribution of a sum of n iid variables distributed from a mixture of exponentials. The rather obvious answer is a mixture of (n+1) distributions, each of them corresponding to a sum of two Gamma variates (but for the extreme cases). But the more interesting component for my personal consumption is that the distribution of this sum of two Gammas with different scales writes up as a signed mixture of Gammas, which comes as a handy (if artificial) illustration for a paper we are completing with Julien Stoehr.
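As a minimal sketch of the two facts above, assuming a two-component mixture p Exp(a) + (1-p) Exp(b) with distinct rates a and b (the function names are made up for this illustration): the weights of the (n+1)-component mixture are Binomial(n, p) probabilities, and in the simplest case of one exponential from each component the density of the sum is a signed mixture of the two exponential densities, with weights b/(b-a) and -a/(b-a).

```python
import math

def mixture_sum_weights(n, p):
    """Weights of the (n+1)-component mixture for the sum of n iid draws.

    Component k collects the outcomes where k of the n draws come from
    the first exponential, an event of Binomial(n, p) probability.
    """
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def sum_of_two_exponentials_density(x, a, b):
    """Density at x >= 0 of Exp(a) + Exp(b), with rates a != b.

    Written as a signed mixture of the Exp(a) and Exp(b) densities:
    the two weights sum to one, but one of them is negative.
    """
    w_a = b / (b - a)   # weight on the Exp(a) density
    w_b = -a / (b - a)  # weight on the Exp(b) density
    return w_a * a * math.exp(-a * x) + w_b * b * math.exp(-b * x)

# Illustrative values only, not taken from the original question.
weights = mixture_sum_weights(3, 0.25)
density_at_one = sum_of_two_exponentials_density(1.0, 1.0, 3.0)
```

The general case, with k > 1 draws from one component and n-k > 1 from the other, involves signed mixtures of Gamma densities with more terms and is not covered by this sketch.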

identifying mixtures

Posted in Books, pictures, Statistics on February 27, 2022 by xi'an

I had not read this 2017 discussion of Bayesian mixture estimation by Michael Betancourt before I found it mentioned in a recent paper. Where he re-explores the issue of identifiability and label switching in finite mixture models. Calling somewhat abusively degenerate mixtures where all components share the same family, e.g., mixtures of Gaussians. Illustrated by Stan code and output. This is rather traditional material, in that the non-identifiability of mixture components has been discussed in many papers and at least as many solutions proposed to overcome the difficulties of exploring the posterior distribution. Including our 2000 JASA paper with Gilles Celeux and Merrilee Hurn. With my favourite approach being the label-free representations as a point process in the parameter space (following an idea of Peter Green) or as a collection of clusters in the latent variable space. I am much less convinced by ordering constraints: while they formally differentiate and therefore identify the individual components of a mixture, they partition the parameter space with no regard towards the geometry of the posterior distribution. With in turn potential consequences on MCMC explorations of this fragmented surface that creates barriers for simulated Markov chains. Plus further difficulties with inferior but attracting modes in identifiable situations.
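To make the non-identifiability concrete, here is a small sketch (toy data and parameter values invented for the purpose, not taken from Betancourt's case study) checking that a Gaussian mixture likelihood takes the same value under all K! relabellings of its components, and that an ordering constraint on the means simply picks one representative among these K! copies.

```python
import math
from itertools import permutations

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_loglik(data, components):
    """Log-likelihood of a Gaussian mixture.

    components is a list of (weight, mean, sd) triples.
    """
    return sum(
        math.log(sum(w * normal_pdf(x, m, s) for (w, m, s) in components))
        for x in data
    )

def order_by_mean(components):
    """Ordering constraint: relabel components so that means increase."""
    return sorted(components, key=lambda c: c[1])

# Toy data and parameter values, made up for illustration only.
data = [-1.2, 0.3, 0.8, 2.9, 3.4]
components = [(0.3, 0.0, 1.0), (0.5, 3.0, 0.5), (0.2, 1.5, 2.0)]

# The likelihood is identical (up to rounding) for the 3! = 6 labellings...
logliks = [mixture_loglik(data, list(p)) for p in permutations(components)]
# ...and the ordering constraint maps all of them to a single labelling.
representatives = {tuple(order_by_mean(list(p))) for p in permutations(components)}
```

The sketch only shows that the constraint is formally valid. It says nothing about how the resulting cut of the parameter space sits with respect to the posterior modes, which is the objection raised above.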

the many nuances of Bayesian testing [CERminar]

Posted in Statistics on January 19, 2022 by xi'an

CERminar

ISBA 2021.1

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life, Wines on June 29, 2021 by xi'an

An infinite (mixture) session was truly the first one I could attend on Day 1, as a heap of unexpected last minute issues kept me busy or on edge for the beginning of the day (if not preventing me from a dawn dip in Calanque de Morgiou). Using the CIRM video system for Zoom talks required more preparation than I had thought and we made it barely in time for the first session, while I had to store Zoom links for all speakers present in Luminy. Plus allocate sessions to the rooms provided by CIRM, twice since there was a mishap with the other workshop present at CIRM. And reassure speakers, made anxious by the absence of a clear schedule. Chairing the second ABC session was also a tense moment, from checking every speaker could connect and share slides, to ensuring they kept on schedule (and they did on both!, ta’), to checking for questions at the end. Spotting a possible connection between Takuo Matsubara’s Stein’s approximation in the ABC setup and a related paper by Liu and Lee I had read just a few days ago. Alas, it was too early to relax as an inverter in the CIRM room burned and led to a local power failure. Fortunately this was restored prior to the mixture session! (As several boars were spotted on the campus yesternight, I hope no tragic encounter happens before the end of the meeting!!!) So the mixture session proposed new visions on inferring K, the number of components, some of which reminded me of… my first talk at CIRM where I was trying to get rid of empty components at each MCMC step, albeit in a much more rudimentary way obviously. And later had the wonderful surprise of hearing Xiao-Li’s lecture start by an excerpt from Car Talk, the hilarious Sunday morning radio talk-show about the art of used car maintenance on National Public Radio (NPR) that George Casella could not miss (and where a letter he wrote them about a mistaken probability computation was mentioned!).
The final session of the day was an invited ABC session I chaired (after being exfiltrated from the CIRM dinner table!) with Kate Lee, Ryan Giordano, and Julien Stoehr as speakers. Besides Julien’s talk on our Gibbs-ABC paper, both other talks shared a concern with the frequentist properties of the ABC posterior, either to be used as a control tool or as a faster assessment of the variability of the (Monte Carlo) ABC output.

mathematical theory of Bayesian statistics [book review]

Posted in Books, Statistics, Travel, University life on May 6, 2021 by xi'an

I came by chance (and not by CHANCE) upon this 2018 CRC Press book by Sumio Watanabe and ordered it myself to gather which material it really covered. As the back-cover blurb was not particularly clear and the title sounded quite general. After reading it, I found out that this is a mathematical treatise on some aspects of Bayesian information criteria, in particular on the Widely Applicable Information Criterion (WAIC) that was introduced by the author in 2010. The result is a rather technical and highly focussed book with little motivation or intuition surrounding the mathematical results, which may make for arduous reading. Some background on mathematical statistics and Bayesian inference is clearly preferable and the book cannot be used as a textbook for most audiences, as opposed to, e.g., An Introduction to Bayesian Analysis by J.K. Ghosh et al. or even more to Principles of Uncertainty by J. Kadane. In connection with this remark, the exercises found in the book are closer to the delivery of additional material than to textbook-style exercises.

“posterior distributions are often far from any normal distribution, showing that Bayesian estimation gives the more accurate inference than other estimation methods.”

The overall setting is one where both the sampling and the prior distributions are different from respective “true” distributions. Requiring a tool to assess the discrepancy when utilising a specific pair of such distributions. Especially when the posterior distribution cannot be approximated by a Normal distribution. (Lindley’s paradox makes an interesting incognito incursion on p.238.) The WAIC is supported for the determination of the “true” model, in opposition to AIC and DIC, incl. on a mixture example that reminded me of our eight versions of DIC paper. In the “Basic Bayesian Theory” chapter (§3), the “basic theorem of Bayesian statistics” (p.85) states that the various losses related with WAIC can be expressed as second-order Taylor expansions of some cumulant generating functions, with order o(n⁻¹), “even if the posterior distribution cannot be approximated by any normal distribution” (p.87). With the intuition that

“if a log density ratio function has a relatively finite variance then the generalization loss, the cross validation loss, the training loss and WAIC have the same asymptotic behaviors.”
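For readers who want the criterion in concrete form, here is a minimal sketch of WAIC on Watanabe's per-observation scale, namely the training loss plus the functional variance divided by n. It assumes an S×n table of pointwise log-likelihoods log p(x_i|θ_s) evaluated at S posterior draws. The function names and the input layout are choices made for this illustration, and other software may report the criterion on a different scale.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def waic(loglik):
    """WAIC on the per-observation scale.

    loglik[s][i] holds log p(x_i | theta_s) for posterior draw s and
    observation i (an assumed input layout for this sketch).
    """
    n_draws = len(loglik)
    n_obs = len(loglik[0])
    training_loss = 0.0
    functional_variance = 0.0
    for i in range(n_obs):
        column = [loglik[s][i] for s in range(n_draws)]
        # Training loss: minus the average log posterior predictive density.
        training_loss -= log_mean_exp(column) / n_obs
        # Functional variance: posterior variance of the pointwise log-likelihood.
        mean = sum(column) / n_draws
        functional_variance += sum((v - mean) ** 2 for v in column) / n_draws
    return training_loss + functional_variance / n_obs
```

When all posterior draws return the same pointwise log-likelihoods, the variance term vanishes and the sketch returns the plain average negative log-likelihood, which is a quick sanity check on any implementation.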

Obviously, these “basic” aspects should come as a surprise to a fair percentage of Bayesians (in the sense of not being particularly basic). Myself included. Chapter 4 exposes why, for regular models, the posterior distribution accumulates in an ε neighbourhood of the optimal parameter at a speed O(n^{2/5}). With the normalised partition function being of order n^{-d/2} in the neighbourhood and exponentially negligible outside. A consequence of this regular asymptotic theory is that all above losses are asymptotically equivalent to the negative log likelihood plus similar order n⁻¹ terms that can be ordered. Chapters 5 and 6 deal with “standard” [the likelihood ratio is a multi-index power of the parameter ω] and general posterior distributions that can be written as mixtures of standard distributions, with expressions of the above losses in terms of new universal constants. Again, a rather remote concern of mine. The book also includes a chapter (§7) on MCMC, with a rather involved proof that a Metropolis algorithm satisfies detailed balance (p.210). The Gibbs sampling section contains an extensive example on a two-dimensional two-component unit-variance Normal mixture, with an unusual perspective on the posterior, which is considered as “singular” when the true means are close. (Label switching or the absence thereof is not mentioned.) In terms of approximating the normalising constant (or free energy), the only method discussed there is path sampling, with a cryptic remark about harmonic mean estimators (not identified as such). In a final knapsack chapter (§9), Bayes factors (confusingly denoted as L(x)) are shown to be most powerful tests in a Bayesian sense when comparing hypotheses without prior weights on said hypotheses, while posterior probability ratios are the natural statistics for comparing models with prior weights on said models. (With Lindley’s paradox making another appearance, still incognito!)
And a notion of phase transition for hyperparameters is introduced, with the meaning of a radical change of behaviour at a critical value of said hyperparameter. For instance, for a simple normal-mixture outlier model, the critical value of the Beta hyperparameter is α=2. Which is a wee bit of a surprise when considering Rousseau and Mengersen (2011) since their bound for consistency was α=d/2.

In conclusion, this is quite an original perspective on Bayesian models, covering the somewhat unusual (and potentially controversial) issue of misspecified priors and centered on the use of information criteria. I find the book could have benefited from further editing as I noticed many typos and somewhat unusual sentences (at least unusual to me).

[Disclaimer about potential self-plagiarism: this post or an edited version should eventually appear in my Books Review section in CHANCE.]
