Archive for mixtures of distributions

simulating signed mixtures

Posted in Books, pictures, R, Statistics, University life on February 2, 2024 by xi'an

While simulating from a mixture of standard densities is relatively straightforward when the component densities are easily simulated, to the point that many simulation methods exploit an intermediary mixture construction to speed up the production of pseudo-random samples from more challenging distributions (see Devroye, 1986), things get surprisingly more complicated when the mixture weights can take negative values. For instance, the naïve solution consisting in first simulating from the associated mixture of positive weight components and then using an accept-reject step may prove highly inefficient, since the overall probability of acceptance

1\Big/\sum_{k=1}^{P} \omega_k^+

is the inverse of the sum of the positive weights and hence can be arbitrarily close to zero. The intuition for such inefficiency is that simulating from the positive weight components need not produce values within regions of high probability for the actual distribution

m = \sum_{k=1}^P \omega_k^+ f_k - \sum_{k=1}^N \omega_k^- g_k

since its negative weight components may remove most of the mass under the positive weight components. In other words, the negative weight components do not have a natural latent variable interpretation and the resulting mixture can be anything, as the above graph testifies.
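
To make the inefficiency concrete, here is a minimal R sketch of the naïve accept-reject sampler on a toy signed mixture of my own choosing (not one from the paper), with density m(x) = 2 N(x;0,1) − N(x;0,½), whose positive weights sum to 2 and whose acceptance rate is thus ½ on average:

# naïve accept-reject for the toy signed mixture m(x) = 2 dnorm(x) - dnorm(x, sd = .5)
naive_signed <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- rnorm(n)                         # propose from the positive component
    pos <- 2 * dnorm(x)                   # positive part, dominating m
    m <- pos - dnorm(x, sd = 0.5)         # signed mixture density (non-negative here)
    out <- c(out, x[runif(n) * pos < m])  # accept with probability m(x)/pos(x)
  }
  out[1:n]
}
x <- naive_signed(1e4)  # roughly half the proposals get rejected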

Julien Stoehr (Paris Dauphine) and I started investigating this interesting challenge after the Master students who had been exposed to it could not make a dent in it in any meaningful way. We have now arXived a specific algorithm that proves superior not only to the naïve accept-reject algorithm but also to numerical cdf inversion (which happens to be available in this setting). Compared with the naïve version, we construct an alternative accept-reject scheme based on pairing positive and negative components as well as possible, partitioning the real line, and finding tighter upper and lower bounds on positive and negative components, respectively, towards yielding a higher acceptance rate on average. Designing a random generator of signed mixtures with enough variability and representativity proved a challenge in itself!
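
For comparison, the numerical cdf inversion mentioned above is indeed directly available here, since the cdf of a signed mixture of Gaussians is an explicit signed combination of Gaussian cdfs; a hedged R sketch on the same toy example (mine, not the benchmark used in the paper):

# numerical inversion of the cdf F(x) = 2 pnorm(x) - pnorm(x, sd = .5)
pm <- function(x) 2 * pnorm(x) - pnorm(x, sd = 0.5)  # cdf of the toy signed mixture
qm <- function(u)                                    # numerical inverse cdf
  uniroot(function(x) pm(x) - u, interval = c(-10, 10))$root
cdf_inversion <- function(n) vapply(runif(n), qm, numeric(1))
y <- cdf_inversion(1e3)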

Approximation Methods in Bayesian Analysis [#2]

Posted in pictures, Running, Statistics, Travel, University life on June 22, 2023 by xi'an

A more theoretical Day #2 of the workshop, with Debdeep Pati comparing two representations of Gaussian processes with significantly different efficiencies, and Aad van der Vaart presenting a form of linearisation for a range of inverse problems. Kolyan Ray spoke on debiasing Lasso impacts by variational Bayes, although through a somewhat intricate process that distanced the procedure from Bayesian grounds imho. Judith Rousseau (Dauphine) also drifted away from Bayesian canons by looking anew at empirical Bayes, with surprising differences from genuine B analysis, connecting with the cutoff phenomenon she and Kerrie exhibited in their 2011 mixture paper, as well as labelling the marginal likelihood a misspecified model. Trevor Campbell and Sinead Williamson both provided Bayesian perspectives on normalising flows, in particular the impact of computer imprecision on reversibility, leading to the notion of shadow paths (screenshot below), while Giovanni Rebaudo talked about mixtures supported by trees, a fascinating object!

On Day #3, Marc Beaumont talked about a mixture of composite likelihoods à la Ryden, making me wonder about the optimisation of blocks for HMC, and about EP-ABC, with the issue of the unknown amount of approximation, and of adaptivity. Maria de Iorio presented work on finite and infinite mixtures with repulsive (Coulomb) priors, achieving a unified framework, plus known evidence (?), with a related talk by Federico Camerlenghi in the afternoon, introducing novel notions (for me) of Palm measures and calculus, and another related talk by María-Fernanda Gil Leyva Villa, on stick-breaking processes for species sampling with dependent length variables, with associated improvements in Gibbs implementation (screenshot below).
This was followed by two theoretical talks on continuous time processes, by Paul Jenkins (Warwick) on the fine properties of the Fleming-Viot process, with mentions of Don Dawson's results reminding me of the 1988 and 1989 summers I spent at Carleton University, where he was located at the time, and by Matteo Ruggieri, with the novel (to me) notion of dual Markov processes that could prove useful in a lot of latent variable models. Fabrizio Leisen expanded on his early work on partial exchangeability, and Steve MacEachern spoke on dependent quantile pyramids, which relate to quantile regression, a constant source of puzzlement for me. He motivated the perspective by robustness and misspecification arguments, but I am a wee bit puzzled by the distinction between quantile pyramids and other non-parametric solutions.

On the outdoor front (in early mornings): choppy waters at sea (in the Sugiton calanque, pictured above) thanks to the endless mistral wind, a nice run down from Mont Puget with friends, and the limited utility of my rented mountain bike (except to reach the nearest supermarket, 3km away).

astrostat webinar [IAU-IAA]

Posted in pictures, Statistics, University life on June 14, 2023 by xi'an

Yesterday, I gave a talk on inferring the number of components in a mixture at the international online IAU-IAA Astrostats and Astroinfo seminar. Which generated (uniformly) interesting and relevant questions for astronomical challenges. As pointed out by my Cornell friend Tom Loredo, it is unfortunately clashing with the ISI quadrennial Statistical Challenges in Modern Astronomy meeting held at Penn State.

van Dantzig seminar

Posted in pictures, Statistics, Travel, University life on June 3, 2023 by xi'an

Bayesian learning

Posted in Statistics on May 4, 2023 by xi'an

“…many well-known learning-algorithms, such as those used in optimization, deep learning, and machine learning in general, can now be derived directly following the above scheme using a single algorithm”

The One World ABC webinar today was delivered by Emtiyaz Khan (RIKEN), about the Bayesian Learning Rule, following Khan and Rue's 2021 arXival on Bayesian learning. (It had a great intro featuring a video of the speaker's daughter learning about the purpose of a ukulele in her first year!) The paper argues for a Bayesian interpretation/version of gradient descent algorithms, starting with Zellner's (1988, the year I first met him!) identity that the posterior is the solution to

\min_q \mathbb E_q[\ell(\theta,x)] + KL(q||\pi)

when ℓ is the negative log-likelihood and π the prior. This identity can be generalised to an arbitrary loss function (also dependent on the data) replacing the negative log-likelihood, with the (pseudo-)posterior chosen within an exponential family, just as in variational Bayes, ending up with a posterior adapted to this target (in the KL sense). The optimal hyperparameter or pseudo-hyperparameter of this approximation can be recovered by some gradient algorithm, recovering as well stochastic gradient and Newton's methods. While constructing a prior out of a loss function would have pleased the late Herman Rubin, this is not the case, but rather an approach to deriving a generalised Bayes distribution within a parametric family, including mixtures of Gaussians. At some point in the talk, the uncertainty endemic to the Bayesian approach seeped back into the picture, but since most of the intuition came from machine learning, I was somewhat lost as to the nature of this uncertainty.
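
For the record, here is the standard one-line derivation (not taken from the talk) of why the posterior solves this programme when ℓ(θ,x) = −log f(x|θ), and why a Gibbs or generalised posterior solves it for other losses:

\mathbb E_q[\ell(\theta,x)] + KL(q||\pi) = \int q(\theta)\log\frac{q(\theta)}{\pi(\theta)e^{-\ell(\theta,x)}}\,\text{d}\theta = KL\left(q\,\Big|\Big|\,\frac{\pi(\theta)e^{-\ell(\theta,x)}}{Z(x)}\right) - \log Z(x)

where Z(x) = \int \pi(\theta)e^{-\ell(\theta,x)}\,\text{d}\theta does not depend on q, so the minimiser is q^\star(\theta) \propto \pi(\theta)\exp\{-\ell(\theta,x)\}, i.e., the usual posterior for the negative log-likelihood loss.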