**A**n interesting query on (or from) X validated: given a Bernoulli mixture where the weights are known and the probabilities are jointly drawn from a Dirichlet, which is the most efficient among running a Gibbs sampler that includes the latent variables, running a basic Metropolis-Hastings algorithm based on the mixture representation, and running a collapsed Gibbs sampler that only samples the indicator variables? I provided a closed-form expression for the collapsed target, but believe that the most efficient solution is based on the mixture representation!
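A minimal sketch of the mixture-representation option, under one possible reading of the model (an assumption, not taken from the original question): observations are i.i.d. from the mixture of Bernoulli(p_j) distributions with known weights w_j, and p = (p_1, …, p_K) follows a Dirichlet(α) prior. Integrating out the indicators then leaves a Bernoulli likelihood with success probability Σ_j w_j p_j. The sampler below is an independence Metropolis-Hastings step proposing from the prior, a choice made here for simplicity; the function names are hypothetical.

```python
import numpy as np

def log_mixture_likelihood(p, weights, x):
    """Log-likelihood with the component indicators integrated out.

    Under the assumed model each x_i is Bernoulli(theta) with
    theta = sum_j weights[j] * p[j].
    """
    theta = float(np.dot(weights, p))
    s = int(np.sum(x))          # number of ones in the sample
    n = len(x)
    return s * np.log(theta) + (n - s) * np.log1p(-theta)

def independence_mh(x, weights, alpha, n_iter=5000, seed=0):
    """Independence Metropolis-Hastings with the Dirichlet prior as proposal.

    Because the proposal equals the prior, the acceptance ratio reduces
    to the ratio of mixture likelihoods.
    """
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(alpha)
    logl = log_mixture_likelihood(p, weights, x)
    chain = np.empty((n_iter, len(alpha)))
    accepted = 0
    for t in range(n_iter):
        prop = rng.dirichlet(alpha)
        logl_prop = log_mixture_likelihood(prop, weights, x)
        if np.log(rng.uniform()) < logl_prop - logl:
            p, logl = prop, logl_prop
            accepted += 1
        chain[t] = p
    return chain, accepted / n_iter

# Hypothetical toy data: three components with known weights.
x = np.array([1, 0, 1, 1, 0, 1, 0, 1])
weights = np.array([0.2, 0.3, 0.5])
chain, acceptance_rate = independence_mh(x, weights, alpha=np.ones(3), n_iter=2000)
```

No latent variable is simulated here, which is what makes this version cheap per iteration; whether it mixes better than the two Gibbs alternatives depends on the data and on the proposal.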

## Archive for Bernoulli mixture

## Bernoulli mixtures

Posted in pictures, Statistics, University life with tags Bernoulli mixture, cross validated, Gibbs sampler, Helvetia, Jakob Bernoulli, Metropolis-Hastings algorithm, mixtures, stamp on October 30, 2019 by xi'an

## relabelling mixtures (#2)

Posted in Statistics, Travel, University life with tags allocations, Bernoulli mixture, finite mixtures, label switching, MCMC algorithms, Monte Carlo Statistical Methods, permutations on February 5, 2015 by xi'an

**F**ollowing the previous post, I went and had a (long) look at Puolamäki and Kaski’s paper. I must acknowledge that, despite having several runs through the paper, I still have trouble with the approach… From what I understand, the authors use a Bernoulli mixture pseudo-model to reallocate the observations to components. That is, given an MCMC output with simulated allocation variables (a.k.a. hidden or latent variables), they create a (*T*×*K*)×*n* matrix of component binary indicators, e.g., for a three-component mixture,

0 1 0 0 1 0…

1 0 0 0 0 0…

0 0 1 1 0 1…

0 1 0 0 1 1…

and estimate a probability to be in component *j* for each of the *n* observations, according to the (pseudo-)likelihood
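A generic Bernoulli mixture pseudo-likelihood matching this description can be sketched as follows. The notation is an assumption made for illustration and is not necessarily the paper's: $z_{r,i}$ is the entry of the indicator matrix in row $r$ and column $i$, $R = T \times K$ is the number of rows, $\pi_j$ are mixture weights, and $\beta_{i,j}$ is the probability attached to observation $i$ and component $j$.

$$
\prod_{r=1}^{R} \sum_{j=1}^{K} \pi_j \prod_{i=1}^{n} \beta_{i,j}^{\,z_{r,i}} \left(1-\beta_{i,j}\right)^{1-z_{r,i}}
$$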

It took me a few days, between morning runs and those wee hours when I cannot get back to sleep (!), to make some sense of this Bernoulli modelling. The allocation vectors are used *together* to estimate the probabilities of being “in” component *j* *together*. However, the data (which is the outcome of an MCMC simulation and *de facto* does not originate from that Bernoulli mixture) does not seem appropriate, both because it is produced by an MCMC simulation and because it is made of blocks of highly correlated rows [which sum up to one]. The Bernoulli likelihood above also defines a new model, with many more parameters than in the original mixture model. And I fail to see why perfect, partial or nonexistent label switching [in the MCMC sequence] is not going to impact the estimation of the Bernoulli mixture, or why an argument based on a fixed parameter value (Theorem 3) extends to an MCMC outcome where parameters themselves are subjected to some degree of label switching. Bemused, I remain…
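The block structure behind this objection can be made concrete with a small sketch. It assumes the MCMC allocations are stored as a *T*×*n* integer array with labels 0, …, *K*−1 (a storage convention chosen here, not one stated in the paper); `stack_indicators` is a hypothetical helper name, and the second row of allocations is made up so that the first four rows of the result match the matrix displayed earlier.

```python
import numpy as np

def stack_indicators(allocations, n_components):
    """Turn a (T, n) array of allocations into a (T*K, n) binary matrix.

    Row t*K + k holds the indicators that each observation is allocated
    to component k at MCMC iteration t.
    """
    allocations = np.asarray(allocations)
    n_iter, n_obs = allocations.shape
    components = np.arange(n_components).reshape(1, n_components, 1)
    # Broadcasting compares every allocation with every component label,
    # giving an array of shape (T, K, n).
    indicators = (allocations[:, None, :] == components).astype(int)
    return indicators.reshape(n_iter * n_components, n_obs)

# Two iterations, six observations, three components (labels 0, 1, 2).
allocations = np.array([
    [1, 0, 2, 2, 0, 2],
    [1, 0, 2, 2, 0, 0],   # hypothetical second iteration
])
matrix = stack_indicators(allocations, n_components=3)

# Within each block of K rows, every column contains exactly one 1,
# so the rows of a block are strongly dependent by construction.
block_sums = matrix.reshape(2, 3, 6).sum(axis=1)
```

Each block of *K* consecutive rows is thus a deterministic recoding of a single allocation vector, which is one way to see why treating the rows as independent draws from a Bernoulli mixture is questionable.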