Thanks for the question: I am afraid I cannot debug your R code, which seems to be missing an update on the component parameters, but you should take a look at our book Introducing Monte Carlo Methods with R as it covers the mixture case.
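To make the missing step concrete, here is a minimal stochastic-EM sketch of one full sweep that does update the component parameters, on the same faithful data. The initial values and the hard-assignment re-estimation below are my own assumptions, not the asker's setup, and it assumes the mvtnorm package is available:

```r
library(mvtnorm)  # for dmvnorm()

myData.obs <- as.matrix(faithful[, 1:2])  # the data matrix
no <- nrow(myData.obs)

# crude initial values (assumed; pick anything reasonable)
pi1 <- 0.5
m1 <- colMeans(myData.obs) * 0.9
m2 <- colMeans(myData.obs) * 1.1
Sigma1 <- Sigma2 <- cov(myData.obs)

for (iter in 1:30) {
  prob1 <- pi1 * dmvnorm(myData.obs, m1, Sigma1)        # component 1
  prob2 <- (1 - pi1) * dmvnorm(myData.obs, m2, Sigma2)  # component 2
  Z <- rbinom(no, 1, prob1 / (prob1 + prob2))           # Z = 1 means component 1 here

  # the missing step: re-estimate the parameters from the current allocation
  pi1 <- mean(Z)
  if (sum(Z) > 2 && sum(1 - Z) > 2) {  # guard against empty components
    m1 <- colMeans(myData.obs[Z == 1, , drop = FALSE])
    m2 <- colMeans(myData.obs[Z == 0, , drop = FALSE])
    Sigma1 <- cov(myData.obs[Z == 1, , drop = FALSE])
    Sigma2 <- cov(myData.obs[Z == 0, , drop = FALSE])
  }
}
```

Once the parameters are re-estimated at every sweep, the allocations Z keep moving across iterations instead of freezing after the first couple of passes.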

I’m a newbie to Gaussian mixture models.

Could you please advise me on any fundamental notes, or preferably sample code in R, for implementing the latent classes in a Gaussian mixture?

This is my code snippet for the finite Gaussian mixture:

library(mvtnorm)  # for dmvnorm()

myData.obs <- as.matrix(faithful[, 1:2])  # the data matrix

no <- nrow(myData.obs)

# pi1, pi2, m1, m2, Sigma1, Sigma2 assumed initialised earlier

prob1 <- pi1 * dmvnorm(myData.obs, m1, Sigma1)  # probabilities of sample points under component 1

prob2 <- pi2 * dmvnorm(myData.obs, m2, Sigma2)  # same for component 2

Z <- rbinom(no, 1, prob1 / (prob1 + prob2))

if (pi1 > 1/2) {
  pi1 <- 1 - pi1
  Z <- 1 - Z
}

I want to generate Z values that classify each data point into its Gaussian component, e.g. 0 for the first Gaussian component and 1 for the second, given the means and covariances.

However, when I run the code for 30 iterations, only the first and second iterations change the classification at all, and even then the data are not classified properly. From then on the assignment remains unchanged, which means the data points are never properly assigned to their Gaussian components.

I would feel grateful if you could give me any advice or comment to improve the solution.

Thank you in advance.

I find the suggestion in the last paragraph intriguing. I’d also add that the inconsistency of the posterior of a DPM for the number of components provides an extra psychological restraint against the simple interpretation of individual components.

I don’t see the clash here. The labels are meaningless; the things they label aren’t. (And if I read Judith and Kerrie’s paper right, it says that the weight of unnecessary components goes to 0 asymptotically.)

But then how do you reconcile this utterly nihilist message with consistency results like those of Rousseau & Mengersen (2011)?

I agree with Christian that (un)identifiability is a statistical, not a computational, problem – but I disagree with the implication that labelling in mixtures has to do with (un)identifiability. The problem is caused by misunderstanding the nature of the parameters: the set of (say) k means in a finite mixture representation is not a vector, it is a point process.

On Dan’s comment that he doesn’t “agree that MCMC convergence should be separated from the question of what is inferentially useful”, may I point you to recent joint work in that direction with O. Cappé, G. Fort and B. Kégl:

http://arxiv.org/abs/1210.2601

One of the contributions is to solve relabeling online, in a way that makes sense both from an inferential point of view and from a sampling point of view. The bottom line is that if you allow a sweep over all permutations at each MCMC iteration, you can dynamically find the cost function that makes your relabeled posterior as unimodal as possible and favours good mixing of adaptive Metropolis-Hastings at the same time.
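As a toy illustration of the sweep-over-permutations idea (my own generic sketch, not the online algorithm of the paper, and only feasible for small k since there are k! permutations): each draw of the component means can be relabeled by the permutation minimising squared distance to a reference ordering.

```r
# Enumerate all permutations of a vector (base R, recursive; fine for small k).
all_perms <- function(v) {
  if (length(v) == 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Relabel one MCMC draw of k component means (k x d matrix) by the
# permutation of rows closest in squared distance to a reference ordering.
relabel_draw <- function(mu_draw, mu_ref) {
  perms <- all_perms(seq_len(nrow(mu_draw)))
  costs <- vapply(perms,
                  function(p) sum((mu_draw[p, , drop = FALSE] - mu_ref)^2),
                  numeric(1))
  mu_draw[perms[[which.min(costs)]], , drop = FALSE]
}

mu_ref  <- rbind(c(0, 0), c(5, 5))
swapped <- rbind(c(5.1, 4.9), c(0.2, -0.1))  # a label-switched draw
relabel_draw(swapped, mu_ref)  # rows come back in the reference order
```

The squared-distance cost here is an arbitrary stand-in; the point of the paper is precisely that the cost function can instead be chosen dynamically to make the relabeled posterior as unimodal as possible.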

On the other comments, I agree that identifiability should be bypassed whenever possible. However, some applications require one to get one's hands dirty and relabel. For example, we had this nice signal processing problem in particle physics, for the Auger experiment in particular:

http://iopscience.iop.org/1742-6596/368/1/012044

If your MCMC method moved among the k! symmetrical modes without any special transitions designed to ensure this, then you might indeed have increased confidence that it had also found all the non-symmetrical modes that exist. But failure to move among the k! symmetrical modes isn’t really much of a sign of any other problem (one that actually matters), since the k! symmetrical modes are often very far apart, so it’s no surprise that a generic MCMC method can’t move between them, even if it does move among non-symmetrical modes that aren’t so far apart.

Marginal likelihoods for mixture models are extremely sensitive to the prior used for the parameters of each mixture component. As this prior becomes more diffuse, the model with only one component becomes more and more favoured. So computing marginal likelihoods (eg, for models with different numbers of components) makes sense only if you have a very carefully thought out prior for the parameters of the mixture components, based on real prior information, regarding components that correspond to real physical entities.

In most cases, you should just be using a Dirichlet process mixture model (with an infinite number of components), and sampling from the posterior distribution of continuous hyperparameters such as the Dirichlet concentration parameter. The number of components used in a finite sample is still sensitive to the prior on component parameters (which ought to be controlled by hyperparameters), but fortunately you’ll be even less tempted to think individual components have some simple interpretation.

Well, this is a case for doubting MCMC convergence, and again the starting point of our 2000 paper (sorry for bringing it up again and again!). If you do not see a feature of your target, namely this massive symmetry, in your MCMC output, what level of trust do you put in it (i.e., in its converging to the target)?! In other words, if I restrict the MCMC sampler to the vicinity of a single mode, what tools do I have to be satisfied about convergence?
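One crude check along these lines (my own sketch, not anything from the paper): under an exchangeable prior the marginal posteriors of the component means must be identical, so directly comparing the MCMC output for mu1 and mu2 flags a chain stuck in one labelling.

```r
# If the chain visited the symmetric modes, draws of mu1 and mu2 come from
# the same marginal distribution; a two-sample test exposes a stuck chain.
symmetry_check <- function(mu1_draws, mu2_draws) {
  suppressWarnings(ks.test(mu1_draws, mu2_draws)$p.value)
}

set.seed(1)
stuck1 <- rnorm(1000, 0, 0.1)   # mu1 draws trapped near one mode
stuck2 <- rnorm(1000, 5, 0.1)   # mu2 draws trapped near the other
symmetry_check(stuck1, stuck2)  # essentially zero: the symmetry is not reproduced
```

A small p-value here does not locate the missing modes, of course; it only says the output cannot be a faithful sample from the symmetric target.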
