Archive for expectation maximisation

observed vs. complete in EM algorithm

Posted in Statistics on November 17, 2022 by xi'an

While answering a question related to the EM algorithm on X validated, I realised a global (or generic) feature of the (objective) E function, namely that

E(\theta'|\theta)=\mathbb E_{\theta}[\log\,f_{X,Z}(x^\text{obs},Z;\theta')|X=x^\text{obs}]

can always be written as

\log\,f_X(x^\text{obs};\theta')+\mathbb E_{\theta}[\log\,f_{Z|X}(Z|x^\text{obs},\theta')|X=x^\text{obs}]

therefore always includes the (log-) observed likelihood, at least in this formal representation. While the proof that EM is monotonic in the values of the observed likelihood also uses this decomposition, in that

\log\,f_X(x^\text{obs};\theta')=\log\,\mathbb E_{\theta}\left[\frac{f_{X,Z}(x^\text{obs},Z;\theta')}{f_{Z|X}(Z|x^\text{obs},\theta)}\big|X=x^\text{obs}\right]

I wonder if the appearance of the actual target in the temporary target E(θ’|θ) can be exploited any further.
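To see the decomposition at work numerically, here is a minimal Python sketch on a toy model of my own choosing (not the one from the X validated question): a two-component Normal mixture with unit variances, θ=(p,μ₁,μ₂), Z the component indicator, and simulated data. The function names (E_fun, cond_term, em_step) are hypothetical. The sketch evaluates E(θ'|θ) both directly and as log f_X(x^obs;θ') plus the conditional expectation term, then runs a few EM steps. The monotonicity check reflects Jensen's inequality applied to the last identity, which gives log f_X(x^obs;θ') ≥ E(θ'|θ) − E_θ[log f_{Z|X}(Z|x^obs,θ)|X=x^obs], with equality at θ'=θ, so that increasing E(·|θ) cannot decrease the observed likelihood.

```python
import math
import random

def log_phi(x, mu):
    """Log density of N(mu, 1) at x."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2

def log_joint(x, z, theta):
    """log f_{X,Z}(x, z; theta): z = 1 is component N(mu1, 1) with weight p,
    z = 0 is component N(mu2, 1) with weight 1 - p (toy model, assumed)."""
    p, mu1, mu2 = theta
    if z == 1:
        return math.log(p) + log_phi(x, mu1)
    return math.log(1 - p) + log_phi(x, mu2)

def log_marginal(x, theta):
    """log f_X(x; theta), summing z out with a stable log-sum-exp."""
    l1, l0 = log_joint(x, 1, theta), log_joint(x, 0, theta)
    m = max(l1, l0)
    return m + math.log(math.exp(l1 - m) + math.exp(l0 - m))

def log_cond(z, x, theta):
    """log f_{Z|X}(z | x; theta)."""
    return log_joint(x, z, theta) - log_marginal(x, theta)

def E_fun(xs, theta_new, theta):
    """E(theta'|theta): expected complete log-likelihood, Z ~ f_{Z|X}(. | x; theta)."""
    total = 0.0
    for x in xs:
        r = math.exp(log_cond(1, x, theta))
        total += r * log_joint(x, 1, theta_new) + (1 - r) * log_joint(x, 0, theta_new)
    return total

def cond_term(xs, theta_new, theta):
    """Sum over the sample of E_theta[log f_{Z|X}(Z | x; theta') | X = x]."""
    total = 0.0
    for x in xs:
        r = math.exp(log_cond(1, x, theta))
        total += r * log_cond(1, x, theta_new) + (1 - r) * log_cond(0, x, theta_new)
    return total

def observed_loglik(xs, theta):
    return sum(log_marginal(x, theta) for x in xs)

def em_step(xs, theta):
    """One EM update of (p, mu1, mu2), variances fixed at 1."""
    rs = [math.exp(log_cond(1, x, theta)) for x in xs]
    s, n = sum(rs), len(xs)
    mu1 = sum(r * x for r, x in zip(rs, xs)) / s
    mu2 = sum((1 - r) * x for r, x in zip(rs, xs)) / (n - s)
    return (s / n, mu1, mu2)

random.seed(1)
xs = [random.gauss(-2, 1) if random.random() < 0.3 else random.gauss(2, 1)
      for _ in range(200)]

# Decomposition: E(theta'|theta) = log f_X(x; theta') + E_theta[log f_{Z|X}(Z|x; theta')]
theta, theta_new = (0.5, -1.0, 1.0), (0.4, -0.5, 2.5)
gap = abs(E_fun(xs, theta_new, theta)
          - (observed_loglik(xs, theta_new) + cond_term(xs, theta_new, theta)))
print(gap)  # zero up to floating-point rounding

# Monotonicity: the observed log-likelihood should never decrease along EM iterations
lls = [observed_loglik(xs, theta)]
for _ in range(20):
    theta = em_step(xs, theta)
    lls.append(observed_loglik(xs, theta))
print(all(later >= earlier - 1e-9 for earlier, later in zip(lls, lls[1:])))
```

The decomposition holds term by term for any θ', which is why the gap is pure rounding error, while the monotonicity only holds along the EM path.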

truncated mixtures

Posted in Books, pictures, R, Statistics on May 4, 2022 by xi'an

A question on X validated about EM steps for a truncated Normal mixture led me to ponder whether or not a more ambitious completion [more ambitious than the standard component allocation] was appropriate. Namely, if the mixture is truncated to the interval (a,b), with an observed sample x of size n, this sample could be augmented into an untruncated sample y by latent samples over the complement of (a,b), with random sizes governed by the probabilities p=(p₁,p₂,p₃) of falling within (-∞,a), (a,b), and (b,∞). In other words, y is made of three parts, including x, with sizes N¹, n, N³, respectively, the vector (N¹, n, N³) being a trinomial M(N⁺,p) random variable and N⁺ an extra unknown in the model. Assuming a (pseudo-) conjugate prior, an approximate Gibbs sampler can be run (by ignoring the dependence of p on the mixture parameters!). I did not go as far as implementing the idea for the mixture, but had a quick try for a simple truncated Normal, and did not spot any explosive behaviour in N⁺, which is what I was worried about. Of course, this is mostly anecdotal since the completion does not bring a significant improvement in coding or convergence (the plots correspond to 10⁴ simulations, for a sample of size n=400).
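For the simple truncated Normal case, a completion Gibbs sampler along these lines might look as follows. This is a hypothetical Python sketch, not the code behind the plots, and it rests on my own choice of priors: π(N⁺)∝1/N⁺, under which N¹+N³ given n and p is Negative Binomial (number of failures before the n-th success, success probability p₂) and then splits binomially between the two tails; and a Normal-inverse-gamma prior on (μ,σ²) with made-up hyperparameters m0, k0, a0, b0. The (μ,σ²) step only uses the completed sample y, so p enters solely through the count step. The sample size n=400 matches the post; the interval (-1,2), the seed and the 1000 iterations are arbitrary.

```python
import math
import random
from statistics import NormalDist, fmean

STD = NormalDist()  # standard Normal N(0, 1)

def lower_tail(c):
    """Draw Z ~ N(0, 1) conditioned on Z < c, by inversion of the cdf."""
    u = STD.cdf(c) * (1.0 - random.random())  # u lies in (0, Phi(c)]
    return STD.inv_cdf(u)

def nb_failures(n, p):
    """Failures before the n-th success (success probability p), as a sum of n geometrics."""
    log_q = math.log1p(-p)
    return sum(int(math.log(1.0 - random.random()) / log_q) for _ in range(n))

def gibbs_truncated_normal(x, a, b, n_iter=1000, seed=0,
                           m0=0.0, k0=0.01, a0=1.0, b0=1.0):
    """Completion Gibbs sampler for a N(mu, sigma^2) sample truncated to (a, b).
    Hypothetical priors: pi(N+) proportional to 1/N+, and a Normal-inverse-gamma
    prior mu | sigma^2 ~ N(m0, sigma^2/k0), sigma^2 ~ IG(a0, b0)."""
    random.seed(seed)
    n = len(x)
    mu, sigma = fmean(x), 1.0  # arbitrary starting values
    trace = []
    for _ in range(n_iter):
        # 1. probabilities of (-inf, a), (a, b), (b, inf) at the current (mu, sigma)
        alpha, beta = (a - mu) / sigma, (b - mu) / sigma
        p1, p3 = STD.cdf(alpha), STD.cdf(-beta)
        p2 = 1.0 - p1 - p3
        # 2. N1 + N3 given n is NegBin(n, p2) under pi(N+) ~ 1/N+,
        #    then split between the two tails binomially
        m = nb_failures(n, p2)
        w = p1 / (p1 + p3)
        n1 = sum(random.random() < w for _ in range(m))
        n3 = m - n1
        # 3. latent points below a and above b (upper tail by symmetry)
        low = [mu + sigma * lower_tail(alpha) for _ in range(n1)]
        high = [mu - sigma * lower_tail(-beta) for _ in range(n3)]
        y = x + low + high
        N = len(y)  # current value of N+
        # 4. conjugate Normal-inverse-gamma draw of (mu, sigma^2) from the untruncated y
        ybar = fmean(y)
        ss = sum((v - ybar) ** 2 for v in y)
        kN = k0 + N
        mN = (k0 * m0 + N * ybar) / kN
        aN = a0 + 0.5 * N
        bN = b0 + 0.5 * ss + 0.5 * k0 * N * (ybar - m0) ** 2 / kN
        sigma = math.sqrt(bN / random.gammavariate(aN, 1.0))
        mu = random.gauss(mN, sigma / math.sqrt(kN))
        trace.append((mu, sigma, N))
    return trace

# simulated data: n = 400 draws from N(0, 1) kept only inside (a, b) = (-1, 2)
random.seed(42)
a, b = -1.0, 2.0
x = []
while len(x) < 400:
    v = random.gauss(0.0, 1.0)
    if a < v < b:
        x.append(v)

trace = gibbs_truncated_normal(x, a, b)
kept = trace[200:]  # drop burn-in
print(fmean(t[0] for t in kept), fmean(t[1] for t in kept), max(t[2] for t in kept))
```

The third entry of each trace element is the current N⁺, so monitoring it is a cheap way to check for the explosive behaviour in N⁺ worried about above.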
