## alternatives to EM

**I**n an arXived preprint submitted to *Computational Statistics & Data Analysis*, Chan, Han, and Lim study alternatives to EM for latent class models, that is, mixtures of products of multinomials. (First occurrence of an indicator function being called the “Iverson bracket function”!) The introduction is fairly extensive given how well studied this model is. The criticisms of EM levelled by the authors are that (a) it does not produce an evaluation of the estimation error, which does not sound correct; and (b) its convergence is slow, which is also rather misleading, as my [low dimensional] experience with mixtures is that it gets very quickly, and apparently linearly, to the vicinity of one of the modes. The argument in favour of alternative non-linear optimisation approaches is that they can achieve quadratic convergence. One solution is a projected quasi-Newton method, based on a quadratic approximation to the target, with some additional intricacies that make the claim of being “way easier than EM algorithm” somewhat specious. The second approach proposed in the paper is sequential quadratic programming, which incorporates the Lagrange multiplier in the target. While the different simulations in the paper show that EM may indeed call for a much larger number of iterations, the likelihoods obtained are all comparable.
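To make the model concrete, here is a minimal EM sketch for the simplest product-multinomial case, binary items, i.e. a mixture of products of Bernoullis; the function name and the data are made up for illustration and are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def em_latent_class(X, K, n_iter=200, tol=1e-8):
    """EM for a K-class latent class model with binary items (illustrative sketch)."""
    n, J = X.shape
    pi = np.full(K, 1.0 / K)                 # class weights
    theta = rng.uniform(0.25, 0.75, (K, J))  # item probabilities per class
    old_ll = -np.inf
    for _ in range(n_iter):
        # E step: log joint of each observation with each class
        log_p = (np.log(pi)
                 + X @ np.log(theta).T
                 + (1 - X) @ np.log(1 - theta).T)
        # posterior responsibilities, via the log-sum-exp trick
        m = log_p.max(axis=1, keepdims=True)
        post = np.exp(log_p - m)
        post /= post.sum(axis=1, keepdims=True)
        # M step: weighted relative frequencies
        nk = post.sum(axis=0)
        pi = nk / n
        theta = (post.T @ X) / nk[:, None]
        theta = np.clip(theta, 1e-6, 1 - 1e-6)  # keep logs finite
        # observed-data log-likelihood, monotone along EM iterations
        ll = (m.squeeze() + np.log(np.exp(log_p - m).sum(axis=1))).sum()
        if ll - old_ll < tol:
            break
        old_ll = ll
    return pi, theta, ll
```

Each iteration is cheap and, as noted above, the iterates typically move linearly but quickly toward a local mode, which is where the quadratic-convergence argument for quasi-Newton alternatives bites only in the final digits.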

February 3, 2019 at 10:50 pm

Re: uncertainties from EM, I happened to be looking in Carlin & Louis today (the 2nd edn, still called “Bayes and Empirical Bayes methods…”), and they discuss how to get the observed information within an EM algorithm in Section 3.4, citing papers by Louis and Meng & Rubin from the 80s and 90s.
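For the record, the identity underlying Louis’ (1982) approach, in my own transcription and standard notation, writes the observed information as complete minus missing information:

```latex
% Louis (1982): observed information from complete-data quantities
I_{\mathrm{obs}}(\theta)
  = \mathbb{E}\!\left[-\,\partial^{2}_{\theta}\,\ell_{c}(\theta; x)\,\middle|\, y, \theta\right]
  - \operatorname{Var}\!\left[\partial_{\theta}\,\ell_{c}(\theta; x)\,\middle|\, y, \theta\right],
```

where \(\ell_c\) is the complete-data log-likelihood; both conditional moments are computable within the E step, so no extra optimisation machinery is needed.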

February 4, 2019 at 7:12 am

Thank you, Tom, I missed these earlier references! In his discussion of the 1977 EM paper, which I just re-read, Little notes that “in contrast [to scoring], the EM algorithm does not provide estimates of standard error, since calculation and inversion of the information matrix are avoided” (p.25). And so does Haberman with the Newton-Raphson algorithm (p.31). The whole discussion is fascinating, as well as featuring many friends of old, including a University of Glasgow contingent!

February 1, 2019 at 7:01 am

Christian, IIRC, the term “Iverson bracket” is due to Knuth; I first encountered it in his lovely book on discrete math with Graham and Patashnik, *Concrete Mathematics* (well, really it’s “CONtinuous-disCRETE” maths). It’s named for Ken Iverson, inventor of APL, who apparently first used it. I like this notation a lot, though I tend to write the brackets as double brackets, to distinguish them from normal math grouping brackets. See: https://en.wikipedia.org/wiki/Iverson_bracket.
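For readers who have not met the notation, the bracket just maps a statement to zero or one, which turns sums over conditions into plain sums, for instance:

```latex
[P] = \begin{cases} 1 & \text{if } P \text{ holds},\\ 0 & \text{otherwise}, \end{cases}
\qquad
\sum_{k=1}^{n} [\,k \text{ even}\,] = \lfloor n/2 \rfloor .
```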

Regarding uncertainties in the estimates in an EM context, I’m reminded of David Oakes’ 1999 JRSSB paper, “Direct calculation of the information matrix via the EM algorithm”:

https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/1467-9868.00188. Admittedly the information matrix only gives an asymptotic approximation to the covariance matrix, but if you’re relying on optimization and a MAP estimator, that would probably satisfy you!
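If I remember Oakes’ identity correctly, it expresses the Hessian of the observed log-likelihood entirely through derivatives of the EM intermediate quantity \(Q\):

```latex
% Oakes (1999): observed-data Hessian via the EM Q-function
\frac{\partial^{2} \ell(\theta)}{\partial\theta\,\partial\theta^{\mathsf T}}
  = \left\{
      \frac{\partial^{2} Q(\vartheta \mid \theta)}{\partial\vartheta\,\partial\vartheta^{\mathsf T}}
    + \frac{\partial^{2} Q(\vartheta \mid \theta)}{\partial\vartheta\,\partial\theta^{\mathsf T}}
    \right\}\Bigg|_{\vartheta=\theta},
\qquad
Q(\vartheta \mid \theta) = \mathbb{E}\!\left[\ell_{c}(\vartheta; x) \mid y, \theta\right],
```

so everything needed is already available from the quantities the EM iterations manipulate.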

February 1, 2019 at 10:04 pm

Thank you, Tom, I really did not know about it. Regarding the variance of the MLE that can be obtained by EM, I was thinking of something similar to Oakes’, proposed by Olivier Cappé in the early 2000s, although I cannot recover the reference…