Archive for EM algorithm

EM rocks!

Posted in Statistics with tags digital rock physics, EM algorithm, image reconstruction, Nature, parallelisation, tomography on October 8, 2021 by xi'an

A rare occurrence of a statistics paper in Nature!, well, Nature Scientific Reports, where the authors, Jaya Prakesh, Umang Agarwal and Phaneendra K. Yalavarthy, describe using a parallel implementation of the EM algorithm for an image reconstruction in rock tomography, the parallelisation being made necessary by the 1,887,436,800 x 1,887,436,800 matrix involved in the original 3D model.

mixed feelings
Posted in Books, Kids, Statistics with tags cross validated, disjoint support, EM algorithm, Gibbs sampler, mixtures of distributions on September 9, 2021 by xi'an

Two recent questions on X validated about mixtures:
- One on the expected log-likelihood potentially exploding to minus infinity in the EM algorithm for a mixture of components with different supports: "I was hoping to use the EM algorithm to fit a mixture model in which the mixture components can have differing support. I've run into a problem during the M step because the expected log-likelihood can be [minus] infinite." Which mistake is based on a confusion between the current parameter estimate, at which the responsibilities are computed, and the free parameter to optimise (see the first sketch after this list).
- Another one on the Gibbs sampler apparently failing for a two-component mixture with only the weights unknown, when the components are close to one another: "The algorithm works fine if σ is far from 1 but it does not work anymore for σ close to 1." Which did not recognise a wide posterior on the weight as a perfectly legitimate posterior when both components are similar and hence delicate to distinguish from one another (a behaviour the second sketch below reproduces).
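For the first question, here is a minimal R sketch with my own toy model (a uniform-plus-normal mixture, not the asker's exact setting) showing why nothing explodes: responsibilities are computed at the current estimates, so the minus-infinite log-densities outside a component's support only ever appear multiplied by exact zeros and vanish from the Q function.

```r
## Toy EM for the mixture w*U(0,1) + (1-w)*N(mu, sig), two components
## with different supports. The -Inf values of log dunif() outside
## [0,1] never enter the Q function: they are weighted by zero
## responsibilities computed at the *current* parameter estimates.
em_mix <- function(x, w = .5, mu = 0, sig = 1, niter = 50) {
  for (t in 1:niter) {
    ## E step: responsibilities under the current estimates
    f1 <- w * dunif(x, 0, 1)              # zero outside the support
    f2 <- (1 - w) * dnorm(x, mu, sig)
    r1 <- f1 / (f1 + f2)                  # exactly zero outside [0,1]
    ## M step: maximise Q in the free parameters (closed forms here)
    w   <- mean(r1)
    mu  <- sum((1 - r1) * x) / sum(1 - r1)
    sig <- sqrt(sum((1 - r1) * (x - mu)^2) / sum(1 - r1))
  }
  c(w = w, mu = mu, sig = sig)
}

set.seed(1)
x <- c(runif(300), rnorm(700, 3, .5))     # 30% uniforms, 70% normals
em_mix(x)                                 # recovers w near .3, mu near 3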
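And for the second question, a Gibbs sketch under a stand-in parameterisation (two zero-mean normals differing only in scale, uniform prior on the weight): as σ approaches 1, the components merge, the data carry less and less information about the weight, and the posterior flattens, which is the correct answer rather than a failure of the sampler.

```r
## Gibbs sampler for x ~ w*N(0,1) + (1-w)*N(0,sigma^2), only w unknown,
## with a U(0,1) prior on w. A stand-in for the question's setting: for
## sigma close to 1 the sampler does not "stop working", it correctly
## reports a near-flat posterior on w.
gibbs_w <- function(x, sigma, niter = 1e4) {
  n <- length(x); w <- runif(1); out <- numeric(niter)
  for (t in 1:niter) {
    ## allocation step: P(z_i = 1 | w, x_i)
    p1 <- w * dnorm(x, 0, 1)
    p2 <- (1 - w) * dnorm(x, 0, sigma)
    z  <- rbinom(n, 1, p1 / (p1 + p2))
    ## weight step: conjugate Beta update given the allocations
    w <- rbeta(1, 1 + sum(z), 1 + n - sum(z))
    out[t] <- w
  }
  out
}

x1 <- c(rnorm(200), rnorm(300, 0, 3))     # true w = .4, sigma = 3
hist(gibbs_w(x1, sigma = 3))              # posterior concentrates near .4
x2 <- c(rnorm(200), rnorm(300, 0, 1.05))  # nearly identical components
hist(gibbs_w(x2, sigma = 1.05))           # near-flat posterior on (0,1)
```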
EM degeneracy
Posted in pictures, Statistics, Travel, University life with tags ABC, BayesComp 2020, Bernstein-von Mises theorem, clustering, compatible conditional distributions, conference, cut models, cycle path, EM algorithm, Gibbs sampling, hidden Markov models, Institut de Mathématique d'Orsay, MCMC, MHC 2021, mixtures, particle filters, physical attendance, Rao-Blackwellisation, SEM, SMC, smoothing, Université Paris-Sud on June 16, 2021 by xi'an

At the MHC 2021 conference today (which I biked to attend for real!, the first time since BayesComp!) I listened to Christophe Biernacki exposing the dangers of EM applied to mixtures in the presence of missing data, namely that the probability of the algorithm reaching a degenerate solution, a component reduced to a single observation, rises with the proportion of missing data. This is not hugely surprising as there is a genuine (global) mode at this solution. If one-observation components are prohibited, they should not be accepted in the EM update; just as in Bayesian analyses with improper priors, the likelihood should bar single- or double-observation components… Which of course makes EM harder to implement. Or not?! MCEM, SEM and Gibbs are obviously straightforward to modify in this case, as sketched below.
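As a concrete illustration of such a guard (my own toy safeguard, not Biernacki's actual correction), here is a Gaussian-mixture EM whose M step refuses any component whose effective size drops below two observations, resetting it instead of letting it degenerate:

```r
## EM for a K-component Gaussian mixture with a guard against
## degenerate solutions: any component whose effective size sum_i t_ik
## falls below a floor (here 2) is reset rather than updated.
em_guard <- function(x, K = 2, niter = 100, floor = 2) {
  n <- length(x)
  w <- rep(1/K, K); mu <- sample(x, K); sig <- rep(sd(x), K)
  for (t in 1:niter) {
    ## E step: n x K responsibility matrix
    f <- sapply(1:K, function(k) w[k] * dnorm(x, mu[k], sig[k]))
    r <- f / rowSums(f)
    nk <- colSums(r)
    for (k in 1:K) {
      if (nk[k] < floor) {        # degenerate component: reset it
        mu[k] <- sample(x, 1); sig[k] <- sd(x); w[k] <- 1/n
      } else {                    # standard M-step update
        w[k]   <- nk[k] / n
        mu[k]  <- sum(r[, k] * x) / nk[k]
        sig[k] <- sqrt(sum(r[, k] * (x - mu[k])^2) / nk[k])
      }
    }
    w <- w / sum(w)               # renormalise after any reset
  }
  list(w = w, mu = mu, sig = sig)
}

x <- c(rnorm(50), 4.2)            # a lone outlier tempting a singleton
em_guard(x, K = 2)
```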
Judith Rousseau also gave a fascinating talk on the properties of non-parametric mixtures, from a surprisingly light set of conditions for identifiability to posterior consistency. With an interesting use of several priors simultaneously that is a particular case of the cut models, namely a coherent joint distribution that cannot be a posterior, although this does not impact simulation issues. And a nice trick turning a hidden Markov chain into a fully finite hidden Markov chain, as this is sufficient to recover a Bernstein-von Mises asymptotic, if an inefficient one. Sylvain LeCorff presented a pseudo-marginal sequential sampler for smoothing, where the transition densities are replaced by unbiased estimators, with connections to approximate Bayesian computation smoothing. This proves harder than I first imagined because of the backward-sampling operations…
EM gets the Nobel (of statistics)
Posted in Statistics with tags EM algorithm, Harvard University, International Prize in Statistics, longitudinal studies, Nan Laird, random effects on March 23, 2021 by xi'an

folded Normals
Posted in Books, Kids, pictures, R, Running, Statistics with tags Annals of Applied Statistics, EM algorithm, folded normal, La Butte aux Cailles, letter to the editor, maximum likelihood estimation, nlm, outdoor swimming, Paris, R, Rfast, swimming pool, wikipedia on February 25, 2021 by xi'an

While having breakfast (after an early morn swim at the vintage La Butte aux Cailles pool, which let me in for free!), I noticed a letter to the Editor in the Annals of Applied Statistics, a feature I was unaware existed. (The concept, not this specific letter!) The point of the letter was to indicate that finding the MLE for the mean and variance of a folded normal distribution is feasible without resorting to the EM algorithm. Since the folded normal distribution is a special case of mixture (with fixed weights), using EM is indeed quite natural, but the author, Iain MacDonald, remarked that an optimiser such as R's nlm() could be called instead, and the few lines of relevant R code were even included. While this is a correct if minor remark, I am a wee bit surprised at seeing it included in the journal, the more because the authors of the original paper using the EM approach were given the opportunity to respond, noting EM is much faster than nlm() in the cases they tested, and Iain MacDonald had a further rejoinder! And the more because the Wikipedia page mentioned the use of optimisers much earlier (and pointed at the R package Rfast as producing MLEs for the distribution).
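For the record, a minimal version of the letter's argument in a few lines of my own R code (not MacDonald's): the folded-normal negative log-likelihood can be handed directly to nlm(), bypassing EM altogether.

```r
## Folded-normal MLE by direct optimisation: for x >= 0 the density is
## dnorm(x, mu, sig) + dnorm(x, -mu, sig), so the negative
## log-likelihood goes straight into nlm(), no EM needed.
nll_folded <- function(par, x) {
  mu <- par[1]; sig <- exp(par[2])   # log-scale keeps sigma positive
  -sum(log(dnorm(x, mu, sig) + dnorm(x, -mu, sig)))
}

set.seed(1)
x <- abs(rnorm(1000, 2, 1))          # a folded N(2,1) sample
fit <- nlm(nll_folded, p = c(mean(x), log(sd(x))), x = x)
c(mu = fit$estimate[1], sigma = exp(fit$estimate[2]))
```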