## EM rocks!

Posted in Statistics with tags , , , , , on October 8, 2021 by xi'an

A rare occurrence of a statistics paper in Nature!, well Nature Scientific Reports, where the authors, Jaya Prakesh, Umang Agarwal and Phaneendra K. Yalavarthy, describe using a parallel implementation of the EM algorithm, for an image reconstruction in rock tomography. Due to a 1,887,436,800 x 1,887,436,800 matrix involved in the original 3D model.

## mixed feelings

Posted in Books, Kids, Statistics with tags , , , , on September 9, 2021 by xi'an

Two recent questions on X validated about mixtures:

1. One on the potential negative explosion of the E function in the EM algorithm for a mixture of components with different supports:  “I was hoping to use the EM algorithm to fit a mixture model in which the mixture components can have differing support. I’ve run into a problem during the M step because the expected log-likelihood can be [minus] infinite” Which mistake is based on a confusion between the current parameter estimate and the free parameter to optimise.
2. Another one on the Gibbs sampler apparently failing for a two-component mixture with only the weights unknown, when the components are close to one another:  “The algorithm works fine if $$σ$$ is far from $$1$$ but it does not work anymore for $$σ$$ close to $$1$$.” Which did not see a wide posterior as a possible posterior when both components are similar and hence delicate to distinguish from one another.

## EM degeneracy

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , on June 16, 2021 by xi'an

At the MHC 2021 conference today (to which I biked to attend for real!, first time since BayesComp!) I listened to Christophe Biernacki exposing the dangers of EM applied to mixtures in the presence of missing data, namely that the algorithm has a rising probability to reach a degenerate solution, namely a single observation component. Rising in the proportion of missing data. This is not hugely surprising as there is a real (global) mode at this solution. If one observation components are prohibited, they should not be accepted in the EM update. Just as in Bayesian analyses with improper priors, the likelihood should bar single or double  observations components… Which of course makes EM harder to implement. Or not?! MCEM, SEM and Gibbs are obviously straightforward to modify in this case.

Judith Rousseau also gave a fascinating talk on the properties of non-parametric mixtures, from a surprisingly light set of conditions for identifiability to posterior consistency . With an interesting use of several priors simultaneously that is a particular case of the cut models. Namely a correct joint distribution that cannot be a posterior, although this does not impact simulation issues. And a nice trick turning a hidden Markov chain into a fully finite hidden Markov chain as it is sufficient to recover a Bernstein von Mises asymptotic. If inefficient. Sylvain LeCorff presented a pseudo-marginal sequential sampler for smoothing, when the transition densities are replaced by unbiased estimators. With connection with approximate Bayesian computation smoothing. This proves harder than I first imagined because of the backward-sampling operations…

## EM gets the Nobel (of statistics)

Posted in Statistics with tags , , , , , on March 23, 2021 by xi'an

## folded Normals

Posted in Books, Kids, pictures, R, Running, Statistics with tags , , , , , , , , , , , , on February 25, 2021 by xi'an

While having breakfast (after an early morn swim at the vintage La Butte aux Cailles pool, which let me in free!), I noticed a letter to the Editor in the Annals of Applied Statistics, which I was unaware existed. (The concept, not this specific letter!) The point of the letter was to indicate that finding the MLE for the mean and variance of a folded normal distribution was feasible without resorting to the EM algorithm. Since the folded normal distribution is a special case of mixture (with fixed weights), using EM is indeed quite natural, but the author, Iain MacDonald, remarked that an optimiser such as R nlm() could be called instead. The few lines of relevant R code were even included. While this is a correct if minor remark, I am a wee bit surprised at seeing it included in the journal, the more because the authors of the original paper using the EM approach were given the opportunity to respond, noticing EM is much faster than nlm in the cases they tested, and Iain MacDonald had a further rejoinder! The more because the Wikipedia page mentioned the use of optimisers much earlier (and pointed out at the R package Rfast as producing MLEs for the distribution).