## artificial EM

**W**hen addressing an X validated question on the use of the EM algorithm for estimating a Normal mean, my first comment was that it was inappropriate since there is no missing data structure to anchor it to. However, I then reflected upon the infinite number of ways to demarginalise the Normal density into a joint density

∫ f(x,z;μ) dz = φ(x−μ)

from the slice sampler representation, where f(x,z;μ) is an indicator function, to a joint Normal distribution with an arbitrary correlation. While the joint Normal representation produces a sequence converging to the MLE, the slice representation utterly fails, as the indicator functions make any starting value of μ a fixed point for EM.
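To make the joint Normal case concrete, here is a minimal R sketch, under my own toy setup rather than anything from the original question: take (X,Z) bivariate Normal with common mean μ, unit variances, and correlation ρ, so that integrating out Z returns φ(x−μ). The E-step replaces the latent average by E[z̄|x,μ_t] = μ_t + ρ(x̄−μ_t), the M-step returns μ_{t+1} = (x̄+z̄)/2, whence μ_{t+1}−x̄ = (1−ρ)(μ_t−x̄)/2, a geometric contraction towards the MLE x̄.

```r
# EM on the joint Normal demarginalisation of phi(x - mu):
# (X, Z) bivariate Normal, common mean mu, unit variances, correlation rho
set.seed(42)
x <- rnorm(100, mean = 2)   # observed sample; the MLE is mean(x)
rho <- 0.5
mu <- 10                    # deliberately poor starting value
for (t in 1:50) {
  zbar <- mu + rho * (mean(x) - mu)  # E-step: expected latent average
  mu <- (mean(x) + zbar) / 2         # M-step: complete-data MLE (xbar + zbar)/2
}
c(EM = mu, MLE = mean(x))   # the two estimates coincide
```

By contrast, replaying the same loop with the slice representation goes nowhere: given μ_t, the latent Z is uniform on (0, φ(x−μ_t)), so the Q function equals zero at μ_t and −∞ for any μ that shrinks one of the slices, making μ_t a maximiser at every iteration.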

Incidentally, when quoting from Wikipedia on the purpose of the EM algorithm, the following passage

> Finding a maximum likelihood solution typically requires taking the derivatives of the likelihood function with respect to all the unknown values, the parameters and the latent variables, and simultaneously solving the resulting equations.

struck me as confusing and possibly wrong, since it seems to suggest seeking a maximum in *both* the parameters and the latent variables, which does not produce the same value as maximising the observed likelihood.
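A standard illustration of the gap, with my own toy example rather than anything in the Wikipedia entry: in a two-component Normal mixture, maximising jointly in the means and the latent allocations amounts to a hard, classification-style EM, whose limit differs from the MLE that genuine EM targets.

```r
# soft (genuine) EM versus joint maximisation in means and allocations
# for the equal-weight mixture 0.5 N(mu1, 1) + 0.5 N(mu2, 1)
set.seed(101)
x <- c(rnorm(250, 0), rnorm(250, 3))
em_soft <- function(x, mu, iters = 200) {
  for (t in 1:iters) {
    w <- dnorm(x, mu[1]) / (dnorm(x, mu[1]) + dnorm(x, mu[2]))  # E-step
    mu <- c(sum(w * x) / sum(w), sum((1 - w) * x) / sum(1 - w)) # M-step
  }
  mu
}
em_hard <- function(x, mu, iters = 200) {
  for (t in 1:iters) {
    z <- dnorm(x, mu[1]) > dnorm(x, mu[2])  # hard allocation: max over latents
    mu <- c(mean(x[z]), mean(x[!z]))
  }
  mu
}
em_soft(x, c(-1, 4))  # converges near the observed-likelihood MLE
em_hard(x, c(-1, 4))  # different limit: the means are pushed apart
```

The hard version over-separates the means because each observation is fully credited to its nearest component, a well-known bias of classification EM.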

October 29, 2020 at 3:20 am

Sorry, Christian, for my too-quick incursion into Bayesian views. The correct statement is that the joint multinormal distribution f(y,u) of the data (y) and random effects (u) is maximised with respect to both the fixed effects (β) and the random effects (u).

October 29, 2020 at 2:59 am

Oddly enough, Charles Henderson derived his so-called mixed model equations that way, taking the derivatives of the joint multinormal distribution of the fixed and random effects (a form of latent variables) with respect to both types of effects and equating them to zero: see Searle, Casella & McCulloch, 1992, page 276. The “ML estimations” of the random effects so obtained are directly interpretable as conditional expectations of the random effects given the observed data (the E step). Henderson (1973) himself recognized his fruitful error, saying “Due to a lucky misinterpretation of maximum likelihood, I discovered in early 1949 a method involving a slight modification of least squares that is in fact, BLUP”. That might also explain why these heterodox equations have been ignored outside the animal genetics and breeding community for several decades.
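For readers unfamiliar with them, here is a minimal R sketch of Henderson's mixed model equations for y = Xβ + Zu + e, with u ~ N(0, σ²ᵤI) and e ~ N(0, σ²ₑI); the toy data, dimensions, and the assumption of known variance components are mine, purely for illustration.

```r
# Henderson's mixed model equations for y = X b + Z u + e,
# with known variance ratio lambda = s2e / s2u
set.seed(7)
q <- 5; n <- 50
grp <- factor(sample(q, n, replace = TRUE))
X <- cbind(1, rnorm(n))                 # fixed-effects design
Z <- model.matrix(~ grp - 1)            # random-effects (group) design
b <- c(1, 2); u <- rnorm(q, sd = 0.8)
y <- X %*% b + Z %*% u + rnorm(n, sd = 0.5)
lambda <- 0.5^2 / 0.8^2                 # variance ratio s2e / s2u
C <- rbind(cbind(crossprod(X), crossprod(X, Z)),
           cbind(crossprod(Z, X), crossprod(Z) + lambda * diag(q)))
rhs <- c(crossprod(X, y), crossprod(Z, y))
solve(C, rhs)                           # BLUE of b and BLUP of u, jointly
```

Solving this single linear system returns the BLUE of the fixed effects and the BLUP of the random effects simultaneously, which is exactly the “lucky” by-product of differentiating the joint density with respect to both sets of effects.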

October 31, 2020 at 4:45 pm

Who would trust an approach based on a mistake, right?! Thank you for the pointer, Jean-Louis!

October 31, 2020 at 11:01 pm

Right, Christian, as expected.

According to SR Searle (1991, CR Henderson, the Statistician, J. of Dairy Science, 74, 4035), this technique, referred to as “estimation of random effects” by CR Henderson, “went down as a lead balloon” at a meeting of the IMS in 1950. The idiom is difficult to fully grasp for non-native English speakers, but it is surely not laudatory!

But as pointed out by Searle, Casella and McCulloch (Variance Components, 1992), “an outgrowth of this statistical approach (ie the mixed model equations) led to procedures such as ridge regression (Hoerl and Kennard, 1970) and hierarchical Bayes estimation (Lindley & Smith, 1972) and a variety of other applications.”

Any other fruitful mistakes in statistics?
