@ARTICLE{Wu1983,
  author  = {Wu, C. F. J.},
  title   = {On the Convergence Properties of the {EM} Algorithm},
  journal = {Annals of Statistics},
  year    = {1983},
  volume  = {11},
  pages   = {95--103}
}

which is a must-read (maybe more so than the original Dempster et al. paper).

I’d also add the overlooked and very, very short

@ARTICLE{Zehna1966,
  author  = {Zehna, Peter W.},
  title   = {Invariance of Maximum Likelihood Estimators},
  journal = {Annals of Mathematical Statistics},
  year    = {1966},
  volume  = {37},
  pages   = {744}
}

and its very clean correction

@ARTICLE{Berk1967,
  author  = {Berk, R. H.},
  title   = {Review 1922 of {``Invariance of Maximum Likelihood Estimators''} by Peter W. Zehna},
  journal = {Mathematical Reviews},
  year    = {1967},
  volume  = {33},
  pages   = {343--344}
}

which between them properly define the maximum-likelihood estimator of $f(\theta)$ when $f$ is not one-to-one. Not as brilliant as the others on the list, of course, but worth a look.
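For context, the point these two short papers settle can be stated compactly. The following is a sketch of the standard induced-likelihood argument, not a quotation from either paper:

```latex
% For \eta = f(\theta) with f possibly many-to-one, define the induced
% (profile) likelihood by maximizing over the preimage of \eta:
\[
  M(\eta) \;=\; \sup_{\{\theta \,:\, f(\theta) = \eta\}} L(\theta).
\]
% If \hat\theta maximizes L, then \hat\eta = f(\hat\theta) maximizes M, since
\[
  M\bigl(f(\hat\theta)\bigr) \;\ge\; L(\hat\theta) \;\ge\; L(\theta)
  \quad \text{for all } \theta ,
\]
% so f(\hat\theta) is the MLE of f(\theta) in this induced sense,
% which is what makes the "invariance" \widehat{f(\theta)} = f(\hat\theta) legitimate.
```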

In a totally different field, the following is a definite must-read in nonlinear design of experiments:

@ARTICLE{Box1959,
  author  = {Box, G. E. P. and Lucas, H. L.},
  title   = {Design of Experiments in Nonlinear Situations},
  journal = {Biometrika},
  year    = {1959},
  volume  = {46},
  pages   = {77--90}
}

What people have in mind as “classics” is actually a matter of debate. But from a historical perspective I would add at least one paper by R. A. Fisher, although most of them are difficult to read. Maybe “On the Mathematical Foundations of Theoretical Statistics” (1922), as it laid the foundations of modern inference.

A more applied choice could be the 1936 one on discriminant analysis with the famous “Iris” data. But this man tackled so many subjects in statistics that we are spoilt for choice.