## back to moments

**A** recent paper posted on arXiv considers afresh the method of moments for mixtures of distributions. (“Afresh”, because the method was introduced by Karl Pearson in the 1890s…) The authors (Animashree Anandkumar, Daniel Hsu, and Sham Kakade) estimate the parameters of a mixture of multinomial distributions (motivated as a “bag of words document topic” model) via the moment representation of pairwise and triple-wise probabilities. The estimate is obtained by a simple matrix formula using the empirical frequencies of pairs and triplets. The principle also applies to non-multinomial mixtures whose components are defined/parameterised by their mean (or rather first moments?), like Gaussian mixtures.
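To make the “moment representation” concrete, here is a minimal Python sketch, not the paper's estimator itself. It only simulates the sampling scheme and checks the identity the approach rests on: since words are independent given the component, the probability of observing the pair (i, j) is the sum over components of weight times the two word probabilities. The names `simulate_documents`, `empirical_pairs`, `empirical_triples`, and the toy `weights`/`topics` values are all made up for the illustration.

```python
import numpy as np

def simulate_documents(weights, topics, n_docs, n_words, rng):
    """Draw one component per document, then n_words words from that component.

    topics is a (d, K) array whose k-th column holds the word
    probabilities of component k (hypothetical layout for this sketch).
    """
    d, n_components = topics.shape
    labels = rng.choice(n_components, size=n_docs, p=weights)
    docs = np.empty((n_docs, n_words), dtype=np.int64)
    for k in range(n_components):
        idx = np.flatnonzero(labels == k)
        docs[idx] = rng.choice(d, size=(idx.size, n_words), p=topics[:, k])
    return docs

def empirical_pairs(docs, d):
    """Empirical frequencies of (first word, second word)."""
    pairs = np.zeros((d, d))
    np.add.at(pairs, (docs[:, 0], docs[:, 1]), 1.0)
    return pairs / docs.shape[0]

def empirical_triples(docs, d):
    """Empirical frequencies of (first, second, third word)."""
    triples = np.zeros((d, d, d))
    np.add.at(triples, (docs[:, 0], docs[:, 1], docs[:, 2]), 1.0)
    return triples / docs.shape[0]

# Toy example: 2 components, vocabulary of 3 words (values are arbitrary).
rng = np.random.default_rng(0)
weights = np.array([0.4, 0.6])
topics = np.array([[0.6, 0.1],
                   [0.3, 0.2],
                   [0.1, 0.7]])
docs = simulate_documents(weights, topics, n_docs=100_000, n_words=3, rng=rng)

# Population pairwise moment: sum_k w_k * mu_k mu_k^T.
population_pairs = topics @ np.diag(weights) @ topics.T
gap = np.abs(empirical_pairs(docs, 3) - population_pairs).max()
print("largest gap between empirical and population pair frequencies:", gap)
```

With a small vocabulary and many documents, the empirical tables sit close to their population counterparts, which is what makes plugging them into a matrix formula plausible in the first place.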

**T**his is neat, but there are a few caveats: (1) contrary to standard mixtures, the paper assumes that *þ* observations are made at once from a given component: in other words, a component is drawn at random according to a multinomial distribution, then *þ* observations are generated from this component. (This is rather unusual, esp. given that *þ* is *the same* across all samples. It should be feasible to extend the results in the paper to varying *þ*’s…) (2) while the pairwise and triple-wise statistics remain low-order moments, avoiding the criticism raised against Pearson’s original estimator, those pairwise and, even more, triple-wise frequency estimators quickly deteriorate as the number *d* of words in the vocabulary (the dimension of the parameter) increases, since more and more of the observed frequencies will be zero. (For a *D*-dimensional Gaussian mixture with both mean and covariance matrix unknown, the authors take the dimension to be *D*/*þ*, but this seems strange given the *D*+*D*²/2 parameters to estimate for each component…)
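The second caveat is easy to see numerically. The sketch below is an illustration under made-up settings, not an experiment from the paper: it draws a fixed number of documents of three words each from a random multinomial mixture (Dirichlet-generated weights and word probabilities, an arbitrary choice) and reports the share of cells in the pairwise and triple-wise frequency tables that are never observed. The function name `zero_cell_fractions` and all default values are hypothetical.

```python
import numpy as np

def zero_cell_fractions(d, n_docs=5000, n_components=3, seed=0):
    """Share of never-observed cells in the d x d pair table and the
    d x d x d triple table, for n_docs documents of three words each."""
    rng = np.random.default_rng(seed)
    weights = rng.dirichlet(np.ones(n_components))
    topics = rng.dirichlet(np.ones(d), size=n_components)  # shape (K, d)
    labels = rng.choice(n_components, size=n_docs, p=weights)
    docs = np.empty((n_docs, 3), dtype=np.int64)
    for k in range(n_components):
        idx = np.flatnonzero(labels == k)
        docs[idx] = rng.choice(d, size=(idx.size, 3), p=topics[k])
    # Encode each pair / triple as a single integer and count distinct ones,
    # which avoids allocating the full d**3 table.
    pair_codes = docs[:, 0] * d + docs[:, 1]
    triple_codes = pair_codes * d + docs[:, 2]
    zero_pairs = 1.0 - np.unique(pair_codes).size / d**2
    zero_triples = 1.0 - np.unique(triple_codes).size / d**3
    return zero_pairs, zero_triples

for d in (10, 50, 200):
    zp, zt = zero_cell_fractions(d)
    print(f"d={d:4d}  empty pair cells: {zp:.3f}  empty triple cells: {zt:.3f}")
```

Since at most `n_docs` distinct triples can ever be observed, the triple table has to be almost entirely empty as soon as *d*³ is much larger than the sample size, whatever the mixture.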
