## maximum likelihood on negative binomial

**E**stimating both parameters of a negative binomial distribution NB(N,p) by maximum likelihood sounds like an obvious exercise. But it is not, because some samples lead to degenerate solutions, namely p=0 and N=∞… This occurs when the mean of the sample is larger than its empirical variance, s² < x̄, not an impossible instance: I discovered this when reading a Cross Validated question asking what to do in such a case. A first remark of interest is that this only happens when the negative binomial distribution is defined in terms of failures (since otherwise the number of *successes* is bounded). A major difference I had never realised till now, as estimating N is not a straightforward exercise. A second remark is that a negative binomial NB(N,p) is a Poisson compound of LSD variates with parameter p, the Poisson having parameter η=-N log(1-p). And the LSD being the logarithmic series distribution, with probabilities proportional to p^{k}/k, rather than a psychedelic drug. Since this is not an easy framework to work with, Adamidis (1999) introduces an extra auxiliary variable that is a truncated exponential on (0,1) with parameter -log(1-p). A very neat trick that removes the nasty normalising constant of the LSD variate.
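The compound representation can be checked by simulation. A minimal sketch (the values of N and p are illustrative, and SciPy's `logser` is used for the logarithmic series distribution):

```python
import numpy as np
from scipy.stats import logser

rng = np.random.default_rng(42)
N, p = 5.0, 0.4                        # NB(N, p) in the "failures" parameterisation
eta = -N * np.log(1.0 - p)             # Poisson parameter eta = -N log(1-p)

n = 50_000
m = rng.poisson(eta, size=n)           # number of LSD terms in each compound draw
terms = logser.rvs(p, size=int(m.sum()), random_state=rng)
# split the pooled LSD draws into groups of sizes m and sum each group
draws = np.array([g.sum() for g in np.split(terms, np.cumsum(m)[:-1])])

mean_th = N * p / (1 - p)              # NB(N, p) mean
var_th = N * p / (1 - p) ** 2          # NB(N, p) variance, always > mean
print(draws.mean(), mean_th)
print(draws.var(), var_th)
```

Since the variance Np/(1-p)² always exceeds the mean Np/(1-p), a sample with s² < x̄ sits outside what the model can match, which is where the degeneracy comes from.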

> “Convergence was achieved in all cases, even when the starting values were poor and this emphasizes the numerical stability of the EM algorithm.” — K. Adamidis

Adamidis then constructs an EM algorithm on the completed set of auxiliary variables, with closed-form updates for both parameters. Unfortunately, the algorithm only works when s² > x̄; otherwise it gets stuck at the boundary p=0 and N=∞. I was hoping for a replica of the mixture case, where local maxima are more interesting than the degenerate global maximum… (Of course, there is always the alternative of using a Bayesian noninformative approach.)
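The boundary degeneracy is easy to observe numerically. A hedged sketch with a toy underdispersed sample (note SciPy's `nbinom` takes the success probability as its second argument, so its q corresponds to 1-p above): for fixed N the likelihood is maximised in q by moment matching, and the resulting profile log-likelihood keeps increasing with N, approaching the Poisson fit from below without ever reaching it.

```python
import numpy as np
from scipy.stats import nbinom, poisson

x = np.array([2, 3] * 5)               # toy sample: mean 2.5, variance 0.25 < mean
xbar = x.mean()

def profile_loglik(N):
    # for fixed N, the likelihood in q is maximised by moment matching:
    # N(1-q)/q = xbar  =>  q = N / (N + xbar)
    q = N / (N + xbar)
    return nbinom.logpmf(x, N, q).sum()

lls = {N: profile_loglik(N) for N in (1, 10, 100, 1000)}
ll_poisson = poisson.logpmf(x, xbar).sum()   # the N -> infinity limit
print(lls)
print(ll_poisson)
```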

October 7, 2015 at 3:13 am

Shouldn’t this problem be just as hard for (reference) Bayes? I did look at some point and I don’t remember seeing an overabundance of literature estimating the overdispersion parameter.

And I thought that an easier representation was as a Poisson with a gamma rate (or maybe inverse rate) parameter… I don’t think I’ve ever heard of an LSD before!
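The gamma–Poisson mixture the comment refers to can be sketched as follows (parameter values illustrative): with λ ~ Gamma(N, scale = p/(1-p)) and X | λ ~ Poisson(λ), X is marginally NB(N, p) in the failures parameterisation, which in SciPy's `nbinom` means success probability 1-p.

```python
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(0)
N, p = 5.0, 0.4
lam = rng.gamma(shape=N, scale=p / (1 - p), size=200_000)   # gamma rates
x = rng.poisson(lam)                                        # Poisson given the rate

emp = np.bincount(x, minlength=4)[:4] / x.size              # empirical pmf at 0..3
th = nbinom.pmf(np.arange(4), N, 1 - p)                     # NB(N, p) pmf
print(emp)
print(th)
```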

October 7, 2015 at 8:37 am

I presume that priors decreasing as fast as 1/m² or 1/m³ should counter the explosive behaviour of the likelihood. Obviously, using some prior information is preferable when available.
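A sketch of that suggestion, taking the comment's m to be N and the hypothetical prior π(N) ∝ 1/N², on a toy underdispersed sample: unlike the likelihood, the profiled log-posterior peaks at a finite N.

```python
import numpy as np
from scipy.stats import nbinom

x = np.array([2, 3] * 5)               # toy sample with s² < x̄ (mean 2.5, var 0.25)
xbar = x.mean()

def log_post(N):
    # profile out the success probability q = N/(N+xbar), then add the
    # hypothetical log-prior log(1/N²) = -2 log N
    q = N / (N + xbar)
    return nbinom.logpmf(x, N, q).sum() - 2.0 * np.log(N)

grid = np.logspace(0, 5, 200)          # N from 1 to 1e5
vals = [log_post(N) for N in grid]
best = grid[int(np.argmax(vals))]
print(best)                            # a finite interior maximiser
```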

October 7, 2015 at 11:45 am

I actually worked out the PC prior for this at some point (the KL distance is not analytic, but it’s pretty easy to compute); the problem was that the calibration needed to depend on the mean, which didn’t make it fantastically useful in a GLM setting.