maximum likelihood on negative binomial

fishermen poles, Carnon harbour, France, June 13, 2012Estimating both parameters of a negative binomial distribution NB(N,p) by maximum likelihood sounds like an obvious exercise. But it is not because some samples lead to degenerate solutions, namely p=0 and N=∞… This occurs when the mean of the sample is larger than its empirical variance, s²>x̄, not an impossible instance: I discovered this when reading a Cross Validated question asking about the action to take in such a case. A first remark of interest is that this only happens when the negative binomial distribution is defined in terms of failures (since else the number of successes is bounded). A major difference I never realised till now, for estimating N is not a straightforward exercise. A second remark is that a negative binomial NB(N,p) is a Poisson compound of an LSD variate with parameter p, the Poisson being with parameter η=-N log(1-p). And the LSD being a power distribution pk/k rather than a psychedelic drug. Since this is not an easy framework, Adamidis (1999) introduces an extra auxiliary variable that is a truncated exponential on (0,1) with parameter -log(1-p). A very neat trick that removes the nasty normalising constant on the LSD variate.

“Convergence was achieved in all cases, even when the starting values were poor and this emphasizes the numerical stability of the EM algorithm.” K. Adamidis

Adamidis then constructs an EM algorithm on the completed set of auxiliary variables with a closed form update on both parameters. Unfortunately, the algorithm only works when s²>x̄. Otherwise, it gets stuck at the boundary p=0 and N=∞. I was hoping for a replica of the mixture case where local maxima are more interesting than the degenerate global maximum… (Of course, there is always the alternative of using a Bayesian noninformative approach.)

3 Responses to “maximum likelihood on negative binomial”

  1. Dan Simpson Says:

    Shouldn’t this problem be just as hard for (reference) Bayes? I did look at some point and I don’t remember seeing an overabundance of literature estimating the overdispersion parameter.

    And I thought that an easier representation was as a poisson with a gamma rate (or maybe inverse rate) parameter… I don’t think I’ve ever heard of an LSD before!

    • I presume that 1/m² or 1/m³ should be decreasing fast enough to counter the explosive behaviour of the likelihood. Obviously, most obviously, using some prior information is preferable when available.

      • I actually worked out the PC prior for this at some point (the KL distance is not analytic, but it’s pretty easy to compute), the problem was that the calibration needed to depend on the mean, which didn’t make it fantastically useful in a GLM setting.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s