## maximum likelihood on negative binomial

Posted in Books, Kids, Statistics, University life with tags , , , , , , , on October 7, 2015 by xi'an Estimating both parameters of a negative binomial distribution NB(N,p) by maximum likelihood sounds like an obvious exercise. But it is not because some samples lead to degenerate solutions, namely p=0 and N=∞… This occurs when the mean of the sample is larger than its empirical variance, s²>x̄, not an impossible instance: I discovered this when reading a Cross Validated question asking about the action to take in such a case. A first remark of interest is that this only happens when the negative binomial distribution is defined in terms of failures (since else the number of successes is bounded). A major difference I never realised till now, for estimating N is not a straightforward exercise. A second remark is that a negative binomial NB(N,p) is a Poisson compound of an LSD variate with parameter p, the Poisson being with parameter η=-N log(1-p). And the LSD being a power distribution pk/k rather than a psychedelic drug. Since this is not an easy framework, Adamidis (1999) introduces an extra auxiliary variable that is a truncated exponential on (0,1) with parameter -log(1-p). A very neat trick that removes the nasty normalising constant on the LSD variate.

“Convergence was achieved in all cases, even when the starting values were poor and this emphasizes the numerical stability of the EM algorithm.” K. Adamidis Adamidis then constructs an EM algorithm on the completed set of auxiliary variables with a closed form update on both parameters. Unfortunately, the algorithm only works when s²>x̄. Otherwise, it gets stuck at the boundary p=0 and N=∞. I was hoping for a replica of the mixture case where local maxima are more interesting than the degenerate global maximum… (Of course, there is always the alternative of using a Bayesian noninformative approach.)

## Rasmus’ socks fit perfectly!

Posted in Books, Kids, R, Statistics, University life with tags , , , , on November 10, 2014 by xi'an Following the previous post on Rasmus’ socks, I took the opportunity of a survey on ABC I am currently completing to compare the outcome of his R code with my analytical derivation. After one quick correction [by Rasmus] of a wrong representation of the Negative Binomial mean-variance parametrisation [by me], I achieved this nice fit… ## the alive particle filter

Posted in Books, Statistics, University life with tags , , , , , , , , on February 14, 2014 by xi'an

As mentioned earlier on the ‘Og, this is a paper written by Ajay Jasra, Anthony Lee, Christopher Yau, and Xiaole Zhang that I missed when it got arXived (as I was also missing my thumb at the time…) The setting is a particle filtering one with a growing product of spaces and constraints on the moves between spaces. The motivating example is one of an ABC algorithm for an HMM where at each time, the simulated (pseudo-)observation is forced to remain within a given distance of the true observation. The (one?) problem with this implementation is that the particle filter may well die out by making only proposals that stand out of the above ball. Based on an idea of François Le Gland and Nadia Oudjane, the authors define the alive filter by imposing a fixed number of moves onto the subset, running a negative binomial number of proposals. By removing the very last accepted particle, they show that the negative binomial experiment allows for an unbiased estimator of the normalising constant(s). Most of the paper is dedicated to the theoretical study of this scheme, with results summarised as (p.2)

1. Time uniform Lp bounds for the particle filter estimates
2. A central limit theorem (CLT) for suitably normalized and centered particle filter estimates
3. An unbiased property of the particle filter estimates
4. The relative variance of the particle filter estimates, assuming N = O(n), is shown to grow linearly in n.

The assumptions necessary to reach those impressive properties are fairly heavy (or “exceptionally strong” in the words of the authors, p.5): the original and constrained transition kernels are both uniformly ergodic, with equivalent coverage of the constrained subsets for all possible values of the particle at the previous step. I also find the proposed implementation of the ABC filtering inadequate for approximating the posterior on the parameters of the (HMM) model. Expecting every realisation of the simulated times series to be close to the corresponding observed value is too hard a constraint. The above results are scaled in terms of the number N of accepted particles but it may well be that the number of generated particles and hence the overall computing time is much larger. In the examples completing the paper, the comparison is run with an earlier ABC sampler based on the very same stepwise proximity to the observed series so the impact of this choice is difficult to assess and, furthermore, the choice of the tolerances ε is difficult to calibrate: is 3, 6, or 12 a small or large value for ε? A last question that I heard from several sources during talks on that topic is why an ABC approach would be required in HMM settings where SMC applies. Given that the ABC reproduces a simulation on the pair latent variables x parameters, there does not seem to be a gain there…