Estimating the number of species

Bayesian Analysis just published online a paper by Hongmei Zhang and Hal Stern on a (new) Bayesian analysis of the problem of estimating the number of unseen species within a population. This problem has always fascinated me, as it seems at first sight to be an impossible problem: how can you estimate the number of species you do not know?! The approach relates to capture-recapture models, with an extra hierarchical layer for the species. A Bayesian analysis of the model obviously makes a lot of sense, and the prior modelling is quite influential. Zhang and Stern use a hierarchical Dirichlet prior on the capture probabilities $\theta_1,\ldots,\theta_S$ of the $S$ species, the captures following a multinomial model

$y|\theta,S \sim \mathcal{M}(N, \theta_1,\ldots,\theta_S)$

where $N=\sum_i y_i$ is the total number of observed individuals,

$\mathbf{\theta}|S \sim \mathcal{D}(\alpha,\ldots,\alpha)$

and

$\pi(\alpha,S) = f(1-f)^{S-S_\text{min}} \alpha^{-3/2}$

the first factor being a geometric prior on $S \ge S_\text{min}$ with parameter $f$, and the factor $\alpha^{-3/2}$ forcing the common coefficient $\alpha$ of the Dirichlet prior towards zero. The paper also covers predictive design, analysing the capture effort needed to reach a given recovery rate of species. The overall approach is not particularly innovative in its methodology, the MCMC part being rather straightforward, but the predictive abilities of the model are nonetheless interesting.
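To make the hierarchy concrete, here is a minimal stdlib Python sketch that simulates from the model for fixed $f$, $S_\text{min}$ and $\alpha$ (the $\alpha^{-3/2}$ factor is not sampled here). It draws $S$ from the geometric prior, $\theta$ from the symmetric Dirichlet, and the counts $y$ from the multinomial, so one can see how many of the $S$ species actually show up in $N$ captures. The two effort functions are an illustrative simplification of the predictive-design question, not Zhang and Stern's procedure: for a single fixed $\theta$, species $i$ is seen at least once in $M$ captures with probability $1-(1-\theta_i)^M$. All function names are hypothetical.

```python
import random
from collections import Counter


def sample_S(f, S_min, rng):
    """Draw S from the geometric prior pi(S) = f (1-f)^(S - S_min), S >= S_min."""
    S = S_min
    while rng.random() >= f:  # each "failure" (probability 1-f) adds one species
        S += 1
    return S


def sample_theta(alpha, S, rng):
    """Symmetric Dirichlet D(alpha, ..., alpha) via normalised Gamma(alpha, 1) draws."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(S)]
    total = sum(g)
    return [x / total for x in g]


def sample_counts(theta, N, rng):
    """Multinomial M(N, theta_1, ..., theta_S): N captures, species i w.p. theta_i."""
    draws = Counter(rng.choices(range(len(theta)), weights=theta, k=N))
    return [draws.get(i, 0) for i in range(len(theta))]


def expected_recovery(theta, M):
    """Expected fraction of the S species seen at least once in M captures."""
    return sum(1.0 - (1.0 - t) ** M for t in theta) / len(theta)


def effort_for_recovery(theta, target):
    """Smallest capture effort M whose expected recovery rate reaches `target`."""
    if not 0.0 < target < 1.0 or min(theta) <= 0.0:
        raise ValueError("need 0 < target < 1 and strictly positive theta")
    # expected_recovery is non-decreasing in M: bracket by doubling, then bisect,
    # keeping the invariant expected_recovery(lo) < target <= expected_recovery(hi)
    hi = 1
    while expected_recovery(theta, hi) < target:
        hi *= 2
    lo = hi // 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if expected_recovery(theta, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    rng = random.Random(1)
    S = sample_S(f=0.05, S_min=20, rng=rng)
    theta = sample_theta(alpha=0.5, S=S, rng=rng)
    y = sample_counts(theta, N=100, rng=rng)
    seen = sum(c > 0 for c in y)
    print(f"S = {S}, species seen in N = 100 captures: {seen}")
    print("effort for 90% expected recovery:", effort_for_recovery(theta, 0.9))
```

Small values of $\alpha$ make $\theta$ very uneven, so many species stay unseen even for fairly large $N$, which is exactly why the prior on $\alpha$ matters so much for the estimate of $S$.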

The previously accepted paper in Bayesian Analysis is a note by Ron Christensen about an inconsistent Bayes estimator, an example you may want to use in an advanced Bayesian class. For all practical purposes, it should not worry you much, since the example involves a sampling distribution that is normal when its parameter is irrational and Cauchy otherwise. (The prior is assumed to be absolutely continuous with respect to the Lebesgue measure, so it gives mass zero to the set of rational numbers $\mathbb{Q}$. That $\mathbb{Q}$ is dense in $\mathbb{R}$ is irrelevant from a measure-theoretic viewpoint.)
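A sketch of why the rational values cannot affect the posterior, under illustrative assumptions that may differ from the exact setting of Christensen's note: a unit-variance normal density $\varphi(x-\theta)$ for irrational $\theta$, a standard Cauchy location density $c(x-\theta)$ for rational $\theta$, and $n$ iid observations $x_1,\ldots,x_n$. Since $\pi(\mathbb{Q})=0$, changing the likelihood on $\mathbb{Q}$ changes neither the normalising constant nor the posterior measure:

$f(x\mid\theta)=\begin{cases}\varphi(x-\theta) & \theta\notin\mathbb{Q}\\ c(x-\theta) & \theta\in\mathbb{Q}\end{cases},\quad \pi(\mathbb{Q})=0 \;\Longrightarrow\; \pi(\theta\mid x_{1:n}) \propto \pi(\theta)\prod_{i=1}^n \varphi(x_i-\theta)\ \text{ for $\pi$-almost every } \theta$

So the posterior is always the normal-model posterior, which for a positive continuous prior density behaves like $\mathcal{N}(\bar x_n, 1/n)$ for large $n$. If the true $\theta_0$ is rational, the $x_i$ are Cauchy and $\bar x_n \sim \mathcal{C}(\theta_0,1)$ for every $n$, so the posterior does not concentrate at $\theta_0$; but this only happens on a set of parameter values with prior probability zero.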