About capture-recapture

I really like the models derived from capture-recapture experiments, because they encompass latent variables, hidden Markov processes, Gibbs simulation, EM estimation, and hierarchical models in a simple setup with a nice side story to motivate it (at least in Ecology; in the Social Sciences, those models are rather associated with sad stories about the homeless, heroin addicts, or prostitutes…). I was thus quite surprised to hear from many that the capture-recapture chapter in Bayesian Core was hard to understand. In a sense, I find it easier than the mixture chapter because the data is discrete and everything can [almost!] be done by hand…

Today I received an email from Cristiano about a typo in The Bayesian Choice concerning capture-recapture models:

“I’ve read the paragraph (4.3.3) in your book and I have some doubts about the proposed formula in example 4.3.3. My guess is that a typo is here, where (n-n_1) instead of n_2 should appear in the hypergeometric distribution.”

It is indeed the case! This mistake has survived the many revisions and reprints of the book and is also found in the French translation Le Choix Bayésien, in Example 4.19… In both cases, {n_2 \choose n_2-n_{11}} should be {n-n_1 \choose n_2-n_{11}}, shame on me! (The mistake does not appear in Bayesian Core.)
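A quick numerical check makes the correction concrete. The sketch below assumes the usual two-stage setup, with n the population size, n_1 and n_2 the two sample sizes, and n_{11} the number of recaptures; the function names and the numerical values are hypothetical choices for illustration, not taken from the book. With the corrected coefficient the probabilities sum to one over n_{11}, while the misprinted version does not.

```python
from math import comb


def hypergeom_pmf(n11, n, n1, n2):
    """P(n11 | n, n1, n2) using the corrected coefficient C(n - n1, n2 - n11)."""
    if n11 < 0 or n11 > n1 or n2 - n11 < 0 or n2 - n11 > n - n1:
        return 0.0
    return comb(n1, n11) * comb(n - n1, n2 - n11) / comb(n, n2)


def misprinted_pmf(n11, n, n1, n2):
    """Same expression with the misprinted coefficient C(n2, n2 - n11)."""
    if n11 < 0 or n11 > min(n1, n2):
        return 0.0
    return comb(n1, n11) * comb(n2, n2 - n11) / comb(n, n2)


# Hypothetical values, chosen only for illustration.
n, n1, n2 = 50, 12, 9
support = range(0, min(n1, n2) + 1)

total_corrected = sum(hypergeom_pmf(k, n, n1, n2) for k in support)
total_misprinted = sum(misprinted_pmf(k, n, n1, n2) for k in support)

print(f"corrected  version sums to {total_corrected:.6f}")
print(f"misprinted version sums to {total_misprinted:.6f}")
```

The first sum equals one by Vandermonde's identity, which is a simple way to see that only the corrected expression defines a proper distribution on n_{11}.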

My reader also had a fairly interesting question about an extension of the usual model:

That said, I would appreciate if you could help me in finding references to a slightly different setting, where the assumption is that while collecting the first or the second sample, an individual may appear twice. If we assume that a stopping rule is used: “n_1 or n_2 equal 5 and the captured individuals are different” my guess is that the hypergeometric formulation is incomplete and may lead to overestimation of the population. Is there in your knowledge any already developed study you can point me about this different framework?

to which I can only suggest incorporating the error-in-variable structure, i.e. the possible confusion in identifying individuals, within the model and running a Gibbs sampler that iteratively simulates the latent variables “true numbers of individuals in captures 1 and 2” and the parameters given those latent variables. This problem of counting the same individual twice or more has obvious applications in Ecology, when animals are only identified by watchers, as in whale sightings, and in the Social Sciences, when individuals lack identification. [To answer specifically the overestimation question, this is clearly the case since n_1 and n_2 are larger than in truth, while n_{11} presumably remains the same…]
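The overestimation argument can be illustrated with a few lines of arithmetic. The sketch below is not the Gibbs sampler suggested above; it swaps in the classical Lincoln-Petersen estimate n_1 n_2 / n_{11} as a simple stand-in, and the counts are hypothetical numbers chosen for illustration. Inflating n_1 and n_2 by a few double-counted individuals while n_{11} stays fixed pushes the estimate upwards.

```python
def lincoln_petersen(n1, n2, n11):
    """Classical Lincoln-Petersen estimate of the population size."""
    if n11 <= 0:
        raise ValueError("at least one recapture is needed")
    return n1 * n2 / n11


# Hypothetical true counts: 20 and 25 distinct individuals, 5 recaptures.
true_estimate = lincoln_petersen(20, 25, 5)

# Same experiment, assuming 3 individuals were counted twice in each sample,
# so the recorded sample sizes are 23 and 28 while n11 is unchanged.
inflated_estimate = lincoln_petersen(23, 28, 5)

print(f"estimate from true counts:     {true_estimate:.1f}")
print(f"estimate from inflated counts: {inflated_estimate:.1f}")
```

Under these assumed numbers the estimate moves from 100 to about 129, in line with the remark that recorded sample sizes larger than the truth lead to an overestimated population.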

Bayesian Data Analysis for Ecologists [reflections]

The course I gave last week in the Gran Paradiso National Park was certainly one of the most exciting I ever gave! And not only because of the paradisiac location. Indeed, the twenty young ecologists/biologists/geneticists who attended the course were unbelievably motivated to learn about Bayesian Statistics. They came to the course with a (strong) purpose and with clear problems and real datasets as well. They were thus ready to endure some of my most theoretical slides to obtain indications for progressing in their own work. Another great point was their repeated wish to get beyond black box solutions, even Bayesian black box solutions, to understand the available software and possibly to modify it. Towards this goal, I slightly modified the way I usually teach Bayesian Core, in order to replace the standard datasets with three “local” datasets obtained from the park biologist, Achaz von Hardenberg (but not available publicly). Here are the modified slides:

which will mostly be useful to those who attended the course. I am thus very grateful to Achaz von Hardenberg, from the Alpine Wildlife Research Centre, Gran Paradiso National Park, and to Antonietta Mira, from the University of Insubria, who invited me to give this course. My only regret is that we could not cover the types of problems met by the attendees in more depth. Given their diversity and richness, this is rather frustrating! It is thus most likely that we will have a follow-up course in the near future with the same participants, based on case studies only, where we can study those problems in thematic groups.