## sampling by exhaustion

The riddle set by The Riddler of last week sums up as follows:

Within a population of size N, each individual in the population independently selects another individual. All individuals selected at least once are removed and the process iterates until one or zero individual is left. What is the probability that there is zero individual left?

While I cannot see a clean analytical solution to this problem, it reminds me of an enveloppe-versus-letter (matches) problem I saw in graduate school. Indeed, the expected number of removed (or selected) individuals is given by

$N\left\{1-\frac{N-2}{N-1}\right\}^{N-1}$

which is equivalent to (1-e⁻¹)N for N large, meaning that the population decreases by an average factor of e⁻¹ at each round. And that it takes on average approximately log(N) iterations to reach a single individual. A simulation of the first probabilities of ending with one individual led me to the above curve, which wiggles in an almost periodic way around the probability ½, equal to the average of those probabilities. Using the R code

rad=function(N){#next population size
ut=sample(rep(2:N,2),1)
for (i in 2:N)#sampling
ut=c(ut,sample(rep((1:N)[-i],2),1))
return(N-length(unique(ut))}
sal=rep(0,m);sal[1]=1
for (N in 3:M){
prop=0;
for (t in 1:T){#one single step
if (i>0) prop=prop+sal[i]}
sal[N]=prop/T}


which exploits the previously computed probabilities. The variability is most interesting if unexpected, but looking back at Feller‘s sections and exercises on the classical occupancy problem, I could not find a connection with this problem. If it exists. Still, if N is large enough, the exclusion of one index from the selection becomes negligible and the probability of moving from n to m individuals should be approximately [Feller, eqn (2.4), p.102]

$p_n(m)={n\choose m}\sum_{v=}^{n-m} (-1)^v {n-m\choose v} \left(1-\frac{m+v}{n}\right)^n$

This formula approximates quite well the exact probability, but as in a previous post about the birthday problem, it proves quite delicate to compute. As already noticed by Feller.

### One Response to “sampling by exhaustion”

1. […] article was first published on R – Xi'an's Og, and kindly contributed to […]

This site uses Akismet to reduce spam. Learn how your comment data is processed.