sampling by exhaustion

The riddle set by The Riddler of last week sums up as follows:

Within a population of size N, each individual in the population independently selects another individual. All individuals selected at least once are removed and the process iterates until one or zero individual is left. What is the probability that there is zero individual left?

While I cannot see a clean analytical solution to this problem, it reminds me of an enveloppe-versus-letter (matches) problem I saw in graduate school. Indeed, the expected number of removed (or selected) individuals is given by

N\left\{1-\frac{N-2}{N-1}\right\}^{N-1}

which is equivalent to (1-e⁻¹)N for N large, meaning that the population decreases by an average factor of e⁻¹ at each round. And that it takes on average approximately log(N) iterations to reach a single individual. A simulation of the first probabilities of ending with one individual led me to the above curve, which wiggles in an almost periodic way around the probability ½, equal to the average of those probabilities. Using the R code

rad=function(N){#next population size
  ut=sample(rep(2:N,2),1)
  for (i in 2:N)#sampling
   ut=c(ut,sample(rep((1:N)[-i],2),1))
  return(N-length(unique(ut))}
sal=rep(0,m);sal[1]=1
for (N in 3:M){
 prop=0;
 for (t in 1:T){#one single step
   i=rad(N)
   if (i>0) prop=prop+sal[i]}
 sal[N]=prop/T}

which exploits the previously computed probabilities. The variability is most interesting if unexpected, but looking back at Feller‘s sections and exercises on the classical occupancy problem, I could not find a connection with this problem. If it exists. Still, if N is large enough, the exclusion of one index from the selection becomes negligible and the probability of moving from n to m individuals should be approximately [Feller, eqn (2.4), p.102]

p_n(m)={n\choose m}\sum_{v=}^{n-m} (-1)^v {n-m\choose v} \left(1-\frac{m+v}{n}\right)^n

This formula approximates quite well the exact probability, but as in a previous post about the birthday problem, it proves quite delicate to compute. As already noticed by Feller.

One Response to “sampling by exhaustion”

  1. […] article was first published on R – Xi'an's Og, and kindly contributed to […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s