## occupancy rules

**W**hile the last riddle on The Riddler was rather anticlimactic, namely to find the mean of the number Y of empty bins in a uniform multinomial with n bins and m draws, with solution

[which still has a link with e in that the fraction of empty bins converges to e⁻¹ when n=m], this led me to some more involved investigation on the distribution of Y. While it can be shown directly that the probability that k bins are non-empty is

with an R representation by

miss<-function(n,m){ p=rep(0,n) for (k in 1:n) p[k]=choose(n,k)*sum((-1)^((k-1):0)*choose(k,1:k)*(1:k)^m) return(rev(p)/n^m)}

I wanted to take advantage of the moments of Y, since it writes as a sum of n indicators, counting the number of empty cells. However, the higher moments of Y are not as straightforward as its expectation and I struggled with the representation until I came upon this formula

where S(k,i) denotes the Stirling number of the second kind… Or i!S(n,i) is the number of surjections from a set of size n to a set of size i. Which leads to the distribution of Y by inverting the moment equations, as in the following R code:

diss<-function(n,m){ A=matrix(0,n,n) mome=rep(0,n) A[n,]=rep(1,n) mome[n]=1 for (k in 1:(n-1)){ A[k,]=(0:(n-1))^k for (i in 1:k) mome[k]=mome[k]+factorial(i)*as.integer(Stirling2(n,i))* (1-(i+1)/n)^m*factorial(k)/factorial(k-i-1)} return(solve(A,mome))}

that I still checked by raw simulations from the multinomial

zample<-function(n,m,T=1e4){ x=matrix(sample(1:n,m*T,rep=TRUE),nrow=T) x=sapply(apply(x,1,unique),length) return(n-x)}

## Leave a Reply