## hard birthday problem

Posted in Books, Kids, R, Statistics with tags , , , , , , , , , on February 4, 2021 by xi'an

From an X validated question, found that WordPress now allows for direct link to pdf documents, like the above paper by my old friend Anirban Das Gupta! The question is about estimating a number M of individuals with N distinct birth dates over a year of T days. After looking around I could not find a simpler representation of the probability for N=r other than (1) in my answer,

$\frac{T!}{(\bar N-r)!}\frac{m!}{T^m} \sum_{(r_1,\ldots,r_m);\\\sum_1^m r_i=r\ \&\\\sum_1^m ir_i=m}1\Big/\prod_{j=1}^m r_j! (j!)^{r_j}$

borrowed from a paper by Fisher et al. (Another Fisher!) Checking Feller leads to the probability (p.102)

${T \choose r}\sum_{\nu=0}^r (-1)^{\nu}{r\choose\nu}\left(1-\frac{T-r+\nu}T \right)^m$

which fits rather nicely simulation frequencies, as shown using

apply(!apply(matrix(sample(1:Nb,T*M,rep=TRUE),T,M),1,duplicated),2,sum)


Further, Feller (1970, pp.103-104) justifies an asymptotic Poisson approximation with parameter$$\lambda(M)=\bar{N}\exp\{-M/\bar N\}$ from which an estimate of$M$can be derived. With the birthday problem as illustration (pp.105-106)! It may be that a completion from N to (R¹,R²,…) where the components are the number of days with one birthdate, two birthdates, &tc. could help design an EM algorithm that would remove the summation in (1) but I did not spend more time on the problem (than finding a SAS approximation to the probability!). ## the riddle(r) of the certain winner losing in the end Posted in Books, Kids, R, Statistics with tags , , , , , on November 25, 2020 by xi'an Considering a binary random walk, starting at zero, what is the probability of being almost sure of winning at some point only to lose at the end? This is the question set by the post-election Riddler, with almost sure meaning above 99% and the time horizon set to n=101 steps (it could have been 50 or 538!). As I could not see a simple way to compute the collection of states with a probability of being positive at the end of at least 0.99, even after checking William Feller’s Random Walks fabulous chapter, I wrote an R code to find them, and then ran a Monte Carlo evaluation of the probability to reach this collection and still end up with a negative value. Which came as 0.00212 over repeated simulations. Obviously smaller than 0.01, but no considerably so. As seen on the above picture, the set to be visited is actually not inconsiderable. The bounding curves are the diagonal and the 2.33 √(n-t) bound derived from the limiting Brownian approximation to the random walk, which fits rather well. (I wonder if there is a closed form expression for the probability of the Brownian hitting the boundary 2.33 √(n-t). Simulations with 1001 steps give an estimated probability of 0.505, leading to a final probability of 0.00505 of getting over the boundary and loosing in the end, close to the 1/198 produced by The Riddler.) ## Dorfman’s group testing Posted in Books, Kids, Statistics, Travel with tags , , , , , , , , on April 9, 2020 by xi'an A recent note by CREST researchers insists on using group testing to compensate for the shortage of test packages and testing personal, as done in several countries, towards deconfining individuals who are not infected. Or who are exhibiting the right antibodies. Reminding me of my first entry to the notion, in Feller’s book, of the method implemented by Robert Dorfman to test for syphilis prior to enlisting potential WW II soldiers. I would deem the idea useful for surveys, in identifying the proportion of infected or immunised persons, maybe less for giving the green light to leave one’s house as the logistics of merging the tests while keeping track of every individual could prove impossible. ## complex Cauchys Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on February 8, 2018 by xi'an During a visit of Don Fraser and Nancy Reid to Paris-Dauphine where Nancy gave a nice introduction to confidence distributions, Don pointed out to me a 1992 paper by Peter McCullagh on the Cauchy distribution. Following my recent foray into the estimation of the Cauchy location parameter. Among several most interesting aspects of the Cauchy, Peter re-expressed the density of a Cauchy C(θ¹,θ²) as f(x;θ¹,θ²) = |θ²| / |x-θ|² when θ=θ¹+ιθ² [a complex number on the half-plane]. Denoting the Cauchy C(θ¹,θ²) as Cauchy C(θ), the property that the ratio aX+b/cX+d follows a Cauchy for all real numbers a,b,c,d, C(aθ+b/cθ+d) [when X is C(θ)] follows rather readily. But then comes the remark that “those properties follow immediately from the definition of the Cauchy as the ratio of two correlated normals with zero mean.” which seems to relate to the conjecture solved by Natesh Pillai and Xiao-Li Meng a few years ago. But the fact that a ratio of two correlated centred Normals is Cauchy is actually known at least from the1930’s, as shown by Feller (1930, Biometrika) and Geary (1930, JRSS B). ## easy riddle Posted in Books, Kids, R with tags , , , , , on July 12, 2017 by xi'an From the current Riddler, a problem that only requires a few lines of code and a few seconds of reasoning. Or not. N households each stole the earnings from one of the (N-1) other households, one at a time. What is the probability that a given household is not burglarised? And what are the expected final earnings of each household in the list, assuming they all start with$1?

The first question is close to Feller’s enveloppe problem in that

$\left(1-\frac{1}{N-1}\right)^{N-1}$

is close to exp(-1) for N large. The second question can easily be solved by an R code like

N=1e3;M=1e6
fina=rep(1,N)
for (v in 1:M){
ordre=sample(1:N)
vole=sample(1:N,N,rep=TRUE)
while (min(abs(vole-(1:N)))==0)
vole[abs(vole-(1:N))==0]=sample(1:N,
sum(vole-(1:N)==0))
cash=rep(1,N)
for (t in 1:N){
cash[ordre[t]]=cash[ordre[t]]+cash[vole[t]];cash[vole[t]]=0}
fina=fina+cash[ordre]}


which returns a pretty regular exponential-like curve, although I cannot figure the exact curve beyond the third burglary. The published solution gives the curve

${\frac{N-2}{N-1}}^{999}\times 2+{\frac{1}{N-1}}^{t-1}\times{\frac{N-1}{N}}^{N-t}\times\frac{N}{N-1}$

corresponding to the probability of never being robbed (and getting on average an extra unit from the robbery) and of being robbed only before robbing someone else (with average wealth N/(N-1)).