## sampling w/o replacement except when replacing

Posted in Books, Kids, R on November 3, 2020 by xi'an

Another Riddle(r), considering a box with M myrtle balls and D dandelion balls. Balls are drawn without replacement as long as they are of the same colour as the first draw of the current run; when the colour changes, the last ball is put back and the process starts again, until all balls are drawn. The funny thing is that, unless M=0 or D=0, the probability that the last ball drawn is myrtle is always ½! This can be easily checked by simulation (here with M=2 and D=8):

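r=function()sample(0:1,1,p=c(d,m)) # draw a colour: 1 is myrtle, 0 is dandelion
F=0 # number of simulations ending with a myrtle ball
for(t in 1:1e6){
m=2;d=8
i=r();m=m-!!i;d=d-!i # initial draw, removed from the box
while(!!m*d){ # until one colour is exhausted
j=r();i=ifelse(i==j,j,r()) # on a colour change, put the ball back and redraw
m=m-!!i;d=d-!i}
F=F+(m>0)} # m>0 means the remaining (hence last) balls are myrtle
F/1e6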


Now the proof that the probability is ½ is quite straightforward for M=1 (or D=1), but I cannot find a quick fix for larger values. I thus reasoned by recursion: the probability of emptying a given colour in a single uninterrupted run is d!m!/(d+m)!, whatever the colour and whatever d>0, m>0, hence half a chance to finish with myrtle in that case. Any shorter run of a given colour reduces the value of either d or m, at which point we are using the recursion assumption that the probability is ½…
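The recursion can also be evaluated exactly rather than by simulation. Here is a minimal sketch, assuming a state (m,d) stands for the composition of the box at the start of a run; the function name `last_myrtle` and the matrix `P` are mine and purely illustrative:

```r
# exact probability that the last ball drawn is myrtle (hypothetical helper)
last_myrtle <- function(M, D) {
  # P[m + 1, d + 1]: probability of ending with myrtle when a run starts at (m, d)
  P <- matrix(NA_real_, M + 1, D + 1)
  P[1, ] <- 0  # myrtle exhausted: the last ball is dandelion
  P[, 1] <- 1  # dandelion exhausted: the last ball is myrtle
  for (m in seq_len(M)) for (d in seq_len(D)) {
    p <- 0
    run <- 1  # probability that the first k draws of the run are all myrtle
    for (k in seq_len(m)) {
      run <- run * (m - k + 1) / (m + d - k + 1)
      # run stopped by a dandelion (put back) after k < m myrtle balls;
      # k = m empties myrtle first and contributes 0
      if (k < m) {
        p <- p + run * d / (m + d - k) * P[m - k + 1, d + 1]
      }
    }
    run <- 1  # probability that the first k draws of the run are all dandelion
    for (k in seq_len(d)) {
      run <- run * (d - k + 1) / (m + d - k + 1)
      if (k < d) {
        p <- p + run * m / (m + d - k) * P[m + 1, d - k + 1]
      } else {
        p <- p + run  # dandelion emptied in one run: myrtle comes last
      }
    }
    P[m + 1, d + 1] <- p
  }
  P[M + 1, D + 1]
}
last_myrtle(2, 8)
```

Up to rounding error, the returned value should be ½ for any M>0 and D>0, since the two single-run terms both equal d!m!/(d+m)! and all interrupted runs lead to a smaller box.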

## Le Monde puzzle [#1121]

Posted in Books, Kids on December 17, 2019 by xi'an

A combinatorics puzzle as the current Le Monde weekly mathematical puzzle:

A class of 75<n<100 students is divided at random into two groups of sizes a and b=n-a, respectively, such that two particular students, Ji-ae and Jung-ah, have a probability of exactly 1/2 of being in the same group. Find a and n.

(with an original wording mentioning an independent allocation to the group!). Since the probability to be in the same group (under a simple uniform partition distribution) is

$\frac{a(a-1)}{n(n-1)}+\frac{b(b-1)}{n(n-1)}$

it is sufficient to seek by exhaustion values of (a,b) such that this sum is equal to ½. The only solution within the right range is then (36,45) (up to the symmetric pair). This can also be found by seeking integer solutions to the second degree polynomial equation in b, namely

$b^\star=\left[ 1+2a\pm\sqrt{1+8a}\right]/2 \in \mathbb N$
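The exhaustive search is short enough to sketch in R. This is an illustration rather than the code I ran at the time, and the object name `sol` is arbitrary; the comparison is made on integers to avoid testing floating-point equality with ½:

```r
sol <- NULL
for (n in 76:99) for (a in 1:(n %/% 2)) {  # a <= b removes the symmetric pair
  b <- n - a
  # same-group probability equal to 1/2, written without fractions
  if (2 * (a * (a - 1) + b * (b - 1)) == n * (n - 1)) {
    sol <- rbind(sol, c(a = a, b = b, n = n))
  }
}
sol
```

Expanding the condition gives (a-b)²=a+b=n, so n has to be a perfect square, which leaves n=81 within the range and a difference of 9 between the two group sizes.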

## Riddler collector

Posted in Statistics on September 22, 2018 by xi'an

Once in a while a fairly standard problem makes it to the Riddler puzzle of the week. Today, it is the coupon collector problem, explained by W. Huber on X validated. (W. Huber happens to be the top contributor to this forum, with over 2000 answers, and the highest reputation closing on 200,000!) With nothing (apparently) unusual: coupons [e.g., collecting cards] come in packs of k=10 with no duplicate, and there are n=100 different coupons. What is the expected number of packs one has to collect before getting all of the n coupons? W. Huber provides an R code to solve the recurrence on the expectation e(m,n,k), obtained by conditioning on the number m of different coupons already collected, and hence on the remaining number of coupons to collect, with a Hypergeometric distribution for the number of new coupons in the next pack. Returning about 49.9 packs on average. As is well-known, the average number of packs to complete one's collection with the final missing card is expensively large, with 10 packs necessary on average here, since each pack contains the missing card with probability k/n=1/10. The probability distribution of the required number of packs has actually been computed by Laplace in 1774 (and then again by Euler in 1785).
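The recurrence can be sketched along the following lines. This is my own reconstruction from the description above, not W. Huber's code, and the function name `expected_packs` is hypothetical. Writing p_j for the Hypergeometric probability of finding j new coupons in the next pack, the expectation satisfies e(m) = 1 + Σ_j p_j e(m+j), which is solved for e(m) by moving the j=0 term to the left-hand side:

```r
# expected number of further packs, by backward recursion on m (hypothetical helper)
expected_packs <- function(n = 100, k = 10) {
  e <- numeric(n + 1)  # e[m + 1] is the expectation with m distinct coupons; e[n + 1] = 0
  for (m in (n - 1):0) {
    j <- 0:min(k, n - m)          # possible numbers of new coupons in the next pack
    p <- dhyper(j, n - m, m, k)   # n - m coupons still missing, m already owned, k drawn
    e[m + 1] <- (1 + sum(p[-1] * e[m + j[-1] + 1])) / (1 - p[1])
  }
  e
}
e <- expected_packs()
e[1]    # expected number of packs when starting from scratch
e[100]  # expected number of packs for the last missing coupon
```

With k=1 the recursion reduces to the classical n(1+1/2+…+1/n), which makes for a convenient sanity check of the sketch.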

## yet another typo

Posted in Books, Statistics, University life on August 21, 2011 by xi'an

An email from Stefan Webb pointed out an embarrassing typo in the Appendix of The Bayesian Choice:

There is a typo on page 522 in the definition of the density function of the hypergeometric distribution (A.14). It should read “pN choose x”, not “pn choose x” in the numerator of f since you have reparameterized the standard form with m = pN.

Indeed, the density of the hypergeometric distribution should be

$f(x|p)=\dfrac{{pN\choose x}{(1-p)N\choose n-x}}{{N\choose n}}\mathbb{I}_{\{n-(1-p)N,\ldots,pN\}}(x) \mathbb{I}_{\{0,1,\ldots,n\}}(x)$

and not

$f(x|p)=\dfrac{{pn\choose x}{(1-p)N\choose n-x}}{{N\choose n}}\mathbb{I}_{\{n-(1-p)N,\ldots,pN\}}(x) \mathbb{I}_{\{0,1,\ldots,n\}}(x)$

Thomas Clerc from Fribourg pointed out an embarrassing typo in Chapter 8 of “Introducing Monte Carlo Methods with R”, namely that I defined on page 247 the complex number $\iota$ as the square root of 1 and not of -1! Not that this impacts much on the remainder of the book but still an embarrassment!!!

An inconsistent notation was uncovered by Bastien Boussau from Berkeley, this time for the book The Bayesian Choice. In Example 1.1.3, on page 3, I consider a hypergeometric $\mathcal{H}(30,N,20/N)$ distribution, while in Appendix A, I denote hypergeometric distributions as $\mathcal{H}(N;n;p)$, inverting the role of the population size and of the sample size. Sorry about that, inconsistencies in notations are alas occurring in my books… In case I have not mentioned it so far, Example 4.3.3 further involves a typo (detected by Cristiano Passerini from Pontecchio Marconi), again with the hypergeometric distribution $\mathcal{H}(N;n;p)$! The ratio should be

$\dfrac{{n_1\choose n_{11}} {n-n_1\choose n_2-n_{11}}\big/ {n\choose n_2}\pi(N=n)}{\sum_{k=36}^{50} {n_1\choose n_{11}} {k-n_1\choose n_2-n_{11}}\big/ {k\choose n_2}\pi(N=k)}$
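For what it is worth, the corrected density (A.14) can be compared numerically with R's own `dhyper`. A minimal sketch, where the function name `f_hyper` is hypothetical and the population size N=50 is an arbitrary value chosen for illustration (with n=30 and pN=20 as in Example 1.1.3):

```r
# corrected density (A.14), with pN items of the first type in a population of size N
f_hyper <- function(x, N, n, p) {
  m <- round(p * N)  # assumes pN is an integer
  # choose() returns 0 outside the support, which plays the role of the indicators
  choose(m, x) * choose(N - m, n - x) / choose(N, n)
}
N <- 50; n <- 30; p <- 20 / N  # N = 50 is an arbitrary illustration value
x <- 0:n
all.equal(f_hyper(x, N, n, p), dhyper(x, round(p * N), N - round(p * N), n))
sum(f_hyper(x, N, n, p))
```

In `dhyper(x, m, n, k)` the arguments are the number of items of the first type, the number of items of the second type, and the sample size, which is yet another ordering than the two used in the book.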