## Riddler collector

Posted in Statistics on September 22, 2018 by xi'an

Once in a while a fairly standard problem makes it to the Riddler puzzle of the week. Today, it is the coupon collector problem, explained by W. Huber on X validated. (W. Huber happens to be the top contributor to this forum, with over 2000 answers, and the highest reputation, closing in on 200,000!) With nothing (apparently) unusual: coupons [e.g., collecting cards] come in packs of k=10 with no duplicates, and there are n=100 different coupons. What is the expected number of packs one has to collect before getting all of the n coupons? W. Huber provides R code to solve the recurrence on the expectation e(m,n,k), obtained by conditioning on the number m of different coupons already collected, and hence on the number remaining to collect, with a Hypergeometric distribution for the number of new coupons in the next pack. The code returns 25.23 packs on average. As is well-known, the average number of packs needed to complete one’s collection with the final missing card is expensively large, with more than 5 packs necessary on average. The probability distribution of the required number of packs was actually computed by Laplace in 1774 (and then again by Euler in 1785).
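For readers who want to play with the recurrence, here is a minimal sketch in Python (not W. Huber's R code, and the function name `expected_packs` is mine). Writing $e(m)$ for the expected number of further packs when m distinct coupons are already held, and $P(j|m)$ for the Hypergeometric probability of finding j new coupons in the next pack, conditioning on the next pack gives $e(m) = 1 + \sum_j P(j|m)\,e(m+j)$ with $e(n)=0$. Moving the $j=0$ term to the left-hand side yields the backward recursion used below. No value is quoted for the Riddler setting, since the result depends on the n and k plugged in.

```python
from math import comb


def expected_packs(n: int, k: int) -> float:
    """Expected number of packs of k distinct coupons needed to hold all n coupons.

    Illustrative sketch: solves e(m) = 1 + sum_j P(j|m) e(m+j) backwards from e(n) = 0,
    where P(j|m) = C(n-m, j) C(m, k-j) / C(n, k) is the hypergeometric probability
    of j new coupons in a pack when m distinct coupons are already held.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    total = comb(n, k)          # number of equally likely packs
    e = [0.0] * (n + 1)         # e[n] = 0: collection complete
    for m in range(n - 1, -1, -1):
        # probability that the next pack brings nothing new (0 when k > m)
        p0 = comb(m, k) / total
        acc = 1.0
        for j in range(1, min(k, n - m) + 1):
            # comb(m, k - j) is 0 whenever the pack cannot hold k - j old coupons
            pj = comb(n - m, j) * comb(m, k - j) / total
            acc += pj * e[m + j]
        # move the j = 0 term to the left-hand side and divide
        e[m] = acc / (1.0 - p0)
    return e[0]


# Tiny case that can be checked by hand: 3 coupons, packs of 2.
print(round(expected_packs(3, 2), 4))  # 2.5
```

With k=1 the function reduces to the classical coupon collector expectation $n(1+1/2+\cdots+1/n)$, which makes for a convenient sanity check before calling `expected_packs(100, 10)`.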

## yet another typo

Posted in Books, Statistics, University life on August 21, 2011 by xi'an

An email from Stefan Webb pointed out an embarrassing typo in the Appendix of The Bayesian Choice:

> There is a typo on page 522 in the definition of the density function of the hypergeometric distribution (A.14). It should read “pN choose x”, not “pn choose x” in the numerator of f since you have reparameterized the standard form with m = pN.

Indeed, the density of the hypergeometric distribution should be

$f(x|p)=\dfrac{{pN\choose x}{(1-p)N\choose n-x}}{{N\choose n}}\mathbb{I}_{\{n-(1-p)N,\ldots,pN\}}(x) \mathbb{I}_{\{0,1,\ldots,n\}}(x)$

rather than

$f(x|p)=\dfrac{{pn\choose x}{(1-p)N\choose n-x}}{{N\choose n}}\mathbb{I}_{\{n-(1-p)N,\ldots,pN\}}(x) \mathbb{I}_{\{0,1,\ldots,n\}}(x)$
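A quick numerical check that the version with ${pN\choose x}$ in the numerator is a proper density: the sketch below evaluates it in the $(N; n; p)$ parameterisation and sums it over the support. The function name `hypergeom_pmf` is hypothetical, and the code assumes that $pN$ is (numerically) an integer, which the formula itself requires.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, n: int, p: float) -> float:
    """Hypergeometric density with population size N, sample size n, proportion p.

    Assumption: p * N is an integer up to floating-point error (it is rounded here).
    """
    M = round(p * N)                       # m = pN marked items in the population
    lo = max(0, n - (N - M))               # support: max(0, n-(1-p)N), ..., min(n, pN)
    hi = min(n, M)
    if not lo <= x <= hi:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


# The probabilities over x = 0, ..., n add up to one and the mean is n * p.
N, n, p = 10, 3, 0.5
probs = [hypergeom_pmf(x, N, n, p) for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(probs))
```

Replacing `comb(M, x)` with `comb(round(p * n), x)`, as in the misprinted version, no longer sums to one in general, which is one way of spotting the typo.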

Thomas Clerc from Fribourg pointed out an embarrassing typo in Chapter 8 of “Introducing Monte Carlo Methods with R”, namely that I defined on page 247 the complex number $\iota$ as the square root of 1 and not of -1! Not that this impacts much on the remainder of the book, but still an embarrassment!!!

An inconsistent notation was uncovered by Bastien Boussau from Berkeley, this time for the book The Bayesian Choice. In Example 1.1.3, on page 3, I consider a hypergeometric $\mathcal{H}(30,N,20/N)$ distribution, while in Appendix A, I denote hypergeometric distributions as $\mathcal{H}(N;n;p)$, inverting the role of the population size and of the sample size. Sorry about that, inconsistencies in notation are alas occurring in my books… In case I have not mentioned it so far, Example 4.3.3 further involves a typo (detected by Cristiano Passerini from Pontecchio Marconi), again with the hypergeometric distribution $\mathcal{H}(N;n;p)$! The ratio should be

$\dfrac{{n_1\choose n_{11}} {n-n_1\choose n_2-n_{11}}\big/ {n\choose n_2}\pi(N=n)}{\sum_{k=36}^{50} {n_1\choose n_{11}} {k-n_1\choose n_2-n_{11}}\big/ {k\choose n_2}\pi(N=k)}$
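To see the ratio at work, here is a small sketch that evaluates it for every candidate population size at once. The function name `posterior_N`, the capture counts and the uniform prior on $\{36,\ldots,50\}$ are all assumptions made for illustration only, not the figures used in Example 4.3.3.

```python
from math import comb


def posterior_N(n1: int, n2: int, n11: int, prior: dict) -> dict:
    """Posterior probabilities of N = k given n11, for k in the support of `prior`.

    `prior` maps each candidate population size k to pi(N = k).
    Assumes 0 <= n11 <= min(n1, n2) and at least one k with positive likelihood.
    """
    def likelihood(k: int) -> float:
        # hypergeometric probability of n11 marked items among n2 draws
        if k < n1 or k < n2:
            return 0.0
        return comb(n1, n11) * comb(k - n1, n2 - n11) / comb(k, n2)

    weights = {k: likelihood(k) * w for k, w in prior.items()}
    total = sum(weights.values())          # the denominator of the ratio
    return {k: w / total for k, w in weights.items()}


# Hypothetical counts and a uniform prior over 36, ..., 50 (illustration only).
prior = {k: 1 / 15 for k in range(36, 51)}
post = posterior_N(n1=20, n2=25, n11=10, prior=prior)
```

The constant ${n_1\choose n_{11}}$ cancels between numerator and denominator, so it could be dropped; it is kept here only to mirror the displayed ratio term by term.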