## Riddler collector

Posted in Statistics with tags , , , , , , , on September 22, 2018 by xi'an Once in a while a fairly standard problem makes it to the Riddler puzzle of the week. Today, it is the coupon collector problem, explained by W. Huber on X validated. (W. Huber happens to be the top contributor to this forum, with over 2000 answers, and the highest reputation closing on 200,000!) With nothing (apparently) unusual: coupons [e.g., collecting cards] come in packs of k=10 with no duplicate, and there are n=100 different coupons. What is the expected number one has to collect before getting all of the n coupons?  W. Huber provides an R code to solve the recurrence on the expectation, obtained by conditioning on the number m of different coupons already collected, e(m,n,k) and hence on the remaining number of collect, with an Hypergeometric distribution for the number of new coupons in the next pack. Returning 25.23 packs on average. As is well-known, the average number of packs to complete one’s collection with the final missing card is expensively large, with more than 5 packs necessary on average. The probability distribution of the required number of packs has actually been computed by Laplace in 1774 (and then again by Euler in 1785). ## a funny mistake

Posted in Statistics with tags , , , , , , , , , , , on August 20, 2018 by xi'an While watching the early morning activity in Tofino inlet from my rental desk, I was looking at a recent fivethirthyeight Riddle, which consisted in finding the probability of stopping a coin game which rule was to wait for the n consecutive heads if (n-1) consecutive heads had failed to happen when requested, which is

p+(1-p)p²+(1-p)(1-p²)p³+…

or $q=\sum_{k=1}^\infty p^k \prod_{j=1}^{k-1}(1-p^j)$

While the above can write as $q=\sum_{k=1}^\infty \{1-(1-p^k)\} \prod_{j=1}^{k-1}(1-p^j)$

or $\sum_{k=1}^\infty \prod_{j=1}^{k-1}(1-p^j)-\prod_{j=1}^{k}(1-p^j)$

hence suggesting $q=\sum_{k=1}^\infty \prod_{j=1}^{k-1}(1-p^j) - \sum_{k=2}^\infty \prod_{j=1}^{k-1}(1-p^j) =1$

the answer is (obviously) false and the mistake in separating the series into a difference of series is that both terms are infinite. The correct answer is actually $q=1-\prod_{j=1}^{\infty}(1-p^j)$

which is Euler’s function. Maybe nonstandard analysis can apply to go directly from the difference of the infinite series to the answer!

## flea circus

Posted in Books, Kids, pictures, R, Statistics with tags , , , , , , , , , , , on December 8, 2016 by xi'an An old riddle found on X validated asking for Monte Carlo resolution  but originally given on Project Euler:

A 30×30 grid of squares contains 30² fleas, initially one flea per square. When a bell is rung, each flea jumps to an adjacent square at random. What is the expected number of unoccupied squares after 50 bell rings, up to six decimal places?

The debate on X validated is whether or not a Monte Carlo resolution is feasible. Up to six decimals, certainly not. But with some lower precision, certainly. Here is a rather basic R code where the 50 steps are operated on the 900 squares, rather than the 900 fleas. This saves some time by avoiding empty squares.

xprmt=function(n=10,T=50){

mean=0
for (t in 1:n){

board=rep(1,900)
for (v in 1:T){

beard=rep(0,900)
if (board>0){
poz=c(0,1,0,30)
ne=rmultinom(1,board,prob=(poz!=0))
beard[1+poz]=beard[1+poz]+ne}
#
for (i in (2:899)[board[-1][-899]>0]){
u=(i-1)%%30+1;v=(i-1)%/%30+1
poz=c(-(u>1),(u<30),-30*(v>1),30*(v<30))
ne=rmultinom(1,board[i],prob=(poz!=0))
beard[i+poz]=beard[i+poz]+ne}
#
if (board>0){
poz=c(-1,0,-30,0)
ne=rmultinom(1,board,prob=(poz!=0))
beard[900+poz]=beard[900+poz]+ne}
board=beard}
mean=mean+sum(board==0)}
return(mean/n)}


The function returns an empirical average over n replications. With a presumably awkward approach to the borderline squares, since it involves adding zeros to keep the structure the same… Nonetheless, it produces an approximation that is rather close to the approximate expected value, in about 3mn on my laptop.

> exprmt(n=1e3)
 331.082
> 900/exp(1)
 331.0915


Further gains follow from considering only half of the squares, as there are two independent processes acting in parallel. I looked at an alternative and much faster approach using the stationary distribution, with the stationary being the Multinomial (450,(2/1740,3/1740…,4/1740,…,2/1740)) with probabilities proportional to 2 in the corner, 3 on the sides, and 4 in the inside. (The process, strictly speaking, has no stationary distribution, since it is periodic. But one can consider instead the subprocess indexed by even times.) This seems to be the case, though, when looking at the occupancy frequencies, after defining the stationary as:

inva=function(B=30){
return(c(2,rep(3,B-2),2,rep(c(3,
rep(4,B-2),3),B-2),2,rep(3,B-2),2))}


namely

> mn=0;n=1e8 #14 clock hours!
> proz=rep(c(rep(c(0,1),15),rep(c(1,0),15)),15)*inva(30)
> for (t in 1:n)
+ mn=mn+table(rmultinom(1,450,prob=rep(1,450)))[1:4]
> mn=mn/n
> mn=mn-450
> mn
0      1      2     3
166.11 164.92  82.56 27.71
> exprmt(n=1e6) #55 clock hours!!
 165.36 165.69 82.92 27.57

my original confusion being that the Poisson approximation had not yet taken over… (Of course, computing the first frequency for the stationary distribution does not require any simulation, since it is the sum of the complement probabilities to the power 450, i.e., 166.1069.)

## Laplace great⁶-grand child!

Posted in Kids, pictures, Statistics, University life with tags , , , , , , , , , on August 3, 2015 by xi'an  Looking at the Family Tree application (I discovered via Peter Coles’ blog), I just found out that I was Laplace’s [academic] great-great-great-great-great-great-great-grand-child! Through Poisson and Chasles. Going even further, as Simeon Poisson was also advised by Lagrange, my academic lineage reaches Euler and the Bernoullis. Pushing always further, I even found William of Ockham along one of the “direct” branches! Amazing ancestry, to which my own deeds pay little homage if any… (However, I somewhat doubt the strength of the links for the older names, since pursuing them ends up at John the Baptist!)

I wonder how many other academic descendants of Laplace are alive today. Too bad Family Tree does not seem to offer this option! Given the longevity of both Laplace and Poisson, they presumably taught many students, which means a lot of my colleagues and even of my Bayesian colleagues should share the same illustrious ancestry. For instance, I share part of this ancestry with Gérard Letac. And both Jean-Michel Marin and Arnaud Guillin. Actually, checking with the Mathematics Genealogy Project, I see that Laplace had… one student!, but still a grand total of [at least] 85,738 descendants… Incidentally, looking at the direct line, most of those had very few [recorded] descendants.