## genuine Latinsquare rectangle

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , on June 9, 2021 by xi'an

## Simulating a coin with irrational bias using rational arithmetic

Posted in Books, Statistics with tags , , , , , , on December 17, 2020 by xi'an An arXived paper by Luis Mendo adresses the issue of simulating coins with irrational probabilities from a fair coin, somehow connected with one of the latest riddles. Where I realised only irrational coins could be simulated in a fixed and finite number of throws! The setting of the paper is however similar to the one of a Bernoulli factory in that an unlimited number of coins can be generated (but it relies on a  fair coin). And the starting point is a series representation of the irrational ζ as a sum of positive and rational terms. As well as an earlier paper by the Warwick team of Łatuszyński et al. (2011). The solution is somewhat anticlimactic in that the successive draws of the fair coin lead to a sequence of intervals with length divided by 2 at each step. And stopping when a certain condition is met, requiring some knowledge on the tail error of the series. The paper shows further that the number of inputs used by its algorithm has an exponential tail. The examples provided therein are Euler’s constant $\gamma =\frac{1}{2} + \sum_{i=1}^\infty \frac{B(i)}{2i(2i+1)(2i+2)}$

where B(j) is the number of binary digits of j, and π/4 which can be written as an alternating series. An idle question that came to me while reading this paper is the influence of the series chosen to represent the irrational ζ as it seems that a faster decrease in the series should lead to fewer terms being used. However, the number of iterations is a geometric random variable with parameter 1/2, therefore the choice of the series curiously does not matter.

## Riddler collector

Posted in Statistics with tags , , , , , , , on September 22, 2018 by xi'an Once in a while a fairly standard problem makes it to the Riddler puzzle of the week. Today, it is the coupon collector problem, explained by W. Huber on X validated. (W. Huber happens to be the top contributor to this forum, with over 2000 answers, and the highest reputation closing on 200,000!) With nothing (apparently) unusual: coupons [e.g., collecting cards] come in packs of k=10 with no duplicate, and there are n=100 different coupons. What is the expected number one has to collect before getting all of the n coupons?  W. Huber provides an R code to solve the recurrence on the expectation, obtained by conditioning on the number m of different coupons already collected, e(m,n,k) and hence on the remaining number of collect, with an Hypergeometric distribution for the number of new coupons in the next pack. Returning 25.23 packs on average. As is well-known, the average number of packs to complete one’s collection with the final missing card is expensively large, with more than 5 packs necessary on average. The probability distribution of the required number of packs has actually been computed by Laplace in 1774 (and then again by Euler in 1785). ## a funny mistake

Posted in Statistics with tags , , , , , , , , , , , on August 20, 2018 by xi'an While watching the early morning activity in Tofino inlet from my rental desk, I was looking at a recent fivethirthyeight Riddle, which consisted in finding the probability of stopping a coin game which rule was to wait for the n consecutive heads if (n-1) consecutive heads had failed to happen when requested, which is

p+(1-p)p²+(1-p)(1-p²)p³+…

or $q=\sum_{k=1}^\infty p^k \prod_{j=1}^{k-1}(1-p^j)$

While the above can write as $q=\sum_{k=1}^\infty \{1-(1-p^k)\} \prod_{j=1}^{k-1}(1-p^j)$

or $\sum_{k=1}^\infty \prod_{j=1}^{k-1}(1-p^j)-\prod_{j=1}^{k}(1-p^j)$

hence suggesting $q=\sum_{k=1}^\infty \prod_{j=1}^{k-1}(1-p^j) - \sum_{k=2}^\infty \prod_{j=1}^{k-1}(1-p^j) =1$

the answer is (obviously) false and the mistake in separating the series into a difference of series is that both terms are infinite. The correct answer is actually $q=1-\prod_{j=1}^{\infty}(1-p^j)$

which is Euler’s function. Maybe nonstandard analysis can apply to go directly from the difference of the infinite series to the answer!

## flea circus

Posted in Books, Kids, pictures, R, Statistics with tags , , , , , , , , , , , on December 8, 2016 by xi'an An old riddle found on X validated asking for Monte Carlo resolution  but originally given on Project Euler:

A 30×30 grid of squares contains 30² fleas, initially one flea per square. When a bell is rung, each flea jumps to an adjacent square at random. What is the expected number of unoccupied squares after 50 bell rings, up to six decimal places?

The debate on X validated is whether or not a Monte Carlo resolution is feasible. Up to six decimals, certainly not. But with some lower precision, certainly. Here is a rather basic R code where the 50 steps are operated on the 900 squares, rather than the 900 fleas. This saves some time by avoiding empty squares.

xprmt=function(n=10,T=50){

mean=0
for (t in 1:n){

board=rep(1,900)
for (v in 1:T){

beard=rep(0,900)
if (board>0){
poz=c(0,1,0,30)
ne=rmultinom(1,board,prob=(poz!=0))
beard[1+poz]=beard[1+poz]+ne}
#
for (i in (2:899)[board[-1][-899]>0]){
u=(i-1)%%30+1;v=(i-1)%/%30+1
poz=c(-(u>1),(u<30),-30*(v>1),30*(v<30))
ne=rmultinom(1,board[i],prob=(poz!=0))
beard[i+poz]=beard[i+poz]+ne}
#
if (board>0){
poz=c(-1,0,-30,0)
ne=rmultinom(1,board,prob=(poz!=0))
beard[900+poz]=beard[900+poz]+ne}
board=beard}
mean=mean+sum(board==0)}
return(mean/n)}


The function returns an empirical average over n replications. With a presumably awkward approach to the borderline squares, since it involves adding zeros to keep the structure the same… Nonetheless, it produces an approximation that is rather close to the approximate expected value, in about 3mn on my laptop.

> exprmt(n=1e3)
 331.082
> 900/exp(1)
 331.0915


Further gains follow from considering only half of the squares, as there are two independent processes acting in parallel. I looked at an alternative and much faster approach using the stationary distribution, with the stationary being the Multinomial (450,(2/1740,3/1740…,4/1740,…,2/1740)) with probabilities proportional to 2 in the corner, 3 on the sides, and 4 in the inside. (The process, strictly speaking, has no stationary distribution, since it is periodic. But one can consider instead the subprocess indexed by even times.) This seems to be the case, though, when looking at the occupancy frequencies, after defining the stationary as:

inva=function(B=30){
return(c(2,rep(3,B-2),2,rep(c(3,
rep(4,B-2),3),B-2),2,rep(3,B-2),2))}


namely

> mn=0;n=1e8 #14 clock hours!
> proz=rep(c(rep(c(0,1),15),rep(c(1,0),15)),15)*inva(30)
> for (t in 1:n)
+ mn=mn+table(rmultinom(1,450,prob=rep(1,450)))[1:4]
> mn=mn/n
> mn=mn-450
> mn
0      1      2     3
166.11 164.92  82.56 27.71
> exprmt(n=1e6) #55 clock hours!!
 165.36 165.69 82.92 27.57

my original confusion being that the Poisson approximation had not yet taken over… (Of course, computing the first frequency for the stationary distribution does not require any simulation, since it is the sum of the complement probabilities to the power 450, i.e., 166.1069.)