## sampling w/o replacement except when replacing

Posted in Books, Kids, R on November 3, 2020 by xi'an

Another Riddle(r), considering a box with M myrtle balls and D dandelion balls. Balls are drawn without replacement as long as they are of the same colour as the initial draw; otherwise the last ball is put back and the process restarts, until all balls are drawn. The funny thing is that, unless M=0 or D=0, the probability that the final ball is myrtle is always ½..! This is easily checked by simulation (here with M=2 and D=8):

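r=function()sample(0:1,1,prob=c(d,m)) # 1 = myrtle, 0 = dandelion
F=0 # number of runs ending with myrtle
for(t in 1:1e6){
m=2;d=8
i=r();m=m-!!i;d=d-!i # initial draw
while(!!m*d){ # both colours still in the box
j=r();i=ifelse(i==j,j,r()) # colour change: put back and restart
m=m-!!i;d=d-!i}
F=F+(m>0)}
F/1e6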

Now the proof that the probability is ½ is quite straightforward for M=1 (or D=1), but I cannot find a quick fix for larger values. I thus reasoned by recursion: the probability of emptying a given colour in a single uninterrupted run is d!m!/(d+m)!, whatever the colour and whatever d>0, m>0, hence half a chance to finish with myrtle in that case. Any shorter run of a given colour reduces the value of either d or m, at which point the recursion assumption applies and the probability is again ½…
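The recursion can also be run numerically as an exact check. The sketch below is an illustration only: the function name `last_myrtle`, the helper `run` and the matrix indexing are hypothetical choices, not part of the original argument. It conditions on the colour and length of the first run, as in the reasoning above.

```r
# Exact probability that the final ball is myrtle, for M >= 1 and D >= 1.
# P[m + 1, d + 1] stores the probability with m myrtle and d dandelion balls.
last_myrtle <- function(M, D) {
  P <- matrix(NA_real_, M + 1, D + 1)
  P[-1, 1] <- 1   # only myrtle balls left: the final ball is myrtle
  P[1, ] <- 0     # no myrtle ball left: the final ball is dandelion
  # probability that the first run is made of exactly k balls of a colour
  # with a balls, b being the number of balls of the other colour
  run <- function(a, b, k) {
    j <- seq_len(k)
    p <- prod((a - j + 1) / (a + b - j + 1))
    if (k < a) p * b / (a + b - k) else p   # k == a empties the colour
  }
  for (m in 1:M) for (d in 1:D) {
    pm <- if (m > 1) sum(sapply(1:(m - 1), function(k)
      run(m, d, k) * P[m - k + 1, d + 1])) else 0
    pd <- if (d > 1) sum(sapply(1:(d - 1), function(k)
      run(d, m, k) * P[m + 1, d - k + 1])) else 0
    # emptying dandelion in one run leaves myrtle balls for the end
    P[m + 1, d + 1] <- pm + pd + run(d, m, d)
  }
  P[M + 1, D + 1]
}
last_myrtle(2, 8)
```

Calling `last_myrtle` with other positive values of M and D gives a quick way to compare the recursion with the simulation.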

## Riddle of the lanes

Posted in Books, Kids, R on July 13, 2020 by xi'an

An express riddle from the Riddler about reopening pools, where a lane can be used provided there is no swimmer in that lane or in any of the adjacent lanes. If swimmers pick their lane at random (while they can), what is the average number of occupied lanes?

If there are n lanes and E(n) is the expected number of swimmers, E(n) satisfies a recurrence relation determined by the location of the first swimmer:

$E(n)=1+\frac{1}{n}[2E(n-2)+\sum_{i=2}^{n-1}\{E(i-2)+E(n-i-1)\}]$

with E(0)=0, E(1)=E(2)=1. The above can be checked with a quick R experiment:

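N=10;T=1e4 # number of lanes and of replications (arbitrary values)
en=0
for(t in 1:T){
la=rep(u<-0,N) # la[i]=1 when lane i is occupied or blocked
while(sum(la)<N){
i=sample(rep((1:N)[!la],2),1) # rep() guards against a single free lane
la[max(1,i-1):min(N,i+1)]=1
u=u+1}
en=en+u}
en/T # Monte Carlo estimate of E(N)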
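
The recurrence can also be evaluated directly, for comparison with the simulated average. This is a minimal sketch under the boundary values stated above; the function name `lanes` and the storage of E(k) in position k+1 of a vector are illustrative assumptions.

```r
# E(n) from the recurrence, with E[k + 1] holding E(k)
lanes <- function(n) {
  E <- numeric(max(n, 2) + 1)   # E(0) = 0
  E[2:3] <- 1                   # E(1) = E(2) = 1
  if (n >= 3) for (k in 3:n) {
    i <- 2:(k - 1)              # interior positions of the first swimmer
    # E(k-2) is E[k - 1], E(i-2) is E[i - 1], E(k-i-1) is E[k - i]
    E[k + 1] <- 1 + (2 * E[k - 1] + sum(E[i - 1] + E[k - i])) / k
  }
  E[n + 1]
}
lanes(10)
```

For instance, with three lanes the first swimmer blocks everything from the middle lane (probability 1/3) and leaves one free lane otherwise, so E(3)=5/3, which is what `lanes(3)` returns.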

## the large half now

Posted in R, Statistics on October 28, 2012 by xi'an

The little half puzzle proposed a “dumb” solution in that players play a minimax strategy. There are 34 starting values less than 100 guaranteeing a sure win to dumb players. If instead the players maximise their choice at each step, the R code looks like this:

solveO=function(n){
if (n<3){ solve=(n==2)}else{
solve=(!(solveO(n-1)))||(!solveO(ceiling(n/2)))}
solve}

and there are now 66 (=100-34, indeed!) starting values for which the starting player can win.

Incidentally, I typed

> solveO(1113)
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

which shows R cannot handle heavy recursion without further programming. Testing for the upper limit, I found that the largest acceptable value is 555 (which takes forever to return a value, predicted at more than one hour by a linear regression on the run times till 300…).
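
One way around the nesting limit is to fill in the answers from the bottom up rather than recursing. The sketch below is not the code used above: `solveI` is a hypothetical name, and it simply stores the outcome for every value up to n, using the same rule as `solveO`.

```r
# Bottom-up version of solveO: win[k] is TRUE when the player starting at k can win
solveI <- function(n) {
  win <- logical(max(n, 2))   # win[1] = FALSE, as solveO(1)
  win[2] <- TRUE              # as solveO(2)
  if (n >= 3) for (k in 3:n)
    win[k] <- !win[k - 1] || !win[ceiling(k / 2)]
  win[n]
}
solveI(1113)
```

Each value is computed once, so the cost is linear in n and no recursion depth is involved.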

## Cross validated question

Posted in R, Statistics, University life on February 20, 2012 by xi'an

Another problem generated by X’validated (on which I spent much too much time!): given an unbiased coin that produced M heads in the first M tosses, what is the expected number of additional tosses needed to get N (N>M) consecutive heads?
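
A quick simulation gives a feel for the question. The sketch below is an illustration with hypothetical names (`extra_tosses`, `sim`, `exact`) and arbitrary values M=2, N=5. It compares the simulated average with the value 2^(N+1)-2^(M+1), which solves the first-step recursion E_k = 1 + (E_{k+1} + E_0)/2 with E_N = 0, where E_k is the expected number of further tosses from a current run of k heads.

```r
# Additional tosses needed to reach N consecutive heads,
# starting from a current run of M heads
extra_tosses <- function(M, N) {
  run <- M
  tosses <- 0
  while (run < N) {
    tosses <- tosses + 1
    # heads extends the run, tails resets it to zero
    run <- if (runif(1) < 0.5) run + 1 else 0
  }
  tosses
}
set.seed(1)
sim <- mean(replicate(1e4, extra_tosses(2, 5)))   # Monte Carlo average
exact <- 2^(5 + 1) - 2^(2 + 1)                    # solution of the recursion
c(sim, exact)
```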

## ultimate R recursion

Posted in Books, R, Statistics, University life on January 31, 2012 by xi'an

One of my students wrote the following code for his R exam, trying to do accept-reject simulation (of a Rayleigh distribution) and constant approximation at the same time:

fAR1=function(n){
u=runif(n)
x=rexp(n)
f=(C*(x)*exp(-2*x^2/3))
g=dexp(n,1)
test=(u<f/(3*g))
y=x[test]
p=length(y)/n #acceptance probability
M=1/p
C=M/3
hist(y,20,freq=FALSE)
return(x)
}

which I find remarkable if alas doomed to fail! I wonder if there exists a (real as opposed to fantasy) computer language where you could introduce constants C and only define them later… (What’s rather sad is that I keep insisting on the fact that accept-reject does not need the constant C to operate. And that I found the same mistake in several of the students’ code. There is a further mistake in the above code when defining g. I also wonder where the 3 came from…)
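
To illustrate that accept-reject operates without the constant C, here is a minimal sketch, not the exam solution. It assumes the target is the unnormalised density x exp(-2x²/3) read off the student's code and keeps the Exp(1) proposal. The names `ftilde`, `Mt`, `AR` and `Chat` are hypothetical. The bound on the ratio of unnormalised target to proposal is obtained by setting the derivative of its logarithm to zero.

```r
ftilde <- function(x) x * exp(-2 * x^2 / 3)   # target, up to the constant C
g <- function(x) dexp(x, 1)                   # proposal density, evaluated at x
# ftilde/g is maximal at the positive root of 4x^2/3 - x - 1 = 0
xs <- 3 * (1 + sqrt(19 / 3)) / 8
Mt <- ftilde(xs) / g(xs)                      # upper bound on ftilde/g
AR <- function(n) {
  x <- rexp(n)
  u <- runif(n)
  x[u < ftilde(x) / (Mt * g(x))]              # keep accepted values only
}
set.seed(1)
n <- 1e5
y <- AR(n)
p <- length(y) / n      # observed acceptance rate
Chat <- 1 / (p * Mt)    # by-product: estimate of the normalising constant
Chat
```

Since the integral of x exp(-2x²/3) over the positive half-line is 3/4, `Chat` can be compared with the exact value C=4/3, obtained here after the simulation rather than before.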