## Riddler collector

Posted in Statistics with tags , , , , , , , on September 22, 2018 by xi'an

Once in a while a fairly standard problem makes it to the Riddler puzzle of the week. Today, it is the coupon collector problem, explained by W. Huber on X validated. (W. Huber happens to be the top contributor to this forum, with over 2000 answers, and the highest reputation closing on 200,000!) With nothing (apparently) unusual: coupons [e.g., collecting cards] come in packs of k=10 with no duplicate, and there are n=100 different coupons. What is the expected number one has to collect before getting all of the n coupons?  W. Huber provides an R code to solve the recurrence on the expectation, obtained by conditioning on the number m of different coupons already collected, e(m,n,k) and hence on the remaining number of collect, with an Hypergeometric distribution for the number of new coupons in the next pack. Returning 25.23 packs on average. As is well-known, the average number of packs to complete one’s collection with the final missing card is expensively large, with more than 5 packs necessary on average. The probability distribution of the required number of packs has actually been computed by Laplace in 1774 (and then again by Euler in 1785).

## riddles on a line [#2]

Posted in Books, Kids, R with tags , , , , , , , on September 11, 2018 by xi'an

A second Riddle(r), with a puzzle related with the integer set Ð={,12,3,…,N}, in that it summarises as

Given a random walk on Ð, starting at the middle N/2, with both end states being absorbing states, and a uniform random move left or right of the current value to the (integer) middle of the corresponding (left or right) integer interval, what is the average time to one absorbing state as a function of N?

Once the Markov transition matrix M associated with this random walk is defined, the probability of reaching an absorbing state in t steps can be derived from the successive powers of M by looking at the difference between the probabilities to be (already) absorbed at both t-1 and t steps. From which the average can be derived.

avexit <- function(N=100){
#transition matrix M for the walk
#1 and N+2 are trapping states
tranz=matrix(0,N+2,N+2)
tranz[1,1]=tranz[N+2,N+2]=1
for (i in 2:(N+1))
tranz[i,i+max(trunc((N+1-i)/2),1)]=tranz[i,i-max(trunc((i-2)/2),1)]=1/2
#probabilities of absorption
prowin=proloz=as.vector(0)
init=rep(0,N+2)
init[trunc((N+1)/2)]=1 #first position
curt=init
while(1-prowin[length(prowin)]-proloz[length(prowin)]>1e-10){
curt=curt%*%tranz
prowin=c(prowin,curt[1])
proloz=c(proloz,curt[N+2])}
#probability of new arrival in trapping state
probz=diff(prowin+proloz)
return(sum((2:length(proloz))*probz))}


leading to an almost linear connection between N and expected trapping time.

## riddles on a line

Posted in Books, Kids, R with tags , , , , , , , , , on August 22, 2018 by xi'an

In the Riddler of August 18, two riddles connected with the integer set Ð={2,3,…,10}:

1. Given a permutation of Ð, what is the probability that the most likely variation (up or down) occurs at each term?
2. Given three players choosing three different integers in Ð sequentially, and rewards in Ð allocated to the closest of the three players (with splits in case of equal distance), what is the reward for each given an optimal strategy?

For the first question, a simple code returns 0.17…

winofail<-function(permz){
if (length(permz)>1){
lenoperm=length(permz[-1])/2
win=(permz[1]<permz[2])*(sum(permz[-1]>permz[1])>lenoperm)+
(permz[1]>permz[2])*(sum(permz[-1]<permz[1])>lenoperm)+
(runif(1)<.5)*(sum(permz[-1]>permz[1])==lenoperm)
win=win&&winofail(permz[-1])
}else win=TRUE
return(win)}


(but the analytic solution exhibits a cool Pascal triangular relation!) and for the second question, a quick recursion or dynamic programming produces 20, 19, 15 as the rewards (and 5, 9, 8 as the locations)

gainz<-function(seqz){
difz=t(abs(outer(2:10,seqz,"-")))
cloz=apply(difz,2,rank)
return(apply(rbind(2:10,2:10,2:10)*
((cloz==1)+.5*(cloz==1.5)),1,sum))}

timeline<-function(prev){
if (length(prev)==0){
sol=timeline(10);bez=gainz(sol)[1]
for (i in 2:9){
bol=timeline(i);comp=gainz(bol)[1]
if (comp>bez){
bez=comp;sol=bol}}}
if (length(prev)==1){
bez=-10
for (i in (2:10)[-(prev-1)]){
bol=timeline(c(prev,i))
comp=gainz(bol)[2]
if (comp>bez){
bez=comp;sol=bol}}}
if (length(prev)==2){
bez=-10
for (i in (2:10)[-(prev-1)]){
bol=c(prev,i)
comp=gainz(bol)[3]
if (comp>bez){
bez=comp;sol=bol}}}
return(sol)}


After reading the solution on the Riddler, I realised I had misunderstood the line as starting at 2 when it was actually starting from 1. Correcting for this leads to the same 5, 9, 8 locations of picks, with rewards of 21, 19, 15.

## a funny mistake

Posted in Statistics with tags , , , , , , , , , , , on August 20, 2018 by xi'an

While watching the early morning activity in Tofino inlet from my rental desk, I was looking at a recent fivethirthyeight Riddle, which consisted in finding the probability of stopping a coin game which rule was to wait for the n consecutive heads if (n-1) consecutive heads had failed to happen when requested, which is

p+(1-p)p²+(1-p)(1-p²)p³+…

or

$q=\sum_{k=1}^\infty p^k \prod_{j=1}^{k-1}(1-p^j)$

While the above can write as

$q=\sum_{k=1}^\infty \{1-(1-p^k)\} \prod_{j=1}^{k-1}(1-p^j)$

or

$\sum_{k=1}^\infty \prod_{j=1}^{k-1}(1-p^j)-\prod_{j=1}^{k}(1-p^j)$

hence suggesting

$q=\sum_{k=1}^\infty \prod_{j=1}^{k-1}(1-p^j) - \sum_{k=2}^\infty \prod_{j=1}^{k-1}(1-p^j) =1$

the answer is (obviously) false and the mistake in separating the series into a difference of series is that both terms are infinite. The correct answer is actually

$q=1-\prod_{j=1}^{\infty}(1-p^j)$

which is Euler’s function. Maybe nonstandard analysis can apply to go directly from the difference of the infinite series to the answer!

## a thread to bin them all [puzzle]

Posted in Books, Kids, R, Travel with tags , , , , , , , , on July 9, 2018 by xi'an

The most recent riddle on the Riddler consists in finding the shorter sequence of digits (in 0,1,..,9) such that all 10⁴ numbers between 0 (or 0000) and 9,999 can be found as a group of consecutive four digits. This sequence is obviously longer than 10⁴+3, but how long? On my trip to Brittany last weekend, I wrote an R code first constructing the sequence at random by picking with high preference the next digit among those producing a new four-digit number

tenz=10^(0:3)
wn2dg=function(dz) 1+sum(dz*tenz)

seqz=rep(0,10^4)
snak=wndz=sample(0:9,4,rep=TRUE)
seqz[wn2dg(wndz)]=1
while (min(seqz)==0){
wndz[1:3]=wndz[-1];wndz[4]=0
wndz[4]=sample(0:9,1,prob=.01+.99*(seqz[wn2dg(wndz)+0:9]==0))
snak=c(snak,wndz[4])
sek=wn2dg(wndz)
seqz[sek]=seqz[sek]+1}


which usually returns a value above 75,000. I then looked through the sequence to eliminate useless replicas

for (i in sample(4:(length(snak)-5))){
if ((seqz[wn2dg(snak[(i-3):i])]>1)
&(seqz[wn2dg(snak[(i-2):(i+1)])]>1)
&(seqz[wn2dg(snak[(i-1):(i+2)])]>1)
&(seqz[wn2dg(snak[i:(i+3)])]>1)){
seqz[wn2dg(snak[(i-3):i])]=seqz[wn2dg(snak[(i-3):i])]-1
seqz[wn2dg(snak[(i-2):(i+1)])]=seqz[wn2dg(snak[(i-2):(i+1)])]-1
seqz[wn2dg(snak[(i-1):(i+2)])]=seqz[wn2dg(snak[(i-1):(i+2)])]-1
seqz[wn2dg(snak[i:(i+3)])]=seqz[wn2dg(snak[i:(i+3)])]-1
snak=snak[-i]
seqz[wn2dg(snak[(i-3):i])]=seqz[wn2dg(snak[(i-3):i])]+1
seqz[wn2dg(snak[(i-2):(i+1)])]=seqz[wn2dg(snak[(i-2):(i+1)])]+1
seqz[wn2dg(snak[(i-1):(i+2)])]=seqz[wn2dg(snak[(i-1):(i+2)])]+1}}


until none is found. A first attempt produced 12,911 terms in the sequence. A second one 12,913. A third one 12,871. Rather consistent figures but not concentrated enough to believe in achieving a true minimum. An overnight run produced 12,779 as the lowest value. Checking the answer the week after, it appears that 10⁴+3 is the correct answer!

## seven digit addition

Posted in Kids, R with tags , , , on July 6, 2018 by xi'an

Another quick riddle from the riddler: solve the equation

EXMREEK + EHKREKK = ?K?H?X?E

which involves every digit between 0 and 9. While the puzzle can be unravelled by considering first E and K, which must be equal to 6 and 3, a simple R code also leads to the conclusion

isok <- function(a,b){
s=as.numeric(unlist(strsplit(as.character(sum(10^(6:0)*a)+
sum(10^(6:0)*b)),"")))
if (length(s)==7){ goal=FALSE}else{
goal=(length(unique(c(a,b,s)))==10)&(a[2]==s[6])&
(s[8]==a[6])&(s[2]==a[7])&(b[2]==s[4])}
return(goal)}

pasok <- function(T=1e3){
for (t in 1:T){
a[1]=a[5]=a[6]=6;a[7]=3
digs=sample(c(0:2,4,5,7:9),4)
a[2:4]=digs[1:3] b[1]=a[1];b[2]=digs[4];
b[3]=a[7];b[4]=a[4];b[5]=a[1];b[6:7]=a[7]
if (isok(a=a,b=b))
print(rbind(a,b))}}

> pasok()
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
a    6    2    4    7    6    6    3
b    6    8    3    7    6    3    3


which sum is 13085296.

## a chain of collapses

Posted in Books, Kids, R with tags , , , on June 20, 2018 by xi'an

A quick riddler resolution during a committee meeting (!) of a short riddle: 36 houses stand in a row and collapse at times t=1,2,..,36. In addition, once a house collapses, the neighbours if still standing collapse at the next time unit. What are the shortest and longest lifespans of this row?

Since a house with index i would collapse on its own by time i, the longest lifespan is 36, which can be achieved with the extra rule when the collapsing times are perfectly ordered. For the shortest lifespan, I ran a short R code implementing the rules and monitoring its minimum. Which found 7 as the minimal number for 10⁵ draws. However, with an optimal ordering, one house plus one or two neighbours of the most recently collapsed, leading to a maximal number of collapsed houses after k time units being

1+2(k-1)+1+2(k-2)+….=k+k(k-1)=k²

which happens to be equal to 36 for k=6. (Which was also obtained in 10⁶ draws!) This also gives the solution for any value of k.