## a thread to bin them all [puzzle]

Posted in Books, Kids, R, Travel with tags , , , , , , , , on July 9, 2018 by xi'an

The most recent riddle on the Riddler consists in finding the shorter sequence of digits (in 0,1,..,9) such that all 10⁴ numbers between 0 (or 0000) and 9,999 can be found as a group of consecutive four digits. This sequence is obviously longer than 10⁴+3, but how long? On my trip to Brittany last weekend, I wrote an R code first constructing the sequence at random by picking with high preference the next digit among those producing a new four-digit number

tenz=10^(0:3)
wn2dg=function(dz) 1+sum(dz*tenz)

seqz=rep(0,10^4)
snak=wndz=sample(0:9,4,rep=TRUE)
seqz[wn2dg(wndz)]=1
while (min(seqz)==0){
wndz[1:3]=wndz[-1];wndz[4]=0
wndz[4]=sample(0:9,1,prob=.01+.99*(seqz[wn2dg(wndz)+0:9]==0))
snak=c(snak,wndz[4])
sek=wn2dg(wndz)
seqz[sek]=seqz[sek]+1}


which usually returns a value above 75,000. I then looked through the sequence to eliminate useless replicas

for (i in sample(4:(length(snak)-5))){
if ((seqz[wn2dg(snak[(i-3):i])]>1)
&(seqz[wn2dg(snak[(i-2):(i+1)])]>1)
&(seqz[wn2dg(snak[(i-1):(i+2)])]>1)
&(seqz[wn2dg(snak[i:(i+3)])]>1)){
seqz[wn2dg(snak[(i-3):i])]=seqz[wn2dg(snak[(i-3):i])]-1
seqz[wn2dg(snak[(i-2):(i+1)])]=seqz[wn2dg(snak[(i-2):(i+1)])]-1
seqz[wn2dg(snak[(i-1):(i+2)])]=seqz[wn2dg(snak[(i-1):(i+2)])]-1
seqz[wn2dg(snak[i:(i+3)])]=seqz[wn2dg(snak[i:(i+3)])]-1
snak=snak[-i]
seqz[wn2dg(snak[(i-3):i])]=seqz[wn2dg(snak[(i-3):i])]+1
seqz[wn2dg(snak[(i-2):(i+1)])]=seqz[wn2dg(snak[(i-2):(i+1)])]+1
seqz[wn2dg(snak[(i-1):(i+2)])]=seqz[wn2dg(snak[(i-1):(i+2)])]+1}}


until none is found. A first attempt produced 12,911 terms in the sequence. A second one 12,913. A third one 12,871. Rather consistent figures but not concentrated enough to believe in achieving a true minimum. An overnight run produced 12,779 as the lowest value. Checking the answer the week after, it appears that 10⁴+3 is the correct answer!

## seven digit addition

Posted in Kids, R with tags , , , on July 6, 2018 by xi'an

Another quick riddle from the riddler: solve the equation

EXMREEK + EHKREKK = ?K?H?X?E

which involves every digit between 0 and 9. While the puzzle can be unravelled by considering first E and K, which must be equal to 6 and 3, a simple R code also leads to the conclusion

isok <- function(a,b){
s=as.numeric(unlist(strsplit(as.character(sum(10^(6:0)*a)+
sum(10^(6:0)*b)),"")))
if (length(s)==7){ goal=FALSE}else{
goal=(length(unique(c(a,b,s)))==10)&(a[2]==s[6])&
(s[8]==a[6])&(s[2]==a[7])&(b[2]==s[4])}
return(goal)}

pasok <- function(T=1e3){
for (t in 1:T){
a[1]=a[5]=a[6]=6;a[7]=3
digs=sample(c(0:2,4,5,7:9),4)
a[2:4]=digs[1:3] b[1]=a[1];b[2]=digs[4];
b[3]=a[7];b[4]=a[4];b[5]=a[1];b[6:7]=a[7]
if (isok(a=a,b=b))
print(rbind(a,b))}}

> pasok()
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
a    6    2    4    7    6    6    3
b    6    8    3    7    6    3    3


which sum is 13085296.

## a chain of collapses

Posted in Books, Kids, R with tags , , , on June 20, 2018 by xi'an

A quick riddler resolution during a committee meeting (!) of a short riddle: 36 houses stand in a row and collapse at times t=1,2,..,36. In addition, once a house collapses, the neighbours if still standing collapse at the next time unit. What are the shortest and longest lifespans of this row?

Since a house with index i would collapse on its own by time i, the longest lifespan is 36, which can be achieved with the extra rule when the collapsing times are perfectly ordered. For the shortest lifespan, I ran a short R code implementing the rules and monitoring its minimum. Which found 7 as the minimal number for 10⁵ draws. However, with an optimal ordering, one house plus one or two neighbours of the most recently collapsed, leading to a maximal number of collapsed houses after k time units being

1+2(k-1)+1+2(k-2)+….=k+k(k-1)=k²

which happens to be equal to 36 for k=6. (Which was also obtained in 10⁶ draws!) This also gives the solution for any value of k.

## maximal spacing around order statistics [#2]

Posted in Books, R, Statistics, University life with tags , , , , , , , , on June 8, 2018 by xi'an

The proposed solution of the riddle from the Riddler discussed here a few weeks ago is rather approximative, in that the distribution of

$\Delta_n=\max_i\,\min_j\,|X_{i}-X_{j}|$

when the n-sample is made of iid Normal variates is (a) replaced with the distribution of one arbitrary minimum and (b) the distribution of the minimum is based on an assumption of independence between the absolute differences. Which does not hold, as shown by the above correlation matrix (plotted via corrplot) for N=11 and 10⁴ simulations. One could think that this correlation decreases with N, but it remains essentially 0.2 for larger values of N. (On the other hand, the minima are essentially independent.)

## maximal spacing around order statistics

Posted in Books, R, Statistics, University life with tags , , , , , , , on May 17, 2018 by xi'an

The riddle from the Riddler for the coming weeks is extremely simple to express in mathematical terms, as it summarises into characterising the distribution of

$\Delta_n=\max_i\,\min_j\,|X_{i}-X_{j}|$

when the n-sample is made of iid Normal variates. I however had a hard time finding a result connected with this quantity since most available characterisations are for either Uniform or Exponential variates. I eventually found a 2017 arXival by Nagaraya et al.  covering the issue. Since the Normal distribution belongs to the Gumbel domain of attraction, the extreme spacings, that is the spacings between the most extreme orders statistics [rescaled by nφ(Φ⁻¹{1-n⁻¹})] are asymptotically independent and asymptotically distributed as (Theorem 5, p.15, after correcting a typo):

$(\xi_1,\xi_2/2,...)$

where the ξ’s are Exp(1) variates. A crude approximation is thus to consider that the above Δ is distributed as the maximum of two standard and independent exponential distributions, modulo the rescaling by  nφ(Φ⁻¹{1-n⁻¹})… But a more adequate result was pointed out to me by Gérard Biau, namely a 1986 Annals of Probability paper by Paul Deheuvels, my former head at ISUP, Université Pierre and Marie Curie. In this paper, Paul Deheuvels establishes that the largest spacing in a normal sample, M¹, satisfies

$\mathbb{P}(\sqrt{2\log\,n}\,M^1\le x) \to \prod_{i=1}^{\infty} (1-e^{-ix})^2$

from which a conservative upper bound on the value of n required for a given bound x⁰ can be derived. The simulation below compares the limiting cdf (in red) with the empirical cdf of the above Δ based on 10⁴ samples of size n=10³.The limiting cdf is the cdf of the maximum of an infinite sequence of independent exponentials with scales 1,½,…. Which connects with the above result, in fine. For a practical application, the 99% quantile of this distribution is 4.71. To achieve a maximum spacing of, say 0.1, with probability 0.99, one would need 2 log(n) > 5.29²/0.1², i.e., log(n)>1402, which is a pretty large number…

## the riddle of the stands

Posted in Books, Kids, R with tags , , , , , on May 11, 2018 by xi'an

The simple riddle of last week on The Riddler, about the minimum number of urinals needed for n men to pee if the occupation rule is to stay as far as possible from anyone there and never to stand next to another man,  is quickly solved by an R code:

ocupee=function(M){
ok=rep(0,M)
ok[1]=ok[M]=1
ok[trunc((1+M/2))]=1
while (max(diff((1:M)[ok!=0])>2)){
i=order(-diff((1:M)[ok!=0]))[1]
ok[(1:M)[ok!=0][i]+trunc((diff((1:M)[ok!=0])[i]/2))]=1
}
return(sum(ok>0))
}


with maximal occupation illustrated by the graph below:

Meaning that the efficiency of the positioning scheme is not optimal when following the sequential positioning, requiring $N+2^{\lceil log_2(N-1) \rceil}$ urinals. Rather than one out of two, requiring 2N-1 urinals. What is most funny in this simple exercise is the connection exposed in the Riddler with an Xkcd blag written a few years go about the topic.

## linear Diophantine equations

Posted in Statistics with tags , , , , , , on May 10, 2018 by xi'an

When re-expressed in maths terms, the current Riddler is about finding a sequence x⁰,x¹,…,x⁷ of integers such that

x⁰=7x¹+1
6x¹=7x²+1

6x⁶=7x⁷+1
6x⁷=7x⁸

which turns into a linear equation with integer valued solutions, or a system of linear Diophantine equation. Which can be easily solved by brute-force R coding:

A=matrix(0,7,7)
for (i in 1:7) A[i,i]=6
for (i in 1:6) A[i,i+1]=-7
for (x in 1:1e6){
zol=solve(a=A,b=c(rep(1,6),7*x))
if (max(abs(zol-round(zol)))<1e-3) print(x)}
x=39990 #x8=5.6.31.43
7*solve(a=A,b=c(rep(1,6),7*x))[1]+1 #x0


which produces x⁰=823537. But it would be nicer to directly solve the linear system under the constraint. For instance, the inverse of the matrix A above is an upper triangular matrix with (upper-)diagonals

1/6, 7/6², 7²/6³,…,7⁶/6⁷

but this does not help considerably, except for x⁸ to be solutions to 7 equations involving powers of 6 and 7… This system of equations can be solved by successive substitutions but this still feels very pedestrian!