## survivalists [a Riddler’s riddle]

Posted in Books, Kids, R, Statistics with tags , , , , , , on April 22, 2019 by xi'an

A neat question from The Riddler on a multi-probability survival rate:

Nine processes are running in a loop with fixed survivals rates .99,….,.91. What is the probability that the first process is the last one to die? Same question with probabilities .91,…,.99 and the probability that the last process is the last one to die.

The first question means that the realisation of a Geometric G(.99) has to be strictly larger than the largest of eight Geometric G(.98),…,G(.91). Given that the cdf of a Geometric G(a) is [when counting the number of attempts till failure, included, i.e. the Geometric with support the positive integers]

$F(x)=\Bbb P(X\le x)=1-a^{x}$

the probability that this happens has the nice (?!) representation

$\sum_{x=2}^\infty a_1^{x-1}(1-a_1)\prod_{j\ge 2}(1-a_j^{x-1})=(1-a_1)G(a_1,\ldots,a_9)$

which leads to an easy resolution by recursion since

$G(a_1,\ldots,a_9)=G(a_1,\ldots,a_8)-G(a_1a_9,\ldots,a_8)$

and $G(a)=a/(1-a)$

and a value of 0.5207 returned by R (Monte Carlo evaluation of 0.5207 based on 10⁷ replications). The second question is quite similar, with solution

$\sum_{x=2}^\infty a_1^{x-1}(1-a_1)\prod_{j\ge 1}(1-a_j^{x})=a^{-1}(1-a_1)G(a_1,\ldots,a_9)$

and value 0.52596 (Monte Carlo evaluation of 0.52581 based on 10⁷ replications).

## numbers of numbers with numbers of their numbers

Posted in Statistics with tags , , , on April 6, 2019 by xi'an

A funny riddle from The Riddler where the number of numbers that indicate the numbers of their numbers is requested. By which one means numbers like 22 (two 2’s) and 21322314 (two 1’s, three 2’s, two 3’s and one 4), the convention being that “numbers consist of alternating tallies and numerals”. And that numerals are “tallied in increasing order”. A reasoning based on the number of 1’s, from zero (where 22 seems to be the only possibility) to six (where 613223141516171819 is the only case) leads to a total of 59 cases (unless zero counts as an extra case) and a brute force R exploration returns the same figure:

for (t in 1:1e5){
az=sample((0:6),9,rep=TRUE)
count=rep(TRUE,9)
for (i in 1:9) count[i]=(sum(az==i)+(az[i]>0))==az[i]
nit=0
while ((min(count)==0)&(nit<1e2)){
j=sample((1:9)[!count],1)
az[j]=(sum(az==j)+(az[j]>0))
nit=nit+1}
if (min(count)==1) solz=unique.matrix(rbind(solz,az),mar=1)}


The solution published on The Riddler differs because it also includes numbers with zeros. Which I find annoying to the extreme because if zeroes are allowed then every digit not in the original solution should appear as multiplied by 0, which is self-contradictory… For instance 22 should be 012203…09, except then there is one 1, one 3, and so on.

## no country for old liars

Posted in Kids, R with tags , , , , , on March 30, 2019 by xi'an

A puzzle from the Riddler about a group of five persons, A,..,E, where all and only people strictly older than L are liars, all making statements about others’ ages:

1. A: B>20 and D>16
2. B: C>18 and E<20
3. C: D<22 and A=19
4. D: E≠20 and B=20
5. E: A>21 and C<18

The Riddler is asking for the (integer value of L and the ranges or values of A,…,E. After thinking about this puzzle over a swimming session, I coded the (honest) constraints and their (liar) complements as many binary matrices, limiting the number of values of L to 8 from 0 (15) to 7 (22) and A,…,E to 7 from 1 (16) to 7 (22):

CA=CB=CC=CD=CE=A=B=C=D=E=matrix(1,5,7)
#constraints
A[2,1:(20-15)]=A[4,1]=0 #A honest
CA[2,(21-15):7]=CA[4,2:7]=0 #A lying
B[3,1:(18-15)]=B[5,(20-15):7]=0
CB[3,(19-15):7]=CB[5,1:(19-15)]=0
C[1,-(19-15)]=C[4,7]=0 #C honest
CC[1,(19-15)]=CC[4,-7]=0 #C lying
D[5,(17-15)]=D[2,-(20-15)]=0
CD[5,-(17-15)]=CD[2,(20-15)]=0
E[1,1:(21-15)]=E[3,(18-15):7]=0
CE[1,7]=CE[3,1:(17-15)]=0


since the term-wise product of these five matrices expresses all the constraints on the years, as e.g.

ABCDE=A*CB*CC*D*CE


if A,D≤L and B,C,E>L, and I then looked by uniform draws [with a slight Gibbs flavour] for values of the integers that suited the constraints or their complement, the stopping rule being that the collection of A,…,E,L is producing an ABCDE binary matrix that agrees with all statements modulo the lying statuum of their authors:

yar=1:5
for (i in 1:5) yar[i]=sample(1:7,1)
L=sample(0:7,1)
ABCDE=((yar[1]>L)*CA+(yar[1]<=L)*A)*
((yar[2]>L)*CB+(yar[2]<=L)*B)*
((yar[3]>L)*CC+(yar[3]<=L)*C)*
((yar[4]>L)*CD+(yar[4]<=L)*D)*
((yar[5]>L)*CE+(yar[5]<=L)*E)
while (min(diag(ABCDE[,yar]))==0){
L=sample(0:7,1);idx=sample(1:5,1)
if (max(ABCDE[idx,])==1) yar[idx]=sample(which(ABCDE[idx,]>0),1)
ABCDE=((yar[1]>L)*CA+(yar[1]<=L)*A)*
((yar[2]>L)*CB+(yar[2]<=L)*B)*
((yar[3]>L)*CC+(yar[3]<=L)*C)*
((yar[4]>L)*CD+(yar[4]<=L)*D)*
((yar[5]>L)*CE+(yar[5]<=L)*E)
}


which always produces L=18,A=19,B=20,C=18,D=16 and E>19 as the unique solution (also reported by The Riddler).

> ABCDE
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    0    0    1    0    0    0
[2,]    0    0    0    0    1    0    0
[3,]    0    0    1    0    0    0    0
[4,]    1    0    0    0    0    0    0
[5,]    0    0    0    0    1    1    1


## maximal spacing around order statistics [#2]

Posted in Books, R, Statistics, University life with tags , , , , , , , , on June 8, 2018 by xi'an

The proposed solution of the riddle from the Riddler discussed here a few weeks ago is rather approximative, in that the distribution of

$\Delta_n=\max_i\,\min_j\,|X_{i}-X_{j}|$

when the n-sample is made of iid Normal variates is (a) replaced with the distribution of one arbitrary minimum and (b) the distribution of the minimum is based on an assumption of independence between the absolute differences. Which does not hold, as shown by the above correlation matrix (plotted via corrplot) for N=11 and 10⁴ simulations. One could think that this correlation decreases with N, but it remains essentially 0.2 for larger values of N. (On the other hand, the minima are essentially independent.)

## maximal spacing around order statistics

Posted in Books, R, Statistics, University life with tags , , , , , , , on May 17, 2018 by xi'an

The riddle from the Riddler for the coming weeks is extremely simple to express in mathematical terms, as it summarises into characterising the distribution of

$\Delta_n=\max_i\,\min_j\,|X_{i}-X_{j}|$

when the n-sample is made of iid Normal variates. I however had a hard time finding a result connected with this quantity since most available characterisations are for either Uniform or Exponential variates. I eventually found a 2017 arXival by Nagaraya et al.  covering the issue. Since the Normal distribution belongs to the Gumbel domain of attraction, the extreme spacings, that is the spacings between the most extreme orders statistics [rescaled by nφ(Φ⁻¹{1-n⁻¹})] are asymptotically independent and asymptotically distributed as (Theorem 5, p.15, after correcting a typo):

$(\xi_1,\xi_2/2,...)$

where the ξ’s are Exp(1) variates. A crude approximation is thus to consider that the above Δ is distributed as the maximum of two standard and independent exponential distributions, modulo the rescaling by  nφ(Φ⁻¹{1-n⁻¹})… But a more adequate result was pointed out to me by Gérard Biau, namely a 1986 Annals of Probability paper by Paul Deheuvels, my former head at ISUP, Université Pierre and Marie Curie. In this paper, Paul Deheuvels establishes that the largest spacing in a normal sample, M¹, satisfies

$\mathbb{P}(\sqrt{2\log\,n}\,M^1\le x) \to \prod_{i=1}^{\infty} (1-e^{-ix})^2$

from which a conservative upper bound on the value of n required for a given bound x⁰ can be derived. The simulation below compares the limiting cdf (in red) with the empirical cdf of the above Δ based on 10⁴ samples of size n=10³.The limiting cdf is the cdf of the maximum of an infinite sequence of independent exponentials with scales 1,½,…. Which connects with the above result, in fine. For a practical application, the 99% quantile of this distribution is 4.71. To achieve a maximum spacing of, say 0.1, with probability 0.99, one would need 2 log(n) > 5.29²/0.1², i.e., log(n)>1402, which is a pretty large number…