## gap frequencies [& e]

Posted in Kids, R with tags , , , on April 29, 2016 by xi'an

A riddle from The Riddler where brute-force simulation does not pay:

For a given integer N, pick at random without replacement integers between 1 and N by prohibiting consecutive integers until all possible entries are exhausted. What is the frequency of selected integers as N grows to infinity?

A simple implementation of the random experiment is as follows

generope=function(N){
frei=rep(1,N)
hus=0
while (max(frei)==1){
i=sample(rep((1:N)[frei==1],2),1)
frei[i]=frei[i+1]=frei[i-1]=0
hus=hus+1}
return(hus)}


It is however quite slow and does not exploit the recursive feature of the sampling, namely that the first draw breaks the set {1,…,N} into two sets:

generipe=function(N){
if (N<2){ return((N>0))}else{
i=sample(1:N,1)
return(1+generipe(i-2)+generipe(N-i-1))}}


But even this faster solution takes a while to run for large values of N:

frqns=function(N){
space=0
for (t in 1:1e3) space=space+generipe(N)
return(space/(N*1e3))}


as for instance

>  microbenchmark(frqns(100),time=10)
Unit: nanoseconds
expr       min        lq         mean    median        uq       max
frqns(100) 178720117 185020903 212212355.77 188710872 205865816 471395620
time         4         8        26.13        32        37       102


Hence this limits the realisation of simulation to, say, N=10⁴. Thinking further about the recursive aspect of the process however leads to a recursion on the frequencies qN, as it is straightforward to prove that

$q_N=\frac{1}{N}+\frac{2}{N^2}\,\sum_{i=1}^{N-2} iq_i$

with q1=1 and q2=1/2. This recursion can be further simplified into

$q_N=\frac{1}{N^2}+\frac{2(N-2)}{N^2}\,q_{N-2}+\frac{(N-1)^2}{N^2}q_{N-1}\qquad(1)$

which allows for a much faster computation

s=seq(1,1e7) #s[n]=n*q[n]
for (n in 3:1e7) s[n]=(1+2*s[n-2]+(n-1)*s[n-1])/n


and a limiting value of 0.4323324… Since storing s does not matter, a sliding update works even better:

a=b=1
for (n in 3:1e8){ c=(1+2*a+(n-1)*b)/n;a=b;b=c}


still producing a final value of 0.4323324, which may mean we have reached some limit in the precision.

As I could not figure out a way to find the limit of the sequence (1) above, I put it on the maths forum of Stack Exchange and very quickly got the answer (obtained by a power series representation) that the limit is (rather amazingly!)

$\dfrac{1 - e^{-2}}{2}$

which is 0.432332358.., hence very close to the numerical value obtained for n=3×10⁸. (Which does not change from n=10⁸, once again for precision reasons.) Now I wonder whether or not an easier derivation of the above is feasible, but we should learn about it in a few hours on The Riddler. [Update: The solution published by The Riddler is exactly that one, using a power series expansion to figure out the limit of the series, unfortunately. I was hoping for a de Montmort trick or sort of…]

## Le Monde puzzle [#960]

Posted in Kids, R with tags , , , , , on April 28, 2016 by xi'an

An arithmetic Le Monde mathematical puzzle:

Given an integer k>1, consider the sequence defined by F(1)=1+1 mod k, F²(1)=F(1)+2 mod k, F³(1)=F²(1)+3 mod k, &tc. [With this notation, F is not necessarily a function.] For which value of k is the sequence the entire {0,1,…,k-1} set?

This leads to an easy brute force resolution, for instance writing the R function

crkl<-function(k) return(unique(cumsum(1:(2*k))%%k))


where 2k is a sufficient substitute for ∞. Then the cases where the successive images of 1 visit the entire set {0,1,…,k-1} are given by

> for (i in 2:550) if (length(crkl(i))==i) print(i)
[1] 2
[1] 4
[1] 8
[1] 16
[1] 32
[1] 64
[1] 128
[1] 256
[1] 512


which suspiciously looks like the result that only the powers of 2 k=2,2²,2³,… lead to a complete exploration of the set {0,1,…,k-1}. Checking a few series in the plane back from Warwick, I quickly found that when k is odd, (1) the sequence is of period k and (2) there is symmetry in the sequence, which means it only takes (k-1)/2 values. For k even, there is a more complicated symmetry, with the sequence being of period 2k, symmetric around its two middle values, and taking the values 1,2,..,1+k(2k+1)/4,..,1+k(k+1)/2. Those values cannot cover the set {0,1,…,k-1} if two are equal, which means an i(i+1)/2 congruent to zero modulo k, hence equal to k. This is clearly impossible when k is a power of 2 because i and i+1 cannot both divide a power of 2. I waited for the published solution as of yesterday’s and the complete argument was to show that when N=2p, the corresponding sequence [for N] is made (modulo p) of the sequence for p plus the same sequence translated by p. The one for N is complete only if the one for p is complete, which by recursion eliminates all cases but the powers of 2…

## an integer programming riddle

Posted in Books, Kids, R with tags , , , , on April 21, 2016 by xi'an

A puzzle on The Riddler this week that ends up as a standard integer programming problem. Removing the little story around the question, it boils down to optimise

200a+100b+50c+25d

under the constraints

400a+400b+150c+50d≤1000, b≤a, a≤1, c≤8, d≤4,

and (a,b,c,d) all non-negative integers. My first attempt was a brute force R code since there are only 3 x 9 x 5 = 135 cases:

f.obj<-c(200,100,50,25)
f.con<-matrix(c(40,40,15,5,
-1,1,0,0,
1,0,0,0,
0,0,1,0,
0,0,0,1),ncol=4,byrow=TRUE)
f.dir<-c("=","=","=","=","=","=")
f.rhs<-c(100,0,1,8,4)

sol=0
for (a in 0:1)
for (b in 0:a)
for (k in 0:8)
for (d in 0:4){
cost=f.con%*%c(a,b,k,d)-f.rhs
if (max(cost)<=0){ gain=f.obj%*%c(a,b,k,d)
if (gain>sol){
sol=gain
argu=c(a,b,k,d)}}}


which returns the value:

> sol
[,1]
[1,]  425
> argu
[1] 1 0 3 3


This is confirmed by a call to an integer programming code like lpSolve:

> lp("max",f.obj,f.con,f.dir,f.rhs,all.int=TRUE)
Success: the objective function is 425
> lp("max",f.obj,f.con,f.dir,f.rhs,all.int=TRUE)\$sol
[1] 1 0 3 3


which provides the same solution.

## Le Monde puzzle [#959]

Posted in Kids, R with tags , , , on April 20, 2016 by xi'an

Another of those arithmetic Le Monde mathematical puzzle:

Find an integer A such that A is the sum of the squares of its four smallest dividers (including1) and an integer B such that B is the sum of the third poser of its four smallest factors. Are there such integers for higher powers?

This begs for a brute force resolution checking the integers until a solution appears. The only exciting part is providing the four smallest factors but a search on Stack overflow led to an existing R function:

FUN <- function(x) {
x <- as.integer(x)
div <- seq_len(abs(x))
return(div[x %% div == 0L])
}


(which uses the 0L representation I was unaware of) and hence my R code:

quest1<-function(n=2){
I=4
stop=TRUE
while ((stop)&(I<1e6)){
I=I+1
dive=FUN(I)
if (length(dive)>3)
stop=(I!=sum(sort(dive)[1:4]^n))
}
return(I)
}


But this code only seems to work for n=2 as produces A=130: it does not return any solution for the next value of n… As shown by the picture below, which solely exhibits a solution for n=2,5, A=17864 (in the second case), there is no solution less than 10⁶ for n=3,4,6,..9. So, unless I missed a point in the question, the solutions for n>2 are larger if they at all exist.

A resolution got published yesterday night in Le Monde  and (i) there is indeed no solution for n=3 (!), (ii) there are solutions for n=4 (1,419,874) and n=5 (1,015,690), which are larger than the 10⁶ bound I used in the R code, (iii) there is supposedly no solution for n=5!, when the R code found that 17,864=1⁵+2⁵+4⁵+7⁵… It is far from the first time the solution is wrong or incomplete!

## Le Monde puzzle [#958]

Posted in Books, Kids, R with tags , , , on April 11, 2016 by xi'an

A knapsack Le Monde mathematical puzzle:

Given n packages weighting each at most 5.8kg for a total weight of 300kg, is it always possible to allocate these packages  to 12 separate boxes weighting at most 30kg each? weighting at most 29kg each?

This can be checked by brute force using the following R code

#generate packages
paca=runif(1,0,5.8)
while (sum(paca)<300){
paca=c(paca,runif(1,0,5.8))}
paca=paca[-length(paca)]
paca=c(paca,300-sum(paca))


and

#check if they can be partitioned into
#12 groups of weight less than 30
box=vector(mode="list",length=12)
#random allocation
alloc=sample(1:12,length(paca),rep=TRUE)
content=rep(0,12)
for (i in 1:12){
box[[i]]=paca[alloc==i]
content[i]=sum(box[[i]])}
content=content*(content>0)
#wrong allocation
while (max(content)>30){
i=sample(1:12,1,prob=content)
j=sample(1:length(box[[i]]),1,prob=box[[i]])
#reallocation
k=sample(1:12,1,prob=max(content)-content)
while (k==i){
k=sample(1:12,1,prob=max(content)-content)}
content[i]=content[i]-box[[i]][j]
content[i]=content[i]*(content[i]>0)
content[k]=content[k]+box[[i]][j]
box[[k]]=c(box[[k]],box[[i]][j])
box[[i]]=box[[i]][-j]}


repeatedly and could not find an endless while loop. (Empty boxes sometimes lead to negative contents, hence my fix setting negative contents to zero.) But neither did I find an issue when the upper bound on the weight is 29kg… So it is either possible or rarely impossible! The R code immediately gets stuck when setting the bound at 25kg.

After reading the solution of April 13 in Le Monde, it appears that there is a counter example for the 29kg limit, namely 60 packages weighting 4.91kg plus one package weighting 5.4kg, since the first 60 packages fit inside 12 boxes and the last one is left out.

## Statistical rethinking [book review]

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , , , , , , , , , , , , , , on April 6, 2016 by xi'an

Statistical Rethinking: A Bayesian Course with Examples in R and Stan is a new book by Richard McElreath that CRC Press sent me for review in CHANCE. While the book was already discussed on Andrew’s blog three months ago, and [rightly so!] enthusiastically recommended by Rasmus Bååth on Amazon, here are the reasons why I am quite impressed by Statistical Rethinking!

“Make no mistake: you will wreck Prague eventually.” (p.10)

While the book has a lot in common with Bayesian Data Analysis, from being in the same CRC series to adopting a pragmatic and weakly informative approach to Bayesian analysis, to supporting the use of STAN, it also nicely develops its own ecosystem and idiosyncrasies, with a noticeable Jaynesian bent. To start with, I like the highly personal style with clear attempts to make the concepts memorable for students by resorting to external concepts. The best example is the call to the myth of the golem in the first chapter, which McElreath uses as an warning for the use of statistical models (which almost are anagrams to golems!). Golems and models [and robots, another concept invented in Prague!] are man-made devices that strive to accomplish the goal set to them without heeding the consequences of their actions. This first chapter of Statistical Rethinking is setting the ground for the rest of the book and gets quite philosophical (albeit in a readable way!) as a result. In particular, there is a most coherent call against hypothesis testing, which by itself justifies the title of the book. Continue reading

## Le Monde puzzle [#956]

Posted in Kids, R with tags , , , on April 5, 2016 by xi'an

A Le Monde mathematical puzzle with little need of R programming but ending up being rather fascinating:

Does there exist a function f from N to N such that (i) f is (strictly) increasing, (ii) f(n)≥n, and (iii) f²(n)=f(f(n))=3n?

Indeed, the three constraints imply (a) f²(0)=0, hence that that f(0)=0, (b) f(1)=2 as it can be neither 1 (else f²(1) would be equal to 1, not to 3) nor 3 (else f(3)=f²(1)=3 would contradict both f(3)>f(1) and f²(3)=9), and thus (c) f(2)=f²(1)=3. Those three two initial values are actually sufficient to determine all subsequent values of f(n) as there are exactly three possible values between f²(n)=3n and f²(n+1)=3(n+1)=3n+3. (Kind of shaky argument, I must say, but each time an integer n has no antecedent in the previous f(m)’s, there are exactly two possible and successive values for two indeterminate f(n)’s… The resolution in Le Monde [published this afternoon] starts from the observation that f(3p)=2 3p and thus that f(2 3p)=3p+1, which leaves a single possible image for the values in between 3p and 2 3p.) It however provides no discussion whatsoever on the maths behind this freak function.To plot the above function, I used the R code below:

fillin&lt;-function(N=100){
vals=rep(0,N)
vals[1]=2
for (i in 2:N){
#anc for antecedent, f(anc)=i
u=0;anc=which(vals==i)
#no antecedent
if (length(anc)==0){
u=1; anc=which(vals==i-1)}
if (length(anc)==0){
u=2; anc=which(vals==i-2)}
vals[i]=3*anc*(u==0)+(u&gt;0)*(vals[i-u]+u)}
} 

The graph has many interesting features, one of which that explains why I did not plot the axes, namely that it is somewhat self-reproducing, in that the plot for N=10³ has the same pattern as the plot for N=10⁵, with a few long linear parts around the line y=√3x (since f is essentially an integer-valued version of √3x). Each linear part is about 3 times longer than the earlier one, with slopes of b=1 and b=3 alternating.

While the resolution is obvious for f²(n)=4n, since f(n)=√4n=2n, there is no single resolution when f²(n)=5n. (Maybe there is one if instead the third condition is that f⁴(n)=5n… A new puzzle for the reader.)