Archive for R

gap frequencies [& e]

Posted in Kids, R with tags , , , on April 29, 2016 by xi'an

A riddle from The Riddler where brute-force simulation does not pay:

For a given integer N, pick at random without replacement integers between 1 and N by prohibiting consecutive integers until all possible entries are exhausted. What is the frequency of selected integers as N grows to infinity?

A simple implementation of the random experiment is as follows

  while (max(frei)==1){

It is however quite slow and does not exploit the recursive feature of the sampling, namely that the first draw breaks the set {1,…,N} into two sets:

  if (N<2){ return((N>0))}else{

But even this faster solution takes a while to run for large values of N:

  for (t in 1:1e3) space=space+generipe(N)

as for instance

>  microbenchmark(frqns(100),time=10)
Unit: nanoseconds
       expr       min        lq         mean    median        uq       max
 frqns(100) 178720117 185020903 212212355.77 188710872 205865816 471395620
       time         4         8        26.13        32        37       102

Hence this limits the realisation of simulation to, say, N=10⁴. Thinking further about the recursive aspect of the process however leads to a recursion on the frequencies qN, as it is straightforward to prove that

q_N=\frac{1}{N}+\frac{2}{N^2}\,\sum_{i=1}^{N-2} iq_i

with q1=1 and q2=1/2. This recursion can be further simplified into


which allows for a much faster computation

s=seq(1,1e7) #s[n]=n*q[n]
for (n in 3:1e7) s[n]=(1+2*s[n-2]+(n-1)*s[n-1])/n

and a limiting value of 0.4323324… Since storing s does not matter, a sliding update works even better:

for (n in 3:1e8){ c=(1+2*a+(n-1)*b)/n;a=b;b=c}

still producing a final value of 0.4323324, which may mean we have reached some limit in the precision.

As I could not figure out a way to find the limit of the sequence (1) above, I put it on the maths forum of Stack Exchange and very quickly got the answer (obtained by a power series representation) that the limit is (rather amazingly!)

\dfrac{1 - e^{-2}}{2}

which is 0.432332358.., hence very close to the numerical value obtained for n=3×10⁸. (Which does not change from n=10⁸, once again for precision reasons.) Now I wonder whether or not an easier derivation of the above is feasible, but we should learn about it in a few hours on The Riddler. [Update: The solution published by The Riddler is exactly that one, using a power series expansion to figure out the limit of the series, unfortunately. I was hoping for a de Montmort trick or sort of…]

Le Monde puzzle [#960]

Posted in Kids, R with tags , , , , , on April 28, 2016 by xi'an

An arithmetic Le Monde mathematical puzzle:

Given an integer k>1, consider the sequence defined by F(1)=1+1 mod k, F²(1)=F(1)+2 mod k, F³(1)=F²(1)+3 mod k, &tc. [With this notation, F is not necessarily a function.] For which value of k is the sequence the entire {0,1,…,k-1} set?

This leads to an easy brute force resolution, for instance writing the R function

crkl<-function(k) return(unique(cumsum(1:(2*k))%%k))

where 2k is a sufficient substitute for ∞. Then the cases where the successive images of 1 visit the entire set {0,1,…,k-1} are given by

> for (i in 2:550) if (length(crkl(i))==i) print(i)
[1] 2
[1] 4
[1] 8
[1] 16
[1] 32
[1] 64
[1] 128
[1] 256
[1] 512

which suspiciously looks like the result that only the powers of 2 k=2,2²,2³,… lead to a complete exploration of the set {0,1,…,k-1}. Checking a few series in the plane back from Warwick, I quickly found that when k is odd, (1) the sequence is of period k and (2) there is symmetry in the sequence, which means it only takes (k-1)/2 values. For k even, there is a more complicated symmetry, with the sequence being of period 2k, symmetric around its two middle values, and taking the values 1,2,..,1+k(2k+1)/4,..,1+k(k+1)/2. Those values cannot cover the set {0,1,…,k-1} if two are equal, which means an i(i+1)/2 congruent to zero modulo k, hence equal to k. This is clearly impossible when k is a power of 2 because i and i+1 cannot both divide a power of 2. I waited for the published solution as of yesterday’s and the complete argument was to show that when N=2p, the corresponding sequence [for N] is made (modulo p) of the sequence for p plus the same sequence translated by p. The one for N is complete only if the one for p is complete, which by recursion eliminates all cases but the powers of 2…

Le Monde puzzle [#959]

Posted in Kids, R with tags , , , on April 20, 2016 by xi'an

Another of those arithmetic Le Monde mathematical puzzle:

Find an integer A such that A is the sum of the squares of its four smallest dividers (including1) and an integer B such that B is the sum of the third poser of its four smallest factors. Are there such integers for higher powers?

This begs for a brute force resolution checking the integers until a solution appears. The only exciting part is providing the four smallest factors but a search on Stack overflow led to an existing R function:

FUN <- function(x) {
    x <- as.integer(x)
    div <- seq_len(abs(x))
    return(div[x %% div == 0L])

(which uses the 0L representation I was unaware of) and hence my R code:

 while ((stop)&(I<1e6)){
  if (length(dive)>3)

But this code only seems to work for n=2 as produces A=130: it does not return any solution for the next value of n… As shown by the picture below, which solely exhibits a solution for n=2,5, A=17864 (in the second case), there is no solution less than 10⁶ for n=3,4,6,..9. So, unless I missed a point in the question, the solutions for n>2 are larger if they at all exist.

lemonde959A resolution got published yesterday night in Le Monde  and (i) there is indeed no solution for n=3 (!), (ii) there are solutions for n=4 (1,419,874) and n=5 (1,015,690), which are larger than the 10⁶ bound I used in the R code, (iii) there is supposedly no solution for n=5!, when the R code found that 17,864=1⁵+2⁵+4⁵+7⁵… It is far from the first time the solution is wrong or incomplete!

Le Monde puzzle [#958]

Posted in Books, Kids, R with tags , , , on April 11, 2016 by xi'an

A knapsack Le Monde mathematical puzzle:

Given n packages weighting each at most 5.8kg for a total weight of 300kg, is it always possible to allocate these packages  to 12 separate boxes weighting at most 30kg each? weighting at most 29kg each?

This can be checked by brute force using the following R code

#generate packages
while (sum(paca)<300){ 


#check if they can be partitioned into 
#12 groups of weight less than 30 
#random allocation 
for (i in 1:12){ 
#wrong allocation 
 while (max(content)>30){
  while (k==i){

repeatedly and could not find an endless while loop. (Empty boxes sometimes lead to negative contents, hence my fix setting negative contents to zero.) But neither did I find an issue when the upper bound on the weight is 29kg… So it is either possible or rarely impossible! The R code immediately gets stuck when setting the bound at 25kg.

After reading the solution of April 13 in Le Monde, it appears that there is a counter example for the 29kg limit, namely 60 packages weighting 4.91kg plus one package weighting 5.4kg, since the first 60 packages fit inside 12 boxes and the last one is left out.

Statistical rethinking [book review]

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , , , , , , , , , , , , , , on April 6, 2016 by xi'an

Statistical Rethinking: A Bayesian Course with Examples in R and Stan is a new book by Richard McElreath that CRC Press sent me for review in CHANCE. While the book was already discussed on Andrew’s blog three months ago, and [rightly so!] enthusiastically recommended by Rasmus Bååth on Amazon, here are the reasons why I am quite impressed by Statistical Rethinking!

“Make no mistake: you will wreck Prague eventually.” (p.10)

While the book has a lot in common with Bayesian Data Analysis, from being in the same CRC series to adopting a pragmatic and weakly informative approach to Bayesian analysis, to supporting the use of STAN, it also nicely develops its own ecosystem and idiosyncrasies, with a noticeable Jaynesian bent. To start with, I like the highly personal style with clear attempts to make the concepts memorable for students by resorting to external concepts. The best example is the call to the myth of the golem in the first chapter, which McElreath uses as an warning for the use of statistical models (which almost are anagrams to golems!). Golems and models [and robots, another concept invented in Prague!] are man-made devices that strive to accomplish the goal set to them without heeding the consequences of their actions. This first chapter of Statistical Rethinking is setting the ground for the rest of the book and gets quite philosophical (albeit in a readable way!) as a result. In particular, there is a most coherent call against hypothesis testing, which by itself justifies the title of the book. Continue reading

Le Monde puzzle [#956]

Posted in Kids, R with tags , , , on April 5, 2016 by xi'an

A Le Monde mathematical puzzle with little need of R programming but ending up being rather fascinating:

Does there exist a function f from N to N such that (i) f is (strictly) increasing, (ii) f(n)≥n, and (iii) f²(n)=f(f(n))=3n?

Indeed, the three constraints imply (a) f²(0)=0, hence that that f(0)=0, (b) f(1)=2 as it can be neither 1 (else f²(1) would be equal to 1, not to 3) nor 3 (else f(3)=f²(1)=3 would contradict both f(3)>f(1) and f²(3)=9), and thus (c) f(2)=f²(1)=3. Those three two initial values are actually sufficient to determine all subsequent values of f(n) as there are exactly three possible values between f²(n)=3n and f²(n+1)=3(n+1)=3n+3. (Kind of shaky argument, I must say, but each time an integer n has no antecedent in the previous f(m)’s, there are exactly two possible and successive values for two indeterminate f(n)’s… The resolution in Le Monde [published this afternoon] starts from the observation that f(3p)=2 3p and thus that f(2 3p)=3p+1, which leaves a single possible image for the values in between 3p and 2 3p.) It however provides no discussion whatsoever on the maths behind this freak function.lemonde956To plot the above function, I used the R code below:

 for (i in 2:N){
 #anc for antecedent, f(anc)=i
 #no antecedent
  if (length(anc)==0){
   u=1; anc=which(vals==i-1)}
  if (length(anc)==0){
   u=2; anc=which(vals==i-2)}

The graph has many interesting features, one of which that explains why I did not plot the axes, namely that it is somewhat self-reproducing, in that the plot for N=10³ has the same pattern as the plot for N=10⁵, with a few long linear parts around the line y=√3x (since f is essentially an integer-valued version of √3x). Each linear part is about 3 times longer than the earlier one, with slopes of b=1 and b=3 alternating.

While the resolution is obvious for f²(n)=4n, since f(n)=√4n=2n, there is no single resolution when f²(n)=5n. (Maybe there is one if instead the third condition is that f⁴(n)=5n… A new puzzle for the reader.)

Le Monde puzzle [#954]

Posted in Kids, R with tags , , , , , , on March 25, 2016 by xi'an

A square Le Monde mathematical puzzle:

Given a triplet (a,b,c) of integers, with a<b<c, it satisfies the S property when a+b, a+c, b+c, a+b+c are perfect squares such that a+c, b+c, and a+b+c are consecutive squares. For a given a, is it always possible to find a pair (b,c) such (a,b,c) satisfies S? Can you find the triplet (a,b,c) that produces the sum a+b+c closest to 1000?

This is a rather interesting challenge and a brute force resolution does not produce interesting results. For instance, using the function is.whole from the package Rmpfr, the R functions

ess <- function(a,b,k){
#assumes a<b<k


 while (b<1000*a){
  if (is.whole(sqrt(a+b))){
   while (k<100*b){
    if (is.whole(sqrt(a+k))&is.whole(b+k))
     if (ess(a,b,k)) break()

do not return any solution when a=1,2,3,4,5

Looking at the property that a+b,a+c,b+c, and a+b+c are perfect squares α²,β²,γ², and δ². This implies that

a=(δ+γ)(δ-γ), b=(δ+β)(δ-β), and c=(δ+α)(δ-α)

with 1<α<β<γ<δ. If we assume β²,γ², and δ² consecutive squares, this means β=γ-1 and δ=γ+1, hence

a=2γ+1, b=4γ, and c=(γ+1+α)(γ+1-α)

which leads to only two terms to examine. Hence writing another R function


and running a check for the smallest values of α and γ leads to the few solutions available:

> for (ga in 3:1e4)
for(al in 1:(ga-2))
if (ess(abc(al,ga))) print(abc(al,ga))
[1] 41 80 41 320
[1] 57 112 672
[1] 97 192 2112
[1] 121 240 3360
[1] 177 352 7392
[1] 209 416 10400
[1] 281 560 19040
[1] 321 640 24960
[1] 409 816 40800
[1] 457 912 51072


Get every new post delivered to your Inbox.

Join 1,021 other followers