Archive for the R Category

Le Monde puzzle [#1049]

Posted in Books, Kids, R with tags , , , on April 18, 2018 by xi'an

An algorithmic Le Monde mathematical puzzle with a direct

Alice and Bob play a game by picking alternatively one of the remaining digits between 1 and 10 and putting it in either one of two available stacks, 1 or 2. Their respective gains are the products of the piles (1 for Alice and 2 for Bob).

The problem is manageable by a recursive function

facten=factorial(10)
pick=function(play=1,remz=matrix(0,2,5)){
 if ((min(remz[1,])>0)||(min(remz[2,])>0)){#finale
  remz[remz==0]=(1:10)[!(1:10)%in%remz]
  return(prod(remz[play,]))
  }else{
   gainz=0
   for (i in (1:10)[!(1:10)%in%remz]){
     propz=rbind(c(remz[1,remz[1,]>0],i,
     rep(0,sum(remz[1,]==0)-1)),remz[2,])
     gainz=max(gainz,facten/pick(3-play,remz=propz))}
   for (i in (1:10)[!(1:10)%in%remz]){
     propz=rbind(remz[1,],c(remz[2,remz[2,]>0],i,
     rep(0,sum(remz[2,]==0)-1)))
     gainz=max(gainz,facten/pick(3-play,remz=propz))}
return(gainz)}}

that shows the optimal gain for Alice is 3360=2x5x6x7x 8, versus Bob getting 1080=1x3x4x9x10. The moves ensuring the gain are 2-10-…

a [Gregorian] calendar riddle

Posted in R with tags , , , , , on April 17, 2018 by xi'an

A simple riddle express this week on The Riddler, about finding the years between 2001 and 2099 with the most cases when day x month = year [all entries with two digits]. For instance, this works for 1 January, 2001 since 01=01 x 01. The only difficulty in writing an R code for this question is to figure out the number of days in a given month of a given year (in order to include leap years).

The solution was however quickly found on Stack Overflow and the resulting code is

#safer beta quantile
numOD <- function(date) {
    m <- format(date, format="%m")
    while (format(date, format="%m") == m) date <- date + 1
    return(as.integer(format(date - 1, format="%d")))
}
dayz=matrix(31,12,99)
for (i in 2001:2099)
for (j in 2:11)
  dayz[j,i-2000]=numOD(as.Date(
  paste(i,"-",j,"-1",sep=""),"%Y-%m-%d"))

monz=rep(0,99)
for (i in 1:99){
for (j in 1:12)
  if ((i==(i%/%j)*j)&((i%/%j)<=dayz[j,i])) 
    monz[i]=monz[i]+1}

The best year in this respect being 2024, with 7 occurrences of the year being the product of a month and a day…

Masterclass in Bayesian Statistics in Marseilles next Fall

Posted in Books, Kids, Mountains, pictures, R, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , on April 9, 2018 by xi'an

This post is to announce a second occurrence of the exciting “masterclass in Bayesian Statistics” that we organised in 2016, near Marseilles. It will take place on 22-26 October 2018 once more at CIRM (Centre International de Recherches Mathématiques, Luminy, Marseilles, France). The targeted audience includes all scientists interested in learning how Bayesian inference may be used to tackle the practical problems they face in their own research. In particular PhD students and post-docs should benefit most directly from this masterclass. Among the invited speakers, Kerrie Mengersen from QUT, Brisbane, visiting Marseilles this Fall, will deliver a series of lectures on the interface between Bayesian statistics and applied modelling, Havard Rue from KAUST will talk on computing with INLA, and Aki Vehtari from Aalto U, Helsinki, will give a course on Bayesian model assessment and model choice. There will be two tutorials on R and on Stan.

All interested participants in this masterclass should pre-register as early as possible, given that the total attendance is limited to roughly 90 participants. Some specific funding for local expenses (i.e., food + accommodation on-siteat CIRM) is available (thanks to CIRM, and potentially to Fondation Jacques Hadamard, to be confirmed); this funding will be attributed by the scientific committee, with high priority to PhD students and post-docs.

Le Monde puzzle [#1048]

Posted in Books, Kids, R with tags , , , , , on April 1, 2018 by xi'an

An arithmetic Le Monde mathematical puzzle:

A magical integer m is such that the remainder of the division of any prime number p by m is either a prime number or 1. What is the unique magical integer between 25 and 100? And is there any less than 25?

The question is dead easy to code

primz=c(1,generate_primes(2,1e6))
for (y in 25:10000)
  if (min((primz[primz>y]%%y)%in%primz)==1) print(y)

and return m=30 as the only solution. Bon sang but of course!, since 30=2x3x5… (Actually, the result follows by dividing the quotient of the division of a prime number by 2 by 3 and then the resulting quotient by 5: all possible cases produce a remainder that is a prime number.) For the second question, the same code returns 2,3,4,6,8,12,18,24 as further solutions. There is no solution beyond 30.

a null hypothesis with a 99% probability to be true…

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , on March 28, 2018 by xi'an

When checking the Python t distribution random generator, np.random.standard_t(), I came upon this manual page, which actually does not explain how the random generator works but spends instead the whole page to recall Gosset’s t test, illustrating its use on an energy intake of 11 women, but ending up misleading the readers by interpreting a .009 one-sided p-value as meaning “the null hypothesis [on the hypothesised mean] has a probability of about 99% of being true”! Actually, Python’s standard deviation estimator x.std() further returns by default a non-standard standard deviation, dividing by n rather than n-1…

spacings on a torus

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , on March 22, 2018 by xi'an

While in Brussels last week I noticed an interesting question on X validated that I considered in the train back home and then more over the weekend. This is a question about spacings, namely how long on average does it take to cover an interval of length L when drawing unit intervals at random (with a torus handling of the endpoints)? Which immediately reminded me of Wilfrid Kendall (Warwick) famous gif animation of coupling from the past via leaves covering a square region, from the top (forward) and from the bottom (backward)…

The problem is rather easily expressed in terms of uniform spacings, more specifically on the maximum spacing being less than 1 (or 1/L depending on the parameterisation). Except for the additional constraint at the boundary, which is not independent of the other spacings. Replacing this extra event with an independent spacing, there exists a direct formula for the expected stopping time, which can be checked rather easily by simulation. But the exact case appears to be add a few more steps to the draws, 3/2 apparently. The following graph displays the regression of the Monte Carlo number of steps over 10⁴ replicas against the exact values:

uniform on the sphere [or not]

Posted in pictures, R, Statistics with tags , , , , , , , , , , , , on March 8, 2018 by xi'an

While looking at X validated questions, I came upon this comment that simulating a uniform distribution on a d-dimensional unit sphere does not proceed from generating angles at random on (0,2π) and computing spherical coordinates… Which I must confess would have been my initial suggestion! This is obvious, nonetheless, when computing the Jacobian of the spherical coordinate transform, which involves powers of the sines of the angles, in a decreasing sequence from d-1 to zero. This means that the angles should be simulated according to their respective sine-power densities. However, except for the d=3 case, where simulating from the density sin(φ) is straightforward by inverse cdf, i.e. φ=acos(1-2u), the cdfs for the higher powers are combinations of sines and cosines, and as such are not easily inverted. Take for instance the eighth power:

F⁸(φ)=(840 φ – 672 sin(2 φ) + 168 sin(4 φ) – 32 sin(6 φ) + 3 sin(8 φ))/3072

While the densities are bounded by sin(φ), up to a constant, and hence an accept-reject can be easily derived, the efficiency decreases with the dimension according to the respective ratio of the Wallis’ integrals, unsurprisingly. A quick check for d=4 shows that the Normal simulation+projection-by-division-by-its-norm is faster.

Puzzling a bit further about this while running, I wondered at the simultaneous simulations from sin(φ), sin(φ)², sin(φ)³, &tc., but cannot see a faster way to recycle simulations from sin(φ). Points (φ,u) located in-between two adjacent power curves are acceptable simulations from the corresponding upper curve but they need be augmented by points (φ,u) under the lower curve to constitute a representative sample. In the end, this amounts to multiplying simulations from the highest power density as many times as there are powers. No gain in sight… Sigh!

However, a few days later, while enjoying the sunset over Mont Blanc(!), I figured out that there exists a direct and efficient way to simulate from these powers of the sine function. Indeed, when looking at the density of cos(φ), it happens to be the signed root of a Beta(½,(d-1)/2), which avoids the accept-reject step. Presumably this is well-known, but I have not seen this proposal associated with the uniform distribution on the sphere.