Archive for R

Sunday morning puzzle

Posted in Books, Kids, R on November 22, 2015 by xi'an

A question from X validated that took me quite a while to fathom, until the solution suddenly became quite obvious:

If a sample taken from an arbitrary distribution on {0,1}⁶ is censored from its (0,0,0,0,0,0) elements, and if the marginal probabilities are known for all six components of the random vector, what is an estimate of the proportion of (missing) (0,0,0,0,0,0) elements? 

Since the censoring modifies all probabilities by the same renormalisation, i.e. divides them by the probability ρ of being different from (0,0,0,0,0,0), this probability can be estimated from the observed marginal probabilities of being equal to 1, which equal the original and known marginal probabilities divided by ρ. Here is a short R code illustrating the approach, which I wrote in the taxi home yesterday night:

#generate vectors from an iid example on {0,1}^6
zprobs=c(.1,.9) #iid example: P(X_j=1)=.1 for each component
smpl=matrix(sample(1:0,6*1e4,rep=TRUE,prob=zprobs),ncol=6)
obs=smpl[rowSums(smpl)>0,] #censor the (0,0,0,0,0,0) elements
rho=mean(zprobs[1]/colMeans(obs)) #estimated renormalisation constant
nrow(obs)/rho #estimated original size (1e4 here)
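
Since the example is iid, the renormalisation constant ρ is also available in closed form, which provides a quick sanity check on the above estimate:

1-zprobs[2]^6 #exact rho: probability of at least one non-zero component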

A broader question is how many values (and which values) of the sample can be removed before this recovery becomes impossible (with the same amount of information).

Le Monde puzzle [#937]

Posted in Books, R on November 11, 2015 by xi'an

A combinatoric Le Monde mathematical puzzle that resembles many earlier ones:

Given a pool of 30 interns allocated to three-person night-shifts, is it possible to see 31 consecutive nights such that (a) all the shifts differ and (b) no pair of shifts shares exactly one intern?

In fact, the constraint there is very strong: two shifts can only share zero or two interns. For one given shift, there are 26 other shifts with which it can share two interns, but then any two of those must themselves share zero or two interns, which makes the same two interns common to all and excludes any additional shift. But this is not the only way to allocate the interns over the shifts since, as pointed out by Jean-Louis and as checked with the following R code, 28 and not 27 is the maximum possible number of shifts under those conditions:


#sequential random filling of the shifts, stopping when stuck
shft=matrix(sample(1:30,3),1,3) #first night
k=0
while ((nrow(shft)<31)&(k<4061)){ #4060=choose(30,3) possible shifts
  prop=sample(1:30,3) #candidate shift for the next night
  acpt=apply(shft,1,function(s) length(intersect(s,prop)))
  if (sum((acpt==1)+(acpt==3))==0){ #shares zero or two interns with all
    shft=rbind(shft,prop);k=0} else k=k+1}
nrow(shft) #number of compatible shifts achieved (at most 28)
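
The maximal configuration itself is easy to exhibit, since taking all shifts that contain the same pair of interns, say 1 and 2, produces 28 pairwise compatible shifts, as a quick check confirms:

shft=cbind(1,2,3:30) #the 28 shifts sharing interns 1 and 2
all(combn(28,2,function(k) length(intersect(shft[k[1],],shft[k[2],])))==2)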


Le Monde puzzle [#934]

Posted in Kids, Statistics, University life, Books, Travel, pictures on October 25, 2015 by xi'an

Another Le Monde mathematical puzzle with no R code:

Given a collection of 2€ coins and 5€ bills that sum up to 120€, find the number of 5€ bills such that the collection cannot be divided into 5 even parts.

Indeed, as soon as one starts formalising the problem, it falls apart: if there are a 5€ bills and b 2€ coins, we have 5a+2b=120, hence 2b=120-5a=5(24-a), meaning that b must be a multiple of 5, b=5b', and that a must be even, a=2a', with b'=12-a'. Hence, 11 possible values for the pair (a',b'), and thus for (a,b), since a>0 and b>0. If these 120 Euros can be evenly divided between 5 persons, each gets 24€. Now, 24€ can be decomposed into 5€ bills and 2€ coins in three ways:

24€ = 4×5€ + 2×2€ = 2×5€ + 7×2€ = 12×2€
Each of the five persons using one of the 3 above decompositions means there exist integers α, β, and γ such that

α×(4,2) + β×(0,12) + γ×(2,7) = (a,b)

with α+β+γ=5; therefore a=4α+2γ and b=2α+12β+7γ, which implies 2α+γ=a' and 2α+12β+7γ=5×12-5a'=2α+5×12-12α-12γ+7γ, or 5a'=10α+5γ. That is, 2α+γ=a' again… If a'=11, there is no solution with α+γ≤5, and this is the only such case. For any other value of a', there is a way to divide the 120€ into 5 even parts. As often, I wonder at the point of the puzzle, if this is the answer, or at its phrasing, if I have misunderstood the question.

Just to check the above by R means, I still wrote a short R code:

for (a in 1:11){
# one integer solution to 2x+y=a, with x=α, y=γ, leaving β=5-x-y
  x=a%/%2;y=a%%2
  print(c(2*a,5*(11-a),x,5-x-y,y))}

which returned

[1]  2 50  0  4  1
[1]  4 45  1  4  0
[1]  6 40  1  3  1
[1]  8 35  2  3  0
[1] 10 30  2  2  1
[1] 12 25  3  2  0
[1] 14 20  3  1  1
[1] 16 15  4  1  0
[1] 18 10  4  0  1
[1] 20  5  5  0  0
[1] 22  0  5 -1  1

meaning that a’=11 does not produce a viable solution, since it returns a negative β.
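
The impossibility of a’=11 can also be checked directly, since 2α+γ cannot exceed 10 when α+γ≤5 and α,γ≥0:

max(outer(0:5,0:5,function(a,g) (a+g<6)*(2*a+g))) #returns 10, hence a'=11 is unreachable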

Le Monde puzzle [#932]

Posted in Books, Kids, Statistics, University life on October 15, 2015 by xi'an

A Sudoku-like Le Monde mathematical puzzle:

A 12×8 grid contains the first 96 integers, as in R matrix(1:96,ncol=12). If one picks 24 of those integers including 3 for each row and 2 for each column, what are the extreme values of the sum of the selected integers?

I obviously rephrased the question quite strongly (and possibly changed it!). Before rushing to the R simulation of a random collection of 24 such integers, I pondered whether this sum could vary at all between random samples, since the same terms appear in every sample. More clearly, using the 10×10 grid instead as a basis for reasoning and picking e.g. 20 integers with 2 per row and 2 per column, we end up with 2 copies of every integer between 0 and 9 and 2 copies of every multiple of ten between 0 and 90, hence a constant sum. Random simulation confirms this reasoning:

#pick a subset at random with 3 entries per row and 2 per column
m=matrix(1:96,ncol=12)
sum(m[cbind(rep(1:8,3),sample(rep(1:12,2)))]) #the margins alone fix the sum

since repeated calls to the above keep returning the same value, 1164, which is also

> sum(3*(0:7))*12+2*sum(1:12)
[1] 1164

Le Monde puzzle [#930]

Posted in Kids, Statistics, University life, Books on October 9, 2015 by xi'an

A game Le Monde mathematical puzzle:

On a linear board of length 17, Alice and Bob alternately set red and blue tokens. Two tokens of the same colour cannot sit next to one another. Devise a winning strategy for the first player.

In the ‘Og tradition, this calls for a recursive R code:

duel=function(tak,col){
  # tak: board (0 for free cells, ±1 for tokens), col: colour about to play
  n=length(tak);frei=(tak==0)
  # left positions: free cells with no neighbour of colour col
  for (i in (1:n)[frei])
    if (((i>1)&&(tak[i-1]==col))||((i<n)&&(tak[i+1]==col))) frei[i]=FALSE
  # stopping rule: no remaining move means the current player loses
  if (sum(frei)==0) return(0)
  # outcome of best choice: win if some move leaves the opponent losing
  for (i in (1:n)[frei]){
    prop=tak;prop[i]=col
    if (duel(prop,-col)==0) return(1)}
  return(0)}
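
Running the function over increasing board lengths (a minimal sketch of the call, assuming the empty board is coded as rep(0,n) and the first player holds colour 1):

for (n in 2:12) print(duel(rep(0,n),1)) #0 means the first player loses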

While I did not run the rudimentary recursive function for n=17, I got a zero return from n=2 to n=12, meaning that the starting player always loses when the other player plays optimally.

ABC model choice via random forests [and no fire]

Posted in Books, pictures, R, Statistics, University life on September 4, 2015 by xi'an

While my arXiv newspage today had a puzzling entry about modelling UFO sightings in France, it also broadcast our revision of Reliable ABC model choice via random forests, a version that we resubmitted today to Bioinformatics after a quite thorough upgrade, the most dramatic one being the realisation that we could also approximate the posterior probability of the selected model via another random forest. (With no connection with the recent post on forest fires!) As discussed a little while ago on the ‘Og, and in conjunction with our creating the abcrf R package for running ABC model choice out of a reference table. While it has been an excruciatingly slow process (the initial version of the arXived document dates from June 2014, the PNAS submission was rejected for not being Bayesian enough, and the latest revision took the whole summer), the slow maturation of our thoughts on the model choice issues led us to modify the role of random forests in the ABC approach to model choice: we reverted our earlier assessment that they could only be trusted for selecting the most likely model, by realising this summer that the corresponding posterior probability could be expressed as a posterior loss and estimated by a secondary forest, as first considered in Stoehr et al. (2014). (In retrospect, this brings an answer to one of the earlier referee's comments.) The next goal is to incorporate those changes in DIYABC (and wait for the next version of the software to appear). Another best-selling innovation due to Arnaud: we added a practical implementation section in the format of an FAQ for issues related with the calibration of the algorithms.

likelihood-free inference in high-dimensional models

Posted in Books, R, Statistics, University life on September 1, 2015 by xi'an

“…for a general linear model (GLM), a single linear function is a sufficient statistic for each associated parameter…”

Water Tower, Michigan Avenue, Chicago, Oct. 31, 2012

The recently arXived paper “Likelihood-free inference in high-dimensional models“, by Kousathanas et al. (July 2015), proposes an ABC resolution of the dimensionality curse [when the dimension of the parameter and of the corresponding summary statistics grows] by turning Gibbs-like and by using a component-by-component ABC-MCMC update that allows for low-dimensional statistics. In the (rare) event there exists a conditional sufficient statistic for each component of the parameter vector, the approach is just as justified as when using a generic ABC-Gibbs method based on the whole data. Otherwise, that is, when using a non-sufficient estimator of the corresponding component (as, e.g., in a generalised [not general!] linear model), the approach is less coherent as there is no joint target associated with the Gibbs moves. One may therefore wonder at the convergence properties of the resulting algorithm. The only safe case [in dimension 2] is when one of the restricted conditionals does not depend on the other parameter. Note also that each Gibbs step a priori requires the simulation of a new pseudo-dataset, which may be a major imposition on computing time. And setting the tolerance for each parameter is a delicate calibration issue, because in principle the tolerance should depend on the other component values.
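
For illustration, here is a minimal sketch of such a component-wise ABC-MCMC move, on a toy Normal model with made-up proposals, tolerances, and component-specific statistics (the sample mean for the location, the sample standard deviation for the scale), and not the authors' implementation:

#toy setting: y~N(mu,sig), flat priors, one ABC accept-reject step per component
y=rnorm(50,2,3);T=1e3;eps=c(.5,.5) #illustrative tolerances
the=matrix(c(0,1),T,2,byrow=TRUE) #chain for (mu,sig)
for (t in 2:T){
  the[t,]=the[t-1,]
  for (j in 1:2){
    prop=the[t,];prop[j]=prop[j]+.5*rnorm(1) #random walk proposal
    if (prop[2]>0){ #a new pseudo-dataset is simulated at each Gibbs step
      pseu=rnorm(50,prop[1],prop[2])
      dis=c(abs(mean(pseu)-mean(y)),abs(sd(pseu)-sd(y)))
      if (dis[j]<eps[j]) the[t,j]=prop[j]}}}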

