Archive for R

Le Monde puzzle [#1000…1025]

Posted in Kids, R on March 28, 2017 by xi'an

Le Monde mathematical puzzle launched a competition to celebrate its 1000th puzzle! A fairly long-term competition, as it runs over the coming 25 puzzles (and hence weeks), starting with puzzle #1001. Here is the 1000th puzzle, which is not part of the competition:

Alice & Bob spend five (identical) vouchers in five different shops, each time buying the maximum number of items to get close to the voucher value. In these five shops, they buy sofas at 421 euros each, beds at 347 euros each, kitchen appliances at 289 euros each, tables at 251 euros each and bikes at 211 euros each, respectively. Once the buying frenzy is over, they realise that within a single shop, they would have spent exactly four vouchers for the same products. What is the value of a voucher?
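Although this 1000th puzzle is not part of the competition, it yields to a brute-force search in R, under the (assumed) reading that the total amount spent across the five shops must equal exactly four vouchers; the code below is my own sketch, not an official solution:

```r
prices <- c(421, 347, 289, 251, 211) #sofa, bed, appliance, table, bike
#amount spent with a voucher of value v: each shop sells floor(v/p) items
spend <- function(v) sum(prices * (v %/% prices))
#search voucher values large enough to buy at least one item in every shop
cands <- Filter(function(v) spend(v) == 4 * v, 421:2000)
cands
```

Under this reading, the search returns both 707 and 1000, the latter fitting nicely for a 1000th puzzle; which of the two is the intended answer depends on how the constraints are read.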

Le Monde puzzle [#1001]

Posted in Kids, R on March 27, 2017 by xi'an

After a long lag (due to my missing the free copies distributed at Paris-Dauphine!), here is a Sudoku-like Le Monde mathematical puzzle:

A grid of size (n,n) holds integer values such that any entry larger than 1 is the sum of one term in the same column and one term in the same row. What is the maximal possible value observed in such a grid when n=3,4?

This can be solved in R by a random exploration of such possible grids in a simulated annealing spirit:

N=4 #grid dimension (N=3 or N=4)
mat=matrix(1,N,N)
goal=1

targ=function(mat){ #count constraint violations
  d=0
  for (i in (1:(N*N))[mat>1]){
    r=(i-1)%%N+1;c=(i-1)%/%N+1 #row and column of entry i
    #violation if mat[i] is not the sum of a same-column and a same-row term
    d=d+(min(abs(mat[i]-outer(mat[-r,c],mat[r,-c],"+")))>0)}
  return(d)}

cur=0
for (t in 1:1e6){
  i=sample(1:(N*N),1);prop=mat #modify one entry at random
  prop[i]=sample(1:(2*goal),1)
  d=targ(prop)
  if (10*log(runif(1))/t<cur-d){ #annealed acceptance step
      mat=prop;cur=d}
  if ((d==0)&(max(prop)>goal)){ #valid grid with a new record maximum
     goal=max(prop);maxx=prop}}

returning a value of 8 for n=3 and 37 for n=4. However, the method is quite myopic, so I tried instead a random filling of the grid, each time assigning an empty cell the maximum possible sum:

goal=1
for (v in 1:1e6){
  mat=matrix(0,N,N)
  #at least one 1 per row and per column
  for (i in 1:N) mat[i,sample(1:N,1)]=1
  for (i in 1:N) if (max(mat[,i])==0) mat[sample(1:N,1),i]=1
  while (min(mat)==0){
    parm=sample(1:(N*N)) #visit empty cells in random order
    for (i in parm[mat[parm]==0]){
      r=(i-1)%%N+1;c=(i-1)%/%N+1
      if ((max(mat[-r,c])>0)&(max(mat[r,-c])>0)){
        #fill with the largest achievable sum
        mat[i]=max(mat[-r,c])+max(mat[r,-c])
        break()}}}
  if (goal<max(mat)){
    goal=max(mat);maxx=mat}}

which recovered a maximum of 8 for n=3, but reached 48 for n=4. And 211 for n=5, 647 for n=6… For instance, here is the solution for n=4:

     [,1] [,2] [,3] [,4]
[1,]    1    5   11   10
[2,]    2    4    1    5
[3,]   48    2   24    1
[4,]   24    1   22   11


what does more efficient Monte Carlo mean?

Posted in Books, Kids, R, Statistics on March 17, 2017 by xi'an

“I was just thinking that there might be a magic trick to simulate directly from this distribution without having to go for less efficient methods.”

In a simple question on X validated a few days ago [about simulating from x²φ(x)] popped up the remark that the person asking the question wanted a direct simulation method for higher efficiency, compared with an accept-reject solution. Which shows a misunderstanding of what "efficiency" means in Monte Carlo settings. If it means anything, I would think it is reflected in the average time taken to return one simulation, and possibly in the worst case. But there is no reason to call an inverse-cdf method more efficient than an accept-reject or a transform approach, since it all depends on the time it takes to make the inversion compared with the other solutions… Since inverting the closed-form cdf in this example is much more expensive than generating a Gamma(3/2,½), i.e., a χ²₃ variate, and taking plus or minus its square root, this is certainly the case here. Maybe a ziggurat method could be devised, especially since x²φ(x)<φ(x) when |x|≤1, but I am not sure it is worth the effort!
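To illustrate the transform approach (my own sketch, not code from the original discussion): since the square of a variate with density proportional to x²φ(x) follows a χ²₃ distribution and the density is symmetric, one can simulate a chi-squared variate with three degrees of freedom and attach a random sign.

```r
#direct (transform) sampler for the density proportional to x^2*dnorm(x):
#X^2 is chi-squared with 3 degrees of freedom, and the sign is uniform
rxxphi <- function(n)
  sample(c(-1, 1), n, replace = TRUE) * sqrt(rchisq(n, df = 3))

x <- rxxphi(1e5)
mean(x^2) #should be close to 3, the mean of a chi-squared(3) variate
```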

Peter Lee (1940?-2017)

Posted in Books, pictures, R, Statistics, University life, Wines on March 12, 2017 by xi'an

Just heard the sad news that Peter Lee, British Bayesian and author of Bayesian Statistics: An Introduction, passed away last night. While I did not know him well, I remember meeting him at a few conferences in the UK and spending a hilarious evening at the pub. When the book came out, I thought it quite a fine introduction to Bayesian Statistics, with enough mathematical detail and prerequisites to make it worth studying, while also including computational recommendations. Fare thee well, Peter.

Nature snapshot

Posted in Statistics on March 5, 2017 by xi'an

The recent issue of Nature, dated Jan 26, 2017(!), contained a cartload of interesting review and coverage articles: from the latest version of the quantum computer D-Wave, with a paragraph on quantum annealing that reminded me of a recent arXiv paper I could not understand, seemingly turning the mathematical problem of multivariate optimisation into a truly physical process, to the continuing (Nature-wise) debate on how to oppose Trump, to the biases and shortcomings of policing software, with a mention of Lum and Isaac I discussed here a few months ago, to the unsuspected difficulty of publishing a referee's report when the publisher is Elsevier (unsuspected and unsurprising!), although I know of colleagues and authors disapproving of my publishing referees' reports identified as such, to an amazing picture of a bundle of neurons monitored simultaneously, to an entry in the careers section on scientific computing and the importance of coding for young investigators, with R at the forefront!

testing R code [book review]

Posted in R, Statistics, Travel on March 1, 2017 by xi'an

When I saw this title among the CRC Press novelties, I immediately ordered it as I thought it fairly exciting. Now that I have gone through the book, the excitement has died. Maybe faster than need be, as I read it while being stuck in a soulless Schiphol airport and missing the only ice-climbing opportunity of the year!

Testing R Code was written by Richard Cotton and is quite short: once you take out the appendices and the answers to the exercises, it is about 130 pages long, with a significant proportion of code and output. And it is about tools for testing the coherence of R code in terms of inputs more than outputs, embodied in two packages, assertive and testthat, the latter developed by Hadley Wickham of RStudio. They are intended for run-time versus development-time testing, respectively, meaning checks that the outputs and the inputs are what the author of the code intends them to be. The other chapters contain advice and heuristics about writing maintainable, testable code, and about incorporating a testing feature in an R package.
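For a flavour of what development-time testing looks like, here is a toy example of my own (not taken from the book), using testthat's expectation functions:

```r
library(testthat)

#toy function under test: clamp values into the interval [lo, hi]
clamp <- function(x, lo, hi) pmin(pmax(x, lo), hi)

test_that("clamp keeps values within bounds", {
  expect_equal(clamp(5, 0, 1), 1)
  expect_equal(clamp(-2, 0, 1), 0)
  expect_true(all(clamp(rnorm(100), -1, 1) >= -1))
})
```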

While I am definitely a poorly qualified reader for this type of R book, my disappointment stems from my expecting a book about debugging R code, which is possibly due to a misunderstanding of the term testing. This is an unrealistic expectation, for sure, as testing that a code produces what it is supposed to requires some advanced knowledge of what the output should be, at least in some representative situations. Which means an interface like RStudio is crucial for spotting unsavoury behaviours of some variables, if not foolproof in any case.

a riddle at the end of its tether

Posted in R, Statistics on February 24, 2017 by xi'an

A simply worded riddle this week on The Riddler, about four ropes with non-uniform and unknown burning rates, the only constraint being that each burns completely in one hour. With the help of a lighter (or even a single match), what units of time can one measure by burning them?

While I had worked out a range of possible times for each rope to complete its combustion, I went for a simulation-based confirmation. The starting point is that a new fire can only be started when a burning rope burns out, which happens at only a finite number of possible times. I found 24 of them, consisting of

> total*prec
 [1] 0.0000 0.5000 0.7500 0.8750 0.9375 1.0000 1.1250 1.1875 1.2500 1.3125
[11] 1.3750 1.4375 1.5000 1.6250 1.7500 1.8750 2.0000 2.1250 2.2500 2.5000
[21] 2.7500 3.0000 3.5000 4.0000

i.e., some combinations of 1, 2⁻¹, …, 2⁻⁴, with the comment that those times cannot all be obtained within a single arson experiment.
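As a small sanity check (my own addition), all 24 times are indeed multiples of 2⁻⁴ of an hour, consistent with repeatedly halving residual burning times by lighting both ends of a rope:

```r
times <- c(0, 0.5, 0.75, 0.875, 0.9375, 1, 1.125, 1.1875, 1.25, 1.3125,
           1.375, 1.4375, 1.5, 1.625, 1.75, 1.875, 2, 2.125, 2.25, 2.5,
           2.75, 3, 3.5, 4)
all(times * 2^4 == round(times * 2^4)) #every time is a multiple of 1/16 hour
```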
