Archive for the R Category

sequence riddle

Posted in Kids, R on August 10, 2017 by xi'an

The riddle this week on The Riddler was about finding the longest sequence of integers between 1 and 100 such that each integer is used only once and is always followed by one of its multiples or factors. A basic R code searching at random [and programmed during a massive downpour on Skye] led to a solution of length 69:
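A minimal sketch of such a random search [an illustrative reconstruction, with a hypothetical function name, not the original code] repeatedly grows a sequence by picking at random an unused factor or multiple of the last entry:

# grow a sequence at random: pick an unused integer that is a factor
# or a multiple of the last entry, until no candidate remains
randeal=function(){
 free=rep(TRUE,100)
 cur=sample(1:100,1);seqz=cur;free[cur]=FALSE
 repeat{
  cand=(1:100)[free&((1:100)%%cur==0|cur%%(1:100)==0)]
  if (!length(cand)) break
  cur=cand[sample(length(cand),1)]
  seqz=c(seqz,cur);free[cur]=FALSE}
 return(seqz)}
lonk=0
for (t in 1:1e5) lonk=max(lonk,length(randeal()))
lonk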

although there is no certainty this is the best possible. And the solutions posted the next week showed sequences with length 77! [Interestingly, both posted solutions start their sequence with 87. And they seem to exploit the graph of connections between integers in a much more subtle way than my random exploration of subsequences.]

Le Monde puzzle [#1707]

Posted in Books, Kids, R on July 28, 2017 by xi'an

A geometric Le Monde mathematical puzzle:

  1. Given a pizza of diameter 20cm, how should it be cut by two perpendicular lines through a point 5cm away from the centre so as to maximise the total surface of two opposite slices?
  2. Using the same point as the tip of the four slices, how should the four cuts from that tip, producing slices with equal arcs, be oriented so as to maximise the surface of two opposite slices?

For both questions, I did not bother with the maths but went instead for a discretisation of the disk, counting the proportion of points falling within two opposite slices and letting the inclination of these slices vary from zero to π/2. Unsurprisingly, for the first question, the answer is π/4, given that there is no difference between both surfaces at angles 0 and π/2. My R code is as follows, using (5,0) as the tip:

M=100
surfaz=function(alpha){
 surfz=0
 cosal=cos(alpha);sinal=sin(alpha)
 # regular grid over the square [-10,10] x [-10,10]
 X=Y=seq(-10,10,le=M)
 # coordinates relative to the tip (5,0)
 Xcosal=(X-5)*cosal
 Xsinal=(X-5)*sinal
 for (i in 1:M){
  norm=sqrt(X[i]^2+Y^2)
  # signed distances to the two perpendicular cuts through (5,0)
  scal1=Xsinal[i]+Y*cosal
  scal2=-Xcosal[i]+Y*sinal
  # keep grid points inside the disk and within two opposite slices
  surfz=surfz+sum((norm<=10)*(scal1*scal2>0))}
 # fraction of the disk area covered by the two opposite slices
 return(4*surfz/M/M/pi)}
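The maximising inclination can then be recovered by scanning a grid of angles [a small usage sketch, not part of the original post]:

alphas=seq(0,pi/2,le=101)
areas=sapply(alphas,surfaz)
alphas[which.max(areas)]  # expected close to pi/4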

The second puzzle can be solved by a similar code, except that the slice area between two lines has to be determined by a cross product:

surfoz=function(alpha,ploz=FALSE){
  sinal=sin(alpha);cosal=cos(alpha)
  X=Y=seq(-10,10,le=M)
  frsterm=cosal*(10*cosal-5)+sinal*(10*sinal-5)
  trdterm=cosal*(10*cosal+5)+sinal*(10*sinal+5)
  surfz=0
  for (i in 1:M){
    norm=sqrt(X[i]^2+Y^2)
    scal1=(10*(Y[i]-5)*cosal-(10*sinal-5)*X)*frsterm
    scal2=-(-10*(Y[i]-5)*sinal-(10*cosal-5)*X)*frsterm
    scal3=(-10*(Y[i]-5)*cosal+(10*sinal+5)*X)*trdterm
    scal4=-(10*(Y[i]-5)*sinal+(10*cosal+5)*X)*trdterm
    surfz=surfz+sum((norm<=10)* 
    ((scal1>0)*(scal2>0)+
     (scal3>0)*(scal4>0)))}
 return(4*surfz/M/M/pi)}

a code that shows that all cuts lead to identical surfaces for both sets of slices. A fairly surprising result!
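For instance, scanning the inclination over a grid [a quick check, not in the original post] exhibits this flatness:

alphas=seq(0,pi/2,le=101)
round(range(sapply(alphas,surfoz)),3)  # essentially constant in alpha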

 

RNG impact on MCMC [or lack thereof]

Posted in Books, R, Statistics, Travel, University life on July 13, 2017 by xi'an

Following the talk at MCM 2017 about the strange impact of the random number generator on the outcome of an MCMC algorithm, I tried in Montréal airport the following code on the banana target of Haario et al. (1999), copied from Soetaert and Laine and using the modMCMC function of the FME package:

library(FME)
# banana-shaped target of Haario et al. (1999)
Banana <- function (x1, x2) {
 return(x2 - (x1^2+1)) }
# bivariate normal density evaluated at vec
pmultinorm <- function(vec, mean, Cov) {
 diff <- vec - mean
 ex <- -0.5*t(diff) %*% solve(Cov) %*% diff
 rdet <- sqrt(det(Cov))
 power <- -length(diff)*0.5
 return((2.*pi)^power / rdet * exp(ex)) }
# -2 log-likelihood of the banana transform, as expected by modMCMC
BananaSS <- function (p) {
 P <- c(p[1], Banana(p[1], p[2]))
 Cov <- matrix(nr = 2, data = c(1, 0.9, 0.9, 1))
 return(-2*sum(log(pmultinorm(P, mean = 0, Cov = Cov)))) }
N=1e3
ejd=matrix(0,4,N) # mean squared divergence from the start, one row per RNG
RNGkind("Mars")   # Marsaglia-Multicarry generator
for (t in 1:N){
  MCMC <- modMCMC(f = BananaSS, p = c(0, 0.7), 
  jump = diag(nrow = 2, x = 5), niter = 1e3)
  ejd[1,t]=mean((MCMC$pars[-1,2]-MCMC$pars[1,2])^2)}

since this divergence from the initial condition seemed to reflect the experiment of the speaker at MCM 2017. Unsurprisingly, no difference came from using the different RNGs in R (which may fail to contain those incriminated by the study)…
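To fill the remaining rows of ejd, one possible loop [a sketch assuming four of R's built-in generators, not necessarily those used in the talk] is:

rngs=c("Marsaglia-Multicarry","Mersenne-Twister","Super-Duper","Wichmann-Hill")
for (r in 1:4){
  RNGkind(rngs[r])
  for (t in 1:N){
    MCMC <- modMCMC(f = BananaSS, p = c(0, 0.7),
      jump = diag(nrow = 2, x = 5), niter = 1e3)
    ejd[r,t]=mean((MCMC$pars[-1,2]-MCMC$pars[1,2])^2)}}
apply(ejd,1,mean)  # no visible difference across generators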

easy riddle

Posted in Books, Kids, R on July 12, 2017 by xi'an

From the current Riddler, a problem that only requires a few lines of code and a few seconds of reasoning. Or not.

N households each stole the earnings from one of the (N-1) other households, one at a time. What is the probability that a given household is not burglarised? And what are the expected final earnings of each household in the list, assuming they all start with $1?

The first question is close to Feller's envelope problem in that

\left(1-\frac{1}{N-1}\right)^{N-1}
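A quick numerical check [not in the original post]:

N=1e3
c((1-1/(N-1))^(N-1), exp(-1))  # both close to 0.368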

is close to exp(-1) for N large. The second question can easily be solved by an R code like

N=1e3;M=1e6
fina=rep(1,N)
for (v in 1:M){
 # order in which the households get to rob
 ordre=sample(1:N)
 # victims, redrawn until no robber targets itself
 vole=sample(1:N,N,rep=TRUE)
 while (min(abs(vole-ordre))==0)
  vole[abs(vole-ordre)==0]=sample(1:N,
     sum(vole==ordre))
 cash=rep(1,N)
 for (t in 1:N){
  # the t-th robber grabs the victim's current cash
  cash[ordre[t]]=cash[ordre[t]]+cash[vole[t]];cash[vole[t]]=0}
 # accumulate final wealth as a function of the robbing rank
 fina=fina+cash[ordre]}

which returns a pretty regular exponential-like curve, although I cannot figure the exact curve beyond the third burglary. The published solution gives the curve

\left(\frac{N-2}{N-1}\right)^{999}\times 2+\left(\frac{1}{N-1}\right)^{t-1}\times\left(\frac{N-1}{N}\right)^{N-t}\times\frac{N}{N-1}

corresponding to the probability of never being robbed (and getting on average an extra unit from the robbery) and of being robbed only before robbing someone else (with average wealth N/(N-1)).
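The simulated averages can be set against this curve by plotting them against the robbing rank [a plotting line not in the original code]:

# average final wealth by rank (the 1/M offset from initialising fina at 1 is negligible)
plot(fina/M,type="l",xlab="robbing rank",ylab="average final wealth")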

[un]solved riddles

Posted in Books, Kids, R on July 4, 2017 by xi'an

On the Riddler of last week, first a birthday puzzle:

Given a group of 23 persons, what is the probability of observing three pairs of identical birthdays?

which can be found by a quick simulation as

M=1e6
ave=0
for (t in 1:M){
 # 23 birthdays drawn uniformly over 365 days
 dates=sample(1:365,23,rep=TRUE)
 # birthdays occurring more than once
 dupz=dates[duplicated(dates)]
 # exactly three pairs: three repeats, all distinct (hence no triple)
 ave=ave+as.integer((length(dupz)==3)&
     (length(unique(dupz))==3))}
ave/M

returning a value of 0.0183, but whose combinatorial resolution I could not fully fathom without a little help from a friend(-ly blog). I had the multinomial coefficient

{23\choose 2\ 2\ 2}

for the allocation of the 23 persons to one of the three pairs or none, as well as the probability

\dfrac{365\times\cdots\times(365-(23-3)+1)}{365^{23}}

but I had forgotten the 3! in the denominator for the permutations of the three pairs, which leads again to

{23 \choose 2\ 2\ 2}\dfrac{365\times\cdots\times(365-(23-3)+1)}{3!\times365^{23}} = 0.0183
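This value can be checked directly in R [a quick verification, not in the original post]:

# multinomial coefficient 23!/(2!2!2!17!), divided by 3!, times the
# probability of 20 distinct birthdays in the required pattern
choose(23,2)*choose(21,2)*choose(19,2)/factorial(3)*prod(365:346)/365^23  # about 0.0183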

This question also led to an unsolved one: in the event this probability was much smaller, what would be an easy way to construct a more efficient importance sampler?

The second riddle was just as easy to code in R:

A game of tag goes by the following rules: (i) anyone untagged can tag anyone untagged; (ii) anyone tagged by a given player becomes untagged again when that player is itself tagged; (iii) the winner is the last untagged player. What is the expected number of rounds for N players?

The outcome of

game=function(N=12){
  # a[i]=0 if player i is untagged, otherwise the index of the player who tagged i
  a=rep(0,N);T=0
  while (sum(a==0)>1){
   # pick a tagger and a target among the untagged players
   ij=sample((1:N)[a==0],2)
   # target ij[2] is tagged by ij[1]; everyone tagged by ij[2] is freed
   a[ij[2]]=ij[1];a[a==ij[2]]=0
   T=T+1}
  return(T)}

leads to an average value of

2^{N-1}-1

but I had no clear quick explanation for the doubling phenomenon. Until I picked a pen and a sheet of paper and drew the last steps of the game: to decrease down to 1, the number of untagged survivors has to go through …, 3, 2 and each time the eliminated player must have tagged no other player, since otherwise the population of untagged players grows again. This has to apply all the way to the second round, where N-1 players remain and the one tagged needs to be anyone but the one who tagged the first one. And so on…
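This average is easily checked by simulation, e.g. for N=6 [a quick check, not in the original post]:

mean(replicate(1e4,game(6)))  # should be close to 2^5-1=31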

thinning a Markov chain, statistically

Posted in Books, pictures, R, Statistics on June 13, 2017 by xi'an

Art Owen has arXived a new version of his thinning MCMC paper, where he studies how thinning or subsampling can improve computing time in MCMC chains. I remember quite well the message from Mark Berliner and Steve MacEachern in an early 1990's paper that subsampling always increases the variance of the resulting estimators. We actually have this result in our Monte Carlo Statistical Methods book. Now, there are other perspectives on this, as for instance cases when thinning can be hard-wired by simulating directly a k-step move, delaying rejection or acceptance, prefetching, or simulating directly the accepted values as in our vanilla Rao-Blackwellisation approach. Here, Art considers the case when there is a cost θ of computing a transform of the simulation [while the transition costs one unit] and when those transforms are positively correlated with correlation ρ. Somewhat unsurprisingly, when θ is large enough, thinning becomes worth implementing. But this requires extra computations to evaluate the correlation ρ and the cost θ, which are rarely comparable with the cost of computing the likelihood itself, a requirement for the Metropolis-Hastings or Hamiltonian Monte Carlo step(s). Subsampling while keeping the right target (which is a hard constraint!) should thus have a much more effective impact on computing budgets.
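As a toy illustration of this trade-off [my own sketch, not Owen's actual analysis], assume the transformed values behave like an AR(1) sequence with autocorrelation ρ, a unit transition cost and a transform cost θ; the work-normalised variance of the averaged transform when thinning by k is then proportional to (k+θ)(1+ρᵏ)/(1-ρᵏ):

# work-normalised variance when thinning by k an AR(1) sequence with
# autocorrelation rho, unit transition cost and transform cost theta
workvar=function(rho,theta,k) (k+theta)*(1+rho^k)/(1-rho^k)
# ratio below one means thinning by 10 beats keeping every value
workvar(.99,20,10)/workvar(.99,20,1)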

datazar

Posted in R, Statistics, University life on June 4, 2017 by xi'an

A few weeks ago and then some, I [as occasional blogger!] got contacted by datazar.com to write a piece on this data-sharing platform. I then went and checked what this was all about, having the vague impression this was a platform where I could store and run R code, besides depositing collaborative projects, but from what I quickly read, it sounded more like being able to run R scripts from one's machine using data and code stored on datazar.com. But after reading just one more blog entry I finally understood it is also possible to run R, SQL, NotebookJS (and LaTeX) directly on that platform, without downloading code or data to one's machine. Which makes it a definite plus for this site, as users can experiment with no transfer to their own computer, hence on a larger variety of platforms. While I personally do not [yet?] see how to use it for my research or [limited] teaching, it seems like [yet another] interesting exploration of the positive uses of the Internet to collaborate and communicate on scientific issues! With no opinion on the privacy and data protection offered by the site, of course.