Archive for R

a riddle at the end of its tether

Posted in R, Statistics with tags , , on February 24, 2017 by xi'an

A simply worded riddle this week on The Riddler, about four ropes having non-uniform and unknown burning rates, the only constraint being they all burn completely in one hour. With the help of a lighter (or even a single match), what are the possible units of time one can measure by burning them?

While I had worked out a range of possible times for each of the ropes to complete its combustion, I went for a simulation based confirmation. The starting point is that a new fire can only be started when a burning rope ends up burning. Which only happens for a finite number of possibilities. I found 24 of them, consisting of

> total*prec
[1]  0.000 0.5000 0.750 0.875 0.9375 1.000 1.125 1.1875 1.25 1.3125
[11] 1.375 1.4375 1.500 1.625 1.7500 1.875 2.000 2.1250 2.25 2.5000
[21] 2.750 3.0000 3.500 4.000

i.e., some combinations of 1, 2⁻¹, …, 2⁻⁴, with the comment that those times cannot all be obtained within a single arson experiment.

Continue reading

a knapsack riddle?

Posted in Books, pictures, R, Statistics, Travel with tags , , , , , , on February 13, 2017 by xi'an

gear

The [then current now past] riddle of the week is a sort of multiarmed bandits optimisation. Of sorts. Or rather a generalised knapsack problem. The question is about optimising the allocation of 100 undistinguishable units to 10 distinct boxes against a similarly endowed adversary, when the loss function is

L(x,y)=(x_1>y_1)-(x_1<y_1)+...+10((x_{10}>y_{10})-(x_{10}<y_{10}))

and the distribution q of the adversary is unknown. As usual (!), the phrasing of the riddle is somewhat ambiguous but I am under the impression that the game is played sequentially, hence that one can learn about the distribution of the adversary, at least when assuming this adversary keeps the same distribution q at all times. Continue reading

an accurate variance approximation

Posted in Books, Kids, pictures, R, Statistics with tags , , , , , , on February 7, 2017 by xi'an

In answering a simple question on X validated about producing Monte Carlo estimates of the variance of estimators of exp(-θ) in a Poisson model, I wanted to illustrate the accuracy of these estimates against the theoretical values. While one case was easy, since the estimator was a Binomial B(n,exp(-θ)) variate [in yellow on the graph], the other one being the exponential of the negative of the Poisson sample average did not enjoy a closed-form variance and I instead used a first order (δ-method) approximation for this variance which ended up working surprisingly well [in brown] given that the experiment is based on an n=20 sample size.

Thanks to the comments of George Henry, I stand corrected: the variance of the exponential version is easily manageable with two lines of summation! As

\text{var}(\exp\{-\bar{X}_n\})=\exp\left\{-n\theta[1-\exp\{-2/n\}]\right\}

-\exp\left\{-2n\theta[1-\exp\{-1/n\}]\right\}

which allows for a comparison with its second order Taylor approximation:

compar

a well-hidden E step

Posted in Books, Kids, pictures, R, Statistics with tags , , , , , , , , , on February 3, 2017 by xi'an

Grand Palais from Esplanade des Invalides, Paris, Dec. 07, 2012A recent question on X validated ended up being quite interesting! The model under consideration is made of parallel Markov chains on a finite state space, all with the same Markov transition matrix, M, which turns into a hidden Markov model when the only summary available is the number of chains in a given state at a given time. When writing down the EM algorithm, the E step involves the expected number of moves from a given state to a given state at a given time. The conditional distribution of those numbers of chains is a product of multinomials across times and starting states, with no Markov structure since the number of chains starting from a given state is known at each instant. Except that those multinomials are constrained by the number of “arrivals” in each state at the next instant and that this makes the computation of the expectation intractable, as far as I can see.

A solution by Monte Carlo EM means running the moves for each instant under the above constraints, which is thus a sort of multinomial distribution with fixed margins, enjoying a closed-form expression but for the normalising constant. The direct simulation soon gets too costly as the number of states increases and I thus considered a basic Metropolis move, using one margin (row or column) or the other as proposal, with the correction taken on another margin. This is very basic but apparently enough for the purpose of the exercise. If I find time in the coming days, I will try to look at the ABC resolution of this problem, a logical move when starting from non-sufficient statistics!

a typo that went under the radar

Posted in Books, R, Statistics, University life with tags , , , , , , , on January 25, 2017 by xi'an

A chance occurrence on X validated: a question on an incomprehensible formula for Bayesian model choice: which, most unfortunately!, appeared in Bayesian Essentials with R! Eeech! It looks like one line in our LATEX file got erased and the likelihood part in the denominator altogether vanished. Apologies to all readers confused by this nonsensical formula!

Le Monde puzzle [#990]

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , on January 12, 2017 by xi'an

To celebrate the new year (assuming it is worth celebrating!), Le Monde mathematical puzzle came up with the following:

Two sequences (x¹,x²,…) and (y¹,y²,…) are defined as follows: the current value of x is either the previous value or twice the previous value, while the current value of y is the sum of the values of x up to now. What is the minimum number of steps to reach 2016 or 2017?

By considering that all consecutive powers of 2 must appear at least one, the puzzles boils down to finding the minimal number of replications in the remainder of the year minus the sum of all powers of 2. Which itself boils down to deriving the binary decomposition of that remainder. Hence the basic R code (using intToBits):

deco=function(k=2016){
 m=trunc(log2(k))
 while (sum(2^(0:m))>k) m=m-1
 if (sum(2^(0:m))==k){ return(rep(1,m+1))
 }else{
 res=k-sum(2^(0:m))
 return(rep(1,m+1)+as.integer(intToBits(res))[1:(m+1)])

which produces

> sum(deco(2016))
[1] 16
> sum(deco(2017))
[1] 16
> sum(deco(1789))
[1] 18

a Galton-Watson riddle

Posted in R, Travel with tags , , , on December 30, 2016 by xi'an

riddleThe Riddler of this week has an extinction riddle which summarises as follows:

One observes a population of N individuals, each with a probability of 10⁻⁴ to kill the observer each day. From one day to the next, the population decreases by one individual with probability

K√N 10⁻⁴

What is the value of K that leaves the observer alive with probability ½?

Given the sequence of population sizes N,N¹,N²,…, the probability to remain alive is

(1-10^{-4})^{N+N^1+\ldots}

where the sum stops with the (sure) extinction of the population. Which is the moment generating function of the sum. At x=1-10⁻⁴. Hence the problem relates to a Galton-Watson extinction problem. However, given the nature of the extinction process I do not see a way to determine the distribution of the sum, except by simulation. Which returns K=27 for the specific value of N=9.

N=9
K=3*N
M=10^4
vals=rep(0,M)
targ=0
ite=1
while (abs(targ-.5)>.01){

for (t in 1:M){
  gen=vals[t]=N
  while (gen>0){
   gen=gen-(runif(1)<K*sqrt(gen)/10^4)
   vals[t]=vals[t]+gen}
  }
  targ=mean(exp(vals*log(.9999)))
print(c(as.integer(ite),K,targ))
  if (targ<.5){ K=K*ite/(1+ite)}else{
     K=K/(ite/(1+ite))}
  ite=ite+1}

The solution proposed on The Riddler is more elegant in that the fixed point equation is

\prod_{J=1}^9 \frac{K \cdot \sqrt{J}}{K \cdot \sqrt{J} + J}=\frac{1}{2}

with a solution around K=27.