## R-ecap [-16]

Posted in Books, R, University life on March 27, 2011 by xi'an

This morning, I noticed that none of my R related posts had appeared on R-bloggers for the past fortnight… After investigating, I found the culprit was…cut-and-paste! Indeed, when advertising the special issue of TOMACS on Monte Carlo methods in Statistics that Arnaud Doucet and I edit, I copied the main parts from the pdf announcement, straight out of Acrobat, and the word “field” involved a ligature between the f and the i that did not get copied as proper UTF-8:

error on line 434 at column 330: Input is not proper UTF-8, indicate encoding ! Bytes: 0x0C 0x65 0x6C 0x64
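A minimal R sketch of the problem (assuming the pasted text carried the Unicode “fi” ligature U+FB01 rather than the two separate letters): the ligature alone takes three UTF-8 bytes, so a clipboard that substitutes a single raw byte for it breaks the encoding.

```r
# "field" with the fi ligature U+FB01 instead of "f"+"i"
lig <- "\uFB01eld"
# the ligature accounts for the first three UTF-8 bytes: ef ac 81
charToRaw(enc2utf8(lig))
# substituting the plain letters restores valid ASCII
fixed <- gsub("\uFB01", "fi", lig, fixed = TRUE)
fixed  # "field"
```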

So here are the entries in the ‘Og for the past 16 days that could have been of interest to R-bloggers readers:

## #2 blog for the statistics geek?!

Posted in Uncategorized on January 24, 2011 by xi'an

Julien pointed out to me that the ‘Og was #2 in this list of “40 fascinating blogs for the ultimate statistics geek“… Dunno how to take it! I also note Statisfaction ranked as #4 and Freakonometrics as #5, which sounds like the ranking is a wee bit haphazard, the latter blog having at least four times as much traffic as the ‘Og and focussing solely on statistics, actuarial science, R programming, and related scientific questions! (Still, verrry nice to make it to a list!)

## Asher’s enigma

Posted in R, Statistics on July 26, 2010 by xi'an

On his Probability and statistics blog, Matt Asher posed a fun question (with my rephrasing):

Take a unit square. Now pick two spots at random along the perimeter, uniformly. For each of these two locations, pick another random point from one of the three other sides of the square and draw the segment. What is the probability the two segments intersect? And what is the distribution for the intersection points?

The (my) intuition for the first question was 1/2, but a quick computation led to another answer. The key to the computation is to distinguish whether or not both segments share one side of the square. They do with probability

$\dfrac{2}{4}\times 1 + \dfrac{2}{4}\times\dfrac{2}{3} = \dfrac{5}{6},$

in which case they intersect with probability 1/2. They occupy the four sides with probability 1/6, in which case they intersect with probability 1/3. So the final answer is 17/36 (as posted by several readers and empirically found by Matt). The second question is much trickier: the histogram of the distribution of the coordinates is peaked towards the boundaries, thus reminding me of an arc-sine distribution, but there is a bump in the middle as well. Computing the coordinates of the intersection depending on the respective positions of the endpoints of both segments and simulating those distributions led me to histograms that looked either like beta B(a,a) distributions, or like beta B(1,a) distributions, or like beta B(a,1) distributions… Not exactly, though. So not even a mixture of beta distributions is enough to explain the distribution of the intersection points… For instance, the intersection points corresponding to segments where both segments start from the same side and end up on the opposite side are distributed as

$\dfrac{u_1(u_4-u_3)-u_3(u_2-u_1)}{u_4-u_3-u_2+u_1}$

where all the u's are uniform on (0,1), under the constraint $(u_2-u_1)(u_4-u_3)<0$. The following graph shows how well a beta distribution fits in that case. (Not perfectly, though!)
The R code is

# 10^5 draws of four uniforms on (0,1)
u=matrix(runif(4*10^5),ncol=4)
# enforce u1 < u3 within each row
u[,c(1,3)]=t(apply(u[,c(1,3)],1,sort))
# enforce u2 > u4 within each row
u[,c(2,4)]=-t(apply(-u[,c(2,4)],1,sort))
# x-coordinate of the intersection of the two segments
y=(u[,1]*(u[,4]-u[,3])-u[,3]*(u[,2]-u[,1]))/(u[,1]+u[,4]-u[,2]-u[,3])
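To quantify how well a beta fits, one can match moments (a sketch of my own, re-simulating y as above for self-containment; the shape parameters a and b are moment estimates, not values from the post):

```r
# re-simulate the intersection coordinate as in the code above
u=matrix(runif(4*10^5),ncol=4)
u[,c(1,3)]=t(apply(u[,c(1,3)],1,sort))
u[,c(2,4)]=-t(apply(-u[,c(2,4)],1,sort))
y=(u[,1]*(u[,4]-u[,3])-u[,3]*(u[,2]-u[,1]))/(u[,1]+u[,4]-u[,2]-u[,3])
# method-of-moments beta fit: match mean and variance of y
m=mean(y); v=var(y)
a=m*(m*(1-m)/v-1); b=(1-m)*(m*(1-m)/v-1)
# overlay the fitted beta density on the histogram
hist(y,breaks=100,freq=FALSE,col="wheat",main="")
curve(dbeta(x,a,b),add=TRUE,lwd=2)
```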

Similarly, if the two segments start from the same side but end up on different sides, the distribution of one coordinate is given by

$\dfrac{u_1(1-u_3)-u_3u_4(u_2-u_1)}{1-u_3-u_4(u_2-u_1)}$

under the constraint $u_3<u_1$. The outcome is once again almost distributed as a beta:
The corresponding R code is

# 10^5 draws of four uniforms on (0,1)
u=matrix(runif(4*10^5),ncol=4)
# enforce u3 < u1 within each row
u[,c(1,3)]=-t(apply(-u[,c(1,3)],1,sort))
# coordinate of the intersection in this configuration
y=(u[,1]*(1-u[,3])-u[,3]*u[,4]*(u[,2]-u[,1]))/(1-u[,3]-u[,4]*(u[,2]-u[,1]))
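As a side check on the first question, the 17/36 probability is easy to confirm by direct simulation. The sketch below (my own code, not from the post) draws segments as in the puzzle and applies a standard orientation test for intersection:

```r
# draw one endpoint uniformly on a given side of the unit square
side_pt <- function(s) {
  u <- runif(1)
  switch(s, c(u, 0), c(1, u), c(u, 1), c(0, u))
}
# a random segment: first endpoint uniform on the perimeter,
# second endpoint uniform on one of the three other sides
seg <- function() {
  s1 <- sample(4, 1)
  s2 <- sample(setdiff(1:4, s1), 1)
  rbind(side_pt(s1), side_pt(s2))
}
# strict orientation (counter-clockwise) test for proper intersection
ccw <- function(a, b, c) (c[2]-a[2])*(b[1]-a[1]) > (b[2]-a[2])*(c[1]-a[1])
intersects <- function(p, q)
  ccw(p[1,], q[1,], q[2,]) != ccw(p[2,], q[1,], q[2,]) &&
  ccw(p[1,], p[2,], q[1,]) != ccw(p[1,], p[2,], q[2,])

set.seed(1)
hits <- replicate(5e4, intersects(seg(), seg()))
mean(hits)  # close to 17/36 = 0.4722...
```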

## A quantum leap (CoRe in CiRM [4])

Posted in Books, R, University life on July 13, 2010 by xi'an

Today, as I was trying to install SpatialEpi to use the Scotland lip cancer data in the last chapter of Bayesian Core, I realised my version of R, version 2.6.1, was hopelessly out of date! As I am also using Hardy Heron, a somewhat antiquated version of Ubuntu, on my Mac, upgrading R took some effort as well. I eventually found that adding the line

deb http://cran.cict.fr/bin/linux/ubuntu hardy/

in /etc/apt/sources.list worked nicely. So I now moved two years forward in time!!! On top is my first attempt at plotting the dataset with my modified version of mapvariable. As it happens, another blog appeared today on R-bloggers about color gradients using ggplot2.
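For the record, the full upgrade sequence would look something like the following (a sketch; cran.cict.fr is the mirror from the post, and r-base/r-base-dev are the standard Debian/Ubuntu package names):

```shell
# add the CRAN repository for Hardy (the line quoted above)
echo 'deb http://cran.cict.fr/bin/linux/ubuntu hardy/' | sudo tee -a /etc/apt/sources.list
# refresh the package index and install the current R
sudo apt-get update
sudo apt-get install r-base r-base-dev
```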

## A repulsive random walk

Posted in R, Statistics on May 28, 2010 by xi'an

Matt Asher posted an R experiment on R-bloggers yesterday simulating the random walk

$x_{t+1} = x_t + \varepsilon_t / x_t$

which has the property of avoiding zero by quickly switching to a large value as soon as $x_t$ is small. He was then wondering about the “convergence” of the random walk given that it moves very little once $x_t$ is large enough. The values he found for various horizons t seemed to indicate a stable regime.

I reran the same experiment as Matt from a Monte Carlo perspective, using the R program

# 100 parallel chains, recorded at horizons t = 2^i, i = 1, ..., 25
resu=matrix(0,ncol=100,nrow=25)
sampl=rnorm(100)
for (i in 1:25){
  for (t in 2^(i-1):2^i) sampl=sampl+rnorm(100)/sampl
  resu[i,]=sampl
}
boxplot(as.data.frame(t(abs(resu))),names=as.character(1:25),col="wheat3")

The outcome of this R code, plotted above, shows that both the range and the average of the 100 replications increase with t. This indicates that the Markov chain is transient: it almost surely goes to infinity and never comes back (because the variance of its moves vanishes at infinity). Another indication of transience is that $x_t$ comes back to the interval (-1,1) with probability $\Phi(-|x_t|)$, a probability which goes to zero with $x_t$. As suggested to me by Randal Douc, this transience can be established rigorously by considering

$x_{t+1}^2 = x_t^2 + 2\varepsilon_t + \varepsilon_t^2/x_t^2 > x_t^2 + 2\varepsilon_t > 2\sum_{i=1}^t \varepsilon_i$

which is thus bounded from below by a (null recurrent) random walk whose limsup is almost surely infinite. Therefore the above Markov chain cannot have a stationary distribution or even a stationary measure: it almost surely goes to (plus or minus) infinity.
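Randal Douc's lower bound can also be verified numerically along a single path, since $x_{t+1}^2 - x_t^2 - 2\varepsilon_t = \varepsilon_t^2/x_t^2 \geq 0$ holds step by step (a sketch of my own):

```r
# repulsive random walk x_{t+1} = x_t + eps_t / x_t
set.seed(42)
n <- 10^5
eps <- rnorm(n - 1)
x <- numeric(n)
x[1] <- rnorm(1)
for (t in 1:(n - 1)) x[t + 1] <- x[t] + eps[t]/x[t]
# the squared chain dominates x_1^2 + 2*cumsum(eps) at every step,
# since each increment adds the non-negative term eps_t^2/x_t^2
all(x[-1]^2 >= x[1]^2 + 2*cumsum(eps))  # TRUE
```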