## around the table

Posted in Books, pictures, R, Statistics on December 2, 2020 by xi'an

The Riddler has a variant on the classical (discrete) random walk around a circle where every state (but the starting point) has the same probability 1/(n-1) to be visited last. A surprising result that stems almost immediately from the property that, leaving from 0, the probability that state a is visited counterclockwise before state b>a is visited clockwise is b/(a+b). The variant includes (or seems to include) the starting state 0 as counting for the last visit (as a return to the origin). In that case, all n states, including the origin, but excluding the two neighbours of 0, namely 1 and n-1, have the same probability to be last. This can also be seen with an R code that approximates (inner loop) the probability that a given state is last visited and records how often this probability is largest (outer loop):

```
N=10#number of states (illustrative value, not set in the original code)
w=0*(1:N)#frequency of most likely last
for(t in 1:1e6){
o=0*w#probabilities of being last
for(v in 1:1e6)#sample order of visits
o[i]=o[i<-1+unique(cumsum(sample(c(-1,1),300,rep=T))%%N)[N]]+1
w[j]=w[j<-order(o)[N]]+1}
```

However, upon (jogging) reflection, the double loop is a waste of energy and

```
o=0*(1:N)
for(v in 1:1e8)
o[i]=o[i<-1+unique(cumsum(sample(c(-1,1),500,rep=T))%%N)[N]]+1
```

should be enough to check that all n positions but both neighbours have the same probability of being last visited. Removing the remaining loop should be feasible by considering all subchains starting at one of the 0’s, since this is a renewal state, but I cannot fathom how to code it succinctly. A more detailed coverage of the original problem (that is, omitting the starting point) was published the Monday after publication of the riddle on R bloggers, following a blog post by David Robinson on Variance Explained.

R codegolf challenge: is there a way to shorten the above R for loop into a single-line command?!
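
One possible answer, offered as a sketch rather than a definitive golf: `replicate` can produce the last-visited states and `tabulate` can count them, which removes the explicit `for` loop. The values `N <- 10`, 500 steps, and 1e4 replications are illustrative assumptions, and walks that fail to cover all N states within 500 steps return `NA` and are dropped before counting.

```r
set.seed(2020)  # arbitrary seed, only for reproducibility
N <- 10         # number of states around the table (assumed value)
# last state reached, a return to 0 counting as the visit of the origin
last <- replicate(1e4, 1 + unique(cumsum(sample(c(-1, 1), 500, replace = TRUE)) %% N)[N])
# counts of being last for states 0, 1, ..., N-1 (walks returning NA are dropped)
o <- tabulate(last[!is.na(last)], nbins = N)
round(o / sum(o), 3)
```

In this output, the entries for states 1 and N-1 (positions 2 and N of `o`) should stand out from the roughly equal remaining ones.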

## a perfectly normally distributed sample

Posted in R, Statistics on May 9, 2019 by xi'an

When I saw this title on R-bloggers, I was wondering how “more perfect” a Normal sample could be when compared with the outcome of `rnorm(n)`. Hence went checking the original blog on `bayestestR` in search of more information. Which was stating nothing more than how to generate a sample that is perfectly normal by using the `rnorm_perfect` function. Still unsure of the meaning, I contacted one of the contributors who replied very quickly

“…that’s actually a good question. I would say an empirical sample having characteristics as close as possible to a cannonic gaussian distribution.”

and again leaving me hungering for more details. I thus downloaded the package `bayestestR` and opened the `rnorm_perfect` function. Which is simply the sequence of n-quantiles

`stats::qnorm(seq(1/n, 1 - 1/n, length.out = n), mean, sd)`

which I would definitely not call a sample as it has nothing random. And perfect?! Not really, unless one associates randomness and imperfection.
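
To make the point concrete, here is a small sketch that wraps the formula quoted above in a stand-alone function and compares it with an actual call to `rnorm`. The name `quantile_grid` is hypothetical, and the current `rnorm_perfect` in `bayestestR` may differ from the version described in this post.

```r
# Deterministic grid of normal quantiles, following the formula quoted above
# (quantile_grid is a hypothetical name, not a bayestestR function)
quantile_grid <- function(n, mean = 0, sd = 1) {
  stats::qnorm(seq(1 / n, 1 - 1 / n, length.out = n), mean, sd)
}

x <- quantile_grid(100)
identical(x, quantile_grid(100))  # two calls return the very same values
is.unsorted(x)                    # FALSE: the values come out already ordered
y <- rnorm(100)                   # an actual random sample, for comparison
c(mean(x), sd(x), mean(y), sd(y))
```

Repeated calls to `quantile_grid` never change, while each call to `rnorm` does, which is the sense in which the former is not a sample.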

## an improvable Rao–Blackwell improvement, inefficient maximum likelihood estimator, and unbiased generalized Bayes estimator

Posted in Books, Statistics, University life on February 2, 2018 by xi'an

In my quest (!) for examples of location problems with no UMVU estimator, I came across a neat paper by Tal Galili [of R Bloggers fame!] and Isaac Meilijson presenting somewhat paradoxical properties of classical estimators in the case of a Uniform U((1-k)θ,(1+k)θ) distribution when 0<k<1 is known. For this model, the minimal sufficient statistic is the pair made of the smallest and of the largest observations, L and U. Since this pair is not complete, the Rao-Blackwell theorem does not produce a single and hence optimal estimator. The best linear unbiased combination [in terms of its variance] of L and U is derived in this paper, although this does not produce the uniformly minimum variance unbiased estimator, which does not exist in this case. (And I do not understand the remark that

“Any unbiased estimator that is a function of the minimal sufficient statistic is its own Rao–Blackwell improvement.”

as this hints at an infinite sequence of improvement.) While the MLE is inefficient in this setting, the Pitman [best equivariant] estimator is both Bayes [against the scale Haar measure] and unbiased. While experimentally dominating the above linear combination. The authors also argue that, since “generalized Bayes rules need not be admissible”, there is no guarantee that the Pitman estimator is admissible (under squared error loss). But given that this is a uni-dimensional scale estimation problem I doubt very much there is a Stein effect occurring in this case.
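
As a rough numerical illustration (a sketch under stated assumptions, not the paper's analysis), the following simulation compares the MLE, which equals U/(1+k) since the likelihood decreases in θ over [U/(1+k), L/(1-k)], with the midrange (L+U)/2. The midrange is used here only as a simple unbiased linear combination of L and U; it is not claimed to be the best linear combination derived by the authors. The values of `theta`, `k`, and `n` are arbitrary choices.

```r
set.seed(42)                     # arbitrary seed
theta <- 2; k <- 0.5; n <- 10    # illustrative values, not taken from the paper
sims <- replicate(1e4, {
  x <- runif(n, (1 - k) * theta, (1 + k) * theta)
  L <- min(x); U <- max(x)       # minimal sufficient statistic
  c(mle = U / (1 + k), midrange = (L + U) / 2)
})
rowMeans(sims)               # Monte Carlo averages of both estimators
rowMeans((sims - theta)^2)   # Monte Carlo mean squared errors
```

The first row of averages should show the downward bias of the MLE, since U/(1+k) can never exceed θ.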

## datazar

Posted in R, Statistics, University life on June 4, 2017 by xi'an

A few weeks ago and then some, I [as occasional blogger!] got contacted by datazar.com to write a piece on this data-sharing platform. I then went and checked what this was all about, having the vague impression this was a platform where I could store and run R codes, besides dropping collective projects, but from what I quickly read, it sounds more like being able to run R scripts from one’s machine using data and code stored on datazar.com. But after reading just one more blog entry I finally understood it is also possible to run R, SQL, NotebookJS (and LaTeX) directly on that platform, without downloading code or data to one’s machine. Which makes it a definite plus for this site, as users can experiment with no transfer to their computer. Hence on a larger variety of platforms. While personally I do not [yet?] see how to use it for my research or [limited] teaching, it seems like an [yet another] interesting exploration of the positive uses of the Internet to collaborate and communicate on scientific issues! With no opinion on privacy and data protection offered by the site, of course.

## R for dummies

Posted in Books, R, Statistics, University life on October 20, 2012 by xi'an

Just saw this nice review of R for dummies. And thought after this afternoon’s class that my students in the simulation course at Paris-Dauphine could clearly benefit from reading it! They in fact had a terrible time simulating a truncated normal distribution by accept-reject. As they could not get the notion of normalising constants… (Yes, indeed, this very truncated normal distribution!) Even the validity of simulating a normal variate until the truncation is satisfied was not obvious to them and they took forever to program the corresponding code. Anyway, I will certainly order the book to check for myself (after receiving Genetics for dummies to make sure I use the right vocabulary, even though it is a bit too light in the end…)! And write a review for CHANCE if it generates enough interest in doing so…
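
For the record, the approach of simulating a normal variate until the truncation is satisfied can be sketched in a few lines. This is a minimal version under stated assumptions: `rtnorm_naive` is a hypothetical helper name, the interval [1, 2] is an arbitrary example, and proposals are drawn in batches purely for convenience.

```r
# Naive accept-reject for a normal truncated to [a, b]: propose from the
# untruncated normal and keep only the draws falling inside the interval
# (rtnorm_naive is a hypothetical helper, not a function from any package)
rtnorm_naive <- function(n, a, b, mean = 0, sd = 1) {
  out <- numeric(0)
  while (length(out) < n) {
    prop <- rnorm(n, mean, sd)                  # batch of proposals
    out <- c(out, prop[prop >= a & prop <= b])  # accepted values only
  }
  out[seq_len(n)]
}

set.seed(2012)  # arbitrary seed
x <- rtnorm_naive(1e3, a = 1, b = 2)
range(x)  # stays within [1, 2]
```

The sampler never evaluates the normalising constant `pnorm(b) - pnorm(a)`, which only shows up as the acceptance rate, so the scheme becomes slow when the interval sits far in a tail.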