## learning base R [book review]

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , , , , on February 26, 2022 by xi'an

This second edition of an introductory R book was sent to me by the author for a potential CHANCE book review.  As there are many (many) books in the same spirit, the main question behind my reading it (in one go) was on the novelty it brings. The topics Learning Base R covers are

• arithmetics with R
• data structures
• built-in and user-written R functions
• R utilities
• more data structures
• comparison and coercion
• lists and data frames
• resident R datasets
• R interface
• probability calculations in R
• R graphics
• R programming
• simulations
• statistical inference in R
• linear algebra
• use of R packages

within as many short chapters. The style is rather standard, that is, short paragraphs with mostly raw reproductions of line commands and their outcome. Sometimes a whole page long of code examples (if with comments). All in all I feel there are rather too few tables when compared with examples, at least for my own taste. The exercises are mostly short and, while they vary in depth, they show that the book is rather intended for students with some mathematical background (e.g., with a chapter on complex numbers and another one on linear algebra that do not seem immediately relevant for most intended readers). Or more than that, when considering one (of several) exercise (19.30) on the Black-Scholes process that mentions Brownian motion. Possibly less appealing for would-be statisticians.

I also wonder at the pedagogical choice of not including and involving more clearly graphical interfaces like R studio as students are usually not big fans of “old-style” [their wording not mine!] line command languages. For instance, the chapter on packages would have benefited from this perspective. Nothing on Rmarkdown either. Apparently nothing on handling big data, more advanced database manipulation, the related realistic dangers of memory freeze and compulsory reboot, the intricacies of managing different directories and earlier sessions, little on the urgency of avoiding loops (p.233) by vectorial programming, a paradoxically if function being introduced after ifelse, and again not that much on statistics (with density only occurring in exercises).The chapter on customising R graphics may possibly scare the intended reader when considering the all-in-one example of p.193! As we advance though the book, the more advanced examples often are fairly standard programming ones (found in other language manuals) like creating Fibonacci numbers, implementing Eratosthenes sieve, playing the Hanoi Tower game… (At least they remind me of examples read in the language manuals I read as a student.) The simulation chapter could have gone into the one (Chap. 19) on probability calculations, rather than superfluously redefining standard distributions. (Except when defining a random number as a uniformly random number (p.162).)  This chapter also spends an unusual amount of space on linear congruencial pseudo-random generators, while missing to point out the trivia that the randu dataset mentioned twice earlier is actually an outcome from the infamous RANDU Fortran generator. The following section in that chapter is written in such a way that it may give the wrong impression that one can find the analytic solution from repeated Monte Carlo experiments and hence the error. Which is rarely the case, even in finite environments with rational expectations, as one usually does not know of which unit fraction the expectation should be a multiple of. (Remember the Squid Games paradox!) And no mention is made of the prescription of always returning an error estimate along with the numerical approximation. The statistics chapter is obviously more developed, with descriptive statistics, ecdf, but no bootrstap, a t.test curiously applied to the Michelson measurements of the speed of light (how could it be zero?!), ANOVA, regression handled via lm and glm, time series analysis by ARIMA models, which I hope will not be the sole exposure of readers to these concepts.

In conclusion, there is nothing critically wrong with this manual introducing R to newcomers and I would not mind having my undergraduate students reading it (rather than our shorter and home-made handout, polished along the years) before my first mathematical statistics lab. However I do not find it massively innovative in its presentation or choice of concept, even though the most advanced examples are not necessarily standard, and may not appeal to all categories of students.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

## the Grumble distribution and an ODE

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , on December 3, 2014 by xi'an

As ‘Og’s readers may have noticed, I paid some recent visits to Cross Validated (although I find this too addictive to be sustainable on a long term basis!, and as already reported a few years ago frustrating at several levels from questions asked without any preliminary personal effort, to a lack of background material to understand hints towards the answer, to not even considering answers [once the homework due date was past?], &tc.). Anyway, some questions are nonetheless great puzzles, to with this one about the possible transformation of a random variable R with density

$p(r|\lambda) = \dfrac{2\lambda r\exp\left(\lambda\exp\left(-r^{2}\right)-r^{2}\right)}{\exp\left(\lambda\right)-1}$

into a Gumble distribution. While the better answer is that it translates into a power law,

$V=e^{e^{-R^2}}\sim q(v|\lambda)\propto v^{\lambda-1}\mathbb{I}_{(1,e)}(v)$,

I thought using the S=R² transform could work but obtained a wrong sign in the pseudo-Gumble density

$W=S-\log(\lambda)\sim \eth(w)\propto\exp\left(\exp(-w)-w\right)$

and then went into seeking another transform into a Gumbel rv T, which amounted to solve the differential equation

$\exp\left(-e^{-t}-t\right)\text{d}t=\exp\left(e^{-w}-w\right)\text{d}w$

As I could not solve analytically the ODE, I programmed a simple Runge-Kutta numerical resolution as follows:

solvR=function(prec=10^3,maxz=1){
z=seq(1,maxz,le=prec)
t=rep(1,prec) #t(1)=1
for (i in 2:prec)
t[i]=t[i-1]+(z[i]-z[i-1])*exp(-z[i-1]+
exp(-z[i-1])+t[i-1]+exp(-t[i-1]))
zold=z
z=seq(.1/maxz,1,le=prec)
t=c(t[-prec],t)
for (i in (prec-1):1)
t[i]=t[i+1]+(z[i]-z[i+1])*exp(-z[i+1]+
exp(-z[i+1])+t[i+1]+exp(-t[i+1]))
return(cbind(c(z[-prec],zold),t))
}


Which shows that [the increasing] t(w) quickly gets too large for the function to be depicted. But this is a fairly useless result in that a transform of the original variable and of its parameter into an arbitrary distribution is always possible, given that  W above has a fixed distribution… Hence the pun on Gumble in the title.