**D**irk Kroese (from UQ, Brisbane) and Joshua Chen (from ANU, Canberra) just published a book entitled *Statistical Modeling and Computation*, distributed by Springer-Verlag (I cannot tell which series it is part of from the cover or frontpages…) The book is intended mostly for an undergrad audience (or for graduate students with no probability or statistics background). Given that prerequisite, *Statistical Modeling and Computation* is fairly standard in that it recalls probability basics, the principles of statistical inference, and classical parametric models. In a third part, the authors cover “advanced models” like generalised linear models, time series and state-space models. The specificity of the book lies in the inclusion of simulation methods, in particular MCMC methods, and illustrations by Matlab code boxes. (Codes that are available on the companion website, along with R translations.) It thus has a lot in common with our *Bayesian Essentials with R*, meaning that I am not the most appropriate or least ~~un~~biased reviewer for this book.

## Archive for introductory textbooks

## Statistical modeling and computation [book review]

Posted in Books, R, Statistics, University life with tags ANU, Australia, Bayesian Essentials with R, Bayesian statistics, Brisbane, Dirk Kroese, introductory textbooks, Joshua Chen, Matlab, maximum likelihood estimation, Monte Carlo methods, Monte Carlo Statistical Methods, R, state space model on January 22, 2014 by xi'an

## the cartoon introduction to statistics

Posted in Books, Kids, Statistics, University life with tags book review, cartoon, CHANCE, introductory textbooks, Statistics, textbooks on May 16, 2013 by xi'an

**A** few weeks ago, I received a copy of The Cartoon Introduction to Statistics by Grady Klein and Alan Dabney, sent by their publisher, Farrar, Straus and Giroux from New York City. (Never heard of this publisher previously, but I must admit the aggregation of those three names sounds great!) As this was an unpublished version of the book, to appear in July 2013, I first assumed my copy was a draft version, with black and white drawings using limited precision graphics. However, when checking the already published Cartoon Introduction to Economics, I realised this was the style of Grady Klein (as reflected below).

**T**hus, I have to assume this is what The Cartoon Introduction to Statistics will look like when published in July… Actually, I later received a second copy of the definitive version, so I can guarantee this is the case. (Funnily enough, there is a supportive quote from the author of Naked Statistics on the back cover!) I am quite perplexed by the whole project. First, I do not see how a newcomer to the field can learn better from a cartoon with an average of four sentences per page than from a regular introductory textbook. Cartoons introduce an element of fun into the explanation, with jokes and (irrelevant) side stories, but they are also distracting as readers are not always in a position to know what matters and what does not. Second, as the drawings are done in a rough style, I find this increases the potential for confusion. For instance, the above cover reproduces an example linking the histogram of a sample of averages and the normal distribution. If a reader has never heard of histograms, I do not see how he or she could gather how they are constructed in practice. The width of the bags is related to the number of persons in each bag (50 random Americans) in the story, while it should be related to the inverse of the square root of this number in the theory. Similarly, I find the explanation about confidence intervals lacking: when trying to reassure the readers about the fact that any given random sample from a population might be misleading, the authors state that “in the long run most cans [of worms] have averages in the clump under the hump [of the normal pdf]”. This is not reassuring at all: when using confidence intervals based on 10 or on 10⁵ normal observations, the corresponding 95% confidence intervals on their mean both have a 95% chance of containing the true mean. The long run aspect refers to the repeated use of those intervals. (I am not even mentioning the classical fallacy of stating that “we are 99.7% confident that the population average is somewhere between -1.73 and -0.27”…)
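The point about sample size and coverage can be checked by simulation. What follows is only a minimal sketch under stated assumptions: the observations are normal with a known standard deviation, so that the interval is the sample mean plus or minus 1.96 σ/√n, and the function name `coverage` and its parameters are made up for the illustration. The second sample size is 1000 rather than 10⁵ merely to keep the pure-Python run short; the same call works with a larger n.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(n, reps=1000, mu=0.0, sigma=1.0, level=0.95, seed=1):
    """Estimate how often the known-sigma interval for the mean covers mu.

    Returns (empirical coverage over `reps` samples, half-width of the interval).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    half_width = z * sigma / math.sqrt(n)      # shrinks like 1 / sqrt(n)
    hits = 0
    for _ in range(reps):
        # One sample of size n and its average
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(xbar - mu) <= half_width:
            hits += 1
    return hits / reps, half_width


if __name__ == "__main__":
    for n in (10, 1000):
        cov, hw = coverage(n)
        print(f"n = {n:>5}: half-width = {hw:.4f}, coverage = {cov:.3f}")
```

Under these assumptions the half-width of the interval shrinks like the inverse of the square root of the sample size, while the proportion of intervals containing the true mean should stay close to 95% for both sample sizes: the guarantee concerns repeated use of the procedure, not the size of any particular sample.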

**I**n conclusion, I remember buying an illustrated entry to Marx’ Das Kapital when I started economics in graduate school (as a minor). This gave me a very quick idea of the purpose of the book. However, I read through the whole book to understand (or try to understand) Marx’ analysis of the economy. And the introduction did not help much in this regard. In the present setting, we are dealing with statistics, not economics, not philosophy. Having read a cartoon about the average length of worms within a can of worms is not going to help much in understanding the Central Limit Theorem and the subsequent derivation of confidence intervals. The validation of statistical methods is done through mathematics, which provides a formal language cartoons cannot reproduce.

## a brief on naked statistics

Posted in Books, R, Statistics, University life with tags book review, general public, How to Lie with Statistics, India, introductory textbooks, masala chai, Naked Economics, Naked Statistics, Zen, Zeno's paradox on April 3, 2013 by xi'an

**O**ver the last Sunday breakfast I went through *Naked Statistics: Stripping the Dread from the Data*. The first two pages managed to put me in a prejudiced mood for the rest of the book. To wit: the author starts with some math bashing (like, no one ever bothers to tell us about the uses of high school calculus!) either because he really feels like this or because it pays with the intended audience (like, we are on the same side, pal!), he then shows how he outsmarted his high school math teacher by spotting the exam was not possibly designed for his class and then another math teacher by just… re-inventing the steps leading to Zeno’s paradox (said Zeno of Elea not appearing in the credits of the book, to be sure) and sums it up with an NRA argument: *“statistics is like a high-caliber weapon: helpful when used correctly”* (p.xiv). Add to that a highly ethnocentric perspective that makes the book hardly readable for anyone outside the US, due to its absolute focus on all things American (exaggerating just a wee bit: *who are Lebron James, Kim Kardashian, and Dan Rather?! what is Netflix?! why’s this Donald Rumsfeld guy quoted throughout the book?! how do they play baseball?! What do NBA, NHL, and SAT stand for?!* *&tc.*)—as best illustrated by the facts that it took Charles Wheelan three months to realise a (golf) laser measuring instrument he had received could be in another unit than *feet*, namely *meters*!, and that he considers paying 100 rupees for a chai (मसाला चाय) in India a cheap price when this amount roughly corresponds to the average daily salary there… Top the whole thing with the fact that the author has already written a *Naked Economics* and seemingly found gold. (I am desperate for the incoming *Naked Paleopathology* tome in the series!) And there you get me stuck with such a highly negative *a priori* about *Naked Statistics* that I could not shake it off for the rest of the book.

“This book will not make you a statistical expert (…) This book is not a textbook.” (p.xv)

**W**ith this warning in mind about my bias, let’s get on with what’s in this book. The above tells us what isn’t. To quote further from the author, the book “*has been designed to introduce the statistical concepts with the most relevance to everyday life*” (p.xv). *Naked Statistics* goes over the basic notions of statistics (mean, standard deviation, correlation, linear regression, testing, design, polling), gives a sprinkle of probability background (counting models and the central limit theorem, which Wheelan considers as part of statistics), and spends the remaining chapters warning the reader(s) about the possible misuses of models and statistical tools if implemented in the wrong situations or with the wrong type of data. (There are a few graphs, but they are not particularly inspiring.) All this is done with the minimum amount of maths formulae, mostly hidden in footnotes and appendices. (But then why add an extra formula for σ when one is given just before for σ²?!) Sometimes, the minimum is not enough, as demonstrated by the “formula for calculating the correlation coefficient” (p.61) which takes a whole page of text to get around this absurdity of not using maths symbols like Σ and concludes with the lame *“I’ll wave my hands and let the computer do the work”* (p.61)! Somewhat surprisingly, given the low-key nature of the book, it includes a final appendix on statistical software. From Excel, to SAS, Stata, and …R! While I am pleased at this inclusion, it sounds very much orthogonal to the purpose and the intended audience of *Naked Statistics.* I cannot fathom anyone reading the book and then immediately embarking upon writing R code without stopping by a statistics textbook or formal training. (Incidentally, the author reproduces the usual confusion between free and open source, p.259.)
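For comparison, here is how little room the correlation coefficient takes once Σ is allowed. This is the standard sample (Pearson) correlation in conventional notation, not the book's own presentation, and the symbols (observations x_i and y_i, sample means x̄ and ȳ, sample size n) are the usual textbook choices rather than anything taken from p.61. The second identity is the reason a separate formula for σ adds nothing once σ² is given.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\sigma = \sqrt{\sigma^2}.
```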

## [stack] overloaded, crossed and invalidated

Posted in Books, Kids, Statistics, University life with tags Bayes theorem, cross validated, fish, forum, introductory textbooks, Og, StackExchange on November 21, 2011 by xi'an

**F**or the past few days, I have been monitoring **Cross Validated**, a forum on statistics that is part of **StackExchange** (“*a fast-growing network of 71 question and answer sites on diverse topics from software programming to cooking to photography and gaming*”…) for questions of interest, but I think (hope?) I will stop my involvement there. First, I fear this type of forum is too addictive (at least for me, and I already have my share of web-related addictions, witness this very blog!). Of course, **Cross Validated** is an interesting site for questions related to statistics and machine learning, with a great LaTeX interface that allows typing math formulas in a natural way; however I also find the exercise rather frustrating and to some extent futile, which is another reason why I do not wish to pursue the experience any further. For one thing, some of the questions found there are of the “please do my homework for me” type and I am too plagued with emails of this sort (connected with my own exercises) to look for further hassle. They are however easy to spot and thus eliminate. For another, I suspect a majority of questions, while honest and deep enough, are often asked on the spur of the moment, i.e. without a preliminary search on a paper or online source, like Wikipedia or a textbook. E.g., a question about Bayes theorem that brought decent answers but went no further than the Wikipedia entry on the topic. At one level, I would like to give in to temptation and to answer questions for which I feel I have a valid and informative answer. However, it does not seem like an efficient use of my time (read my books instead!) And also I am not completely convinced this fundamentally helps the persons who ask the questions in the first place.
What may lie at the bottom of my unease with being involved in such forums is the sad fact that most questioners want answers without getting through the necessary steps of learning the basics and the background theory surrounding the question. While I am not a teacher at heart, this approach goes against my views on learning (“Give a man a fish, &tc.”). Observations supporting this view are that (a) many questioners are “one-shot” occurrences, i.e. are never seen again on the forum and (b) such questioners often fail to acknowledge answers that are not posted immediately, i.e. are not really interested in the debate surrounding the question they asked in the first place, only in someone solving their problem for them…

*Anyway, this post is my very personal opinion on why I should not get involved with Q&A sites: it does not aim at criticising people asking or answering questions on Cross Validated, quite clearly, as some questions may lead to interesting research developments and as some answers are well-thought, helpful, and informative. Just not my ballpark!*

## The foundations of Statistics [reply]

Posted in Books, R, Statistics, University life with tags blog, foundations, introductory textbooks, linguistics, mathematics, R, simulation, Statistics Forum on July 19, 2011 by xi'an

**S**hravan Vasishth has written a response to my review, both published on the Statistics Forum. His response is quite straightforward and honest. In particular, he acknowledges not being a statistician and that he “should spend more time studying statistics”. I also understand the authors’ frustration at trying “to recruit several statisticians (at different points) to join [them] as co-authors for this book, in order to save [them] from [them]selves, so to speak. Nobody was willing to do join in.” (Despite the kind proposal to join as a co-author on a new edition, I would be rather unwilling as well, mostly because of the principle of avoiding calculus at all cost… I will actually meet with Shravan at the end of the month to discuss specifics of the statistical flaws in this book.)

**H**owever, I still do not understand why the book was published without a proper review from a statistician. Springer is a/my serious scientific publisher and book proposals usually go through several reviews, before and after the writing. Shravan Vasishth asks for alternative references, which I personally cannot provide for lack of teaching at this level, but this is somewhat beside the point: even if a book at the intended level and for the intended audience did not exist, this would not justify the publication of a book on statistics (and only statistics) by authors not proficient enough in the topic.

**O**ne point of the response I do not get is the third item about the blog and letting my “rage get the better of [myself] (the rage is no doubt there for good reason)”. Indeed, while I readily acknowledge the review is utterly negative, I have tried to stick to facts, either statistical flaws (like the unbiasedness of *s*) or presentation defects. The reference to a blog in the book could be a major incentive to adopt the book, so if the blog does not live as a blog, it is both a disappointment to the reader and a sort of breach of advertising. I perfectly understand the many reasons for not maintaining a blog (!), but then the site should have been advertised as a site rather than a blog. This was the meaning of the paragraph

The authors advertise a blog about the book that contains very little information. (The last entry is from December 2010: “The book is out”.) This was a neat idea, had it been implemented.

that does not sound full of rage to me… Anyway, this is a minor point.