## another R new trick [new for me!]

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , on July 16, 2014 by xi'an

While working with Andrew and a student from Dauphine on importance sampling, we wanted to assess the distribution of the resulting sample via the Kolmogorov-Smirnov measure

$\max_x |\hat{F_n}(x)-F(x)|$

where F is the target.  This distance (times √n) has an asymptotic distribution that does not depend on n, called the Kolmogorov distribution. After searching for a little while, we could not figure where this distribution was available in R. It had to, since ks.test was returning a p-value. Hopefully correct! So I looked into the ks.test function, which happens not to be entirely programmed in C, and found the line

PVAL <- 1 - if (alternative == "two.sided")
.Call(C_pKolmogorov2x, STATISTIC, n)


which means that the Kolmogorov distribution is coded as a C function C_pKolmogorov2x in R. However, I could not call the function myself.

> .Call(C_pKolmogorov2x,.3,4)


Hence, as I did not want to recode this distribution cdf, I posted the question on stackoverflow (long time no see!) and got a reply almost immediately as to use the package kolmim. Followed by the extra comment from the same person that calling the C code only required to add the path to its name, as in

> .Call(stats:::C_pKolmogorov2x,STAT=.3,n=4)
[1] 0.2292


## implementing reproducible research [short book review]

Posted in Books, Kids, pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , on July 15, 2014 by xi'an

As promised, I got back to this book, Implementing reproducible research (after the pigeons had their say). I looked at it this morning while monitoring my students taking their last-chance R exam (definitely last chance as my undergraduate R course is not reconoduced next year). The book is in fact an edited collection of papers on tools, principles, and platforms around the theme of reproducible research. It obviously links with other themes like open access, open data, and open software. All positive directions that need more active support from the scientific community. In particular the solutions advocated through this volume are mostly Linux-based. Among the tools described in the first chapter, knitr appears as an alternative to sweave. I used the later a while ago and while I like its philosophy. it does not extend to situations where the R code within takes too long to run… (Or maybe I did not invest enough time to grasp the entire spectrum of sweave.) Note that, even though the book is part of the R Series of CRC Press, many chapters are unrelated to R. And even more [unrelated] to statistics.

This limitation is somewhat my difficulty with [adhering to] the global message proposed by the book. It is great to construct such tools that monitor and archive successive versions of code and research, as anyone can trace back the research steps conducting to the published result(s). Using some of the platforms covered by the book establishes for instance a superb documentation principle, going much further than just providing an “easy” verification tool against fraudulent experiments. The notion of a super-wiki where notes and preliminary versions and calculations (and dead ends and failures) would be preserved for open access is just as great. However this type of research processing and discipline takes time and space and human investment, i.e. resources that are sparse and costly. Complex studies may involve enormous amounts of data and, neglecting the notions of confidentiality and privacy, the cost of storing such amounts is significant. Similarly for experiments that require days and weeks of huge clusters. I thus wonder where those resources would be found (journals, universities, high tech companies, …?) for the principle to hold in full generality and how transient they could prove. One cannot expect the research time to garantee availability of those meta-documents for remote time horizons. Just as a biased illustration, checking the available Bayes’ notebooks meant going to a remote part of London at a specific time and with a preliminary appointment. Those notebooks are not available on line for free. But for how long?

“So far, Bob has been using Charlie’s old computer, using Ubuntu 10.04. The next day, he is excited to find the new computer Alice has ordered for him has arrived. He installs Ubuntu 12.04″ A. Davison et al.

Putting their principles into practice, the authors of Implementing reproducible research have made all chapters available for free on the Open Science Framework. I thus encourage anyone interesting in those principles (and who would not be?!) to peruse the chapters and see how they can benefit from and contribute to open and reproducible research.

## ABC in Cancún

Posted in Books, Kids, R, Statistics, Travel, University life with tags , , , , , , , , on July 11, 2014 by xi'an

Here are our slides for the ABC [very] short course Jean-Michel and I give at ISBA 2014 in Cancún next Monday (if your browser can manage Slideshare…) Although I may switch the pictures from Iceland to Mexico, on Sunday, there will be not much change on those slides we both have previously used in previous short courses. (With a few extra slides borrowed from Richard Wilkinson’s tutorial at NIPS 2013!) Jean-Michel will focus his share of the course on software implementations, from R packages like abc and abctools and our population genetics software DIYABC. With an illustration on SNPs data from pygmies populations.

## Bayes’ Rule [book review]

Posted in Books, Statistics, University life with tags , , , , , , , , , , on July 10, 2014 by xi'an

This introduction to Bayesian Analysis, Bayes’ Rule, was written by James Stone from the University of Sheffield, who contacted CHANCE suggesting a review of his book. I thus bought it from amazon to check the contents. And write a review.

First, the format of the book. It is a short paper of 127 pages, plus 40 pages of glossary, appendices, references and index. I eventually found the name of the publisher, Sebtel Press, but for a while thought the book was self-produced. While the LaTeX output is fine and the (Matlab) graphs readable, pictures are not of the best quality and the display editing is minimal in that there are several huge white spaces between pages. Nothing major there, obviously, it simply makes the book look like course notes, but this is in no way detrimental to its potential appeal. (I will not comment on the numerous appearances of Bayes’ alleged portrait in the book.)

“… (on average) the adjusted value θMAP is more accurate than θMLE.” (p.82)

Bayes’ Rule has the interesting feature that, in the very first chapter, after spending a rather long time on Bayes’ formula, it introduces Bayes factors (p.15).  With the somewhat confusing choice of calling the prior probabilities of hypotheses marginal probabilities. Even though they are indeed marginal given the joint, marginal is usually reserved for the sample, as in marginal likelihood. Before returning to more (binary) applications of Bayes’ formula for the rest of the chapter. The second chapter is about probability theory, which means here introducing the three axioms of probability and discussing geometric interpretations of those axioms and Bayes’ rule. Chapter 3 moves to the case of discrete random variables with more than two values, i.e. contingency tables, on which the range of probability distributions is (re-)defined and produces a new entry to Bayes’ rule. And to the MAP. Given this pattern, it is not surprising that Chapter 4 does the same for continuous parameters. The parameter of a coin flip.  This allows for discussion of uniform and reference priors. Including maximum entropy priors à la Jaynes. And bootstrap samples presented as approximating the posterior distribution under the “fairest prior”. And even two pages on standard loss functions. This chapter is followed by a short chapter dedicated to estimating a normal mean, then another short one on exploring the notion of a continuous joint (Gaussian) density.

“To some people the word Bayesian is like a red rag to a bull.” (p.119)

Bayes’ Rule concludes with a chapter entitled Bayesian wars. A rather surprising choice, given the intended audience. Which is rather bound to confuse this audience… The first part is about probabilistic ways of representing information, leading to subjective probability. The discussion goes on for a few pages to justify the use of priors but I find completely unfair the argument that because Bayes’ rule is a mathematical theorem, it “has been proven to be true”. It is indeed a maths theorem, however that does not imply that any inference based on this theorem is correct!  (A surprising parallel is Kadane’s Principles of Uncertainty with its anti-objective final chapter.)

All in all, I remain puzzled after reading Bayes’ Rule. Puzzled by the intended audience, as contrary to other books I recently reviewed, the author does not shy away from mathematical notations and concepts, even though he proceeds quite gently through the basics of probability. Therefore, potential readers need some modicum of mathematical background that some students may miss (although it actually corresponds to what my kids would have learned in high school). It could thus constitute a soft entry to Bayesian concepts, before taking a formal course on Bayesian analysis. Hence doing no harm to the perception of the field.

## how to translate evidence into French?

Posted in Books, Statistics, University life with tags , , , , , on July 6, 2014 by xi'an

I got this email from Gauvain who writes a PhD in philosophy of sciences a few minutes ago:

L’auteur du texte que j’ai à traduire désigne les facteurs de Bayes comme une “Bayesian measure of evidence”, et les tests de p-value comme une “frequentist measure of evidence”. Je me demandais s’il existait une traduction française reconnue et établie pour cette expression de “measure of evidence”. J’ai rencontré parfois “mesure d’évidence” qui ressemble fort à un anglicisme, et parfois “estimateur de preuve”, mais qui me semble pouvoir mener à des confusions avec d’autres emploi du terme “estimateur”.

which (pardon my French!) wonders how to translate the term evidence into French. It would sound natural that the French évidence is the answer but this is not the case. Despite sharing the same Latin root (evidentia), since the English version comes from medieval French, the two words have different meanings: in English, it means a collection of facts coming to support an assumption or a theory, while in French it means something obvious, which truth is immediately perceived. Surprisingly, English kept the adjective evident with the same [obvious] meaning as the French évident. But the noun moved towards a much less definitive meaning, both in Law and in Science. I had never thought of the huge gap between the two meanings but must have been surprised at its use the first time I heard it in English. But does not think about it any longer, as when I reviewed Seber’s Evidence and Evolution.

One may wonder at the best possible translation of evidence into French. Even though marginal likelihood (vraisemblance marginale) is just fine for statistical purposes. I would suggest faisceau de présomptions or degré de soutien or yet intensité de soupçon as (lengthy) solutions. Soupçon could work as such, but has a fairly negative ring…