## Archive for ASA

## what is your favorite teacher?

Posted in Kids, Statistics, University life with tags American Statistical Association, Amstat News, ASA, bootstrap, estimation class, Glivenko-Cantelli Theorem, mathematics and statistics, teaching, Université Paris Dauphine on October 14, 2017 by xi'an

**W**hen Jean-Louis Foulley pointed out to me this page in the September issue of Amstat News, about nominating a favourite teacher, I told him it had to be a homonymous statistician! Or a practical joke! After enquiry, it dawned on me that this completely undeserved inclusion came from a former student in my undergraduate Estimation course, who was very enthusiastic about statistics and my insistence on modelling rather than mathematical validation. He may have been the only one in the class, as my students always complain about not seeing the point in slides with no mathematical result. Like earlier this week when, after 90 minutes spent introducing the bootstrap method, a student asked me what was new compared with the Glivenko-Cantelli theorem I had presented the week before… (Thanks anyway to David for his vote and his kind words!)

## errors, blunders, and lies [book review]

Posted in Books, Kids, Statistics, University life with tags and lies, ASA, blunders, book review, CHANCE, CRC Press, errors, introductory textbooks, Pierre Simon Laplace, The Lady Tasting Tea on July 9, 2017 by xi'an

**T**his new book by David Salsburg is the first one in the ASA-CRC Series on Statistical Reasoning in Science and Society. Which explains why I heard about it both from CRC Press [as a suggested material for a review in CHANCE] and from the ASA [as mass emailing]. The name of the author did not ring a bell until I saw the line about his earlier The Lady Tasting Tea book, a best-seller in the category of “soft [meaning math- and formula-free] introduction to Statistics through picturesque characters”. Which I did not read either [but Bob Carpenter did].

The current book is of the same flavour, albeit with some maths formulas [each preceded by a lengthy apology for using maths and symbols]. The topic is the one advertised in the title, covering statistical errors and the way to take advantage of them, model mis-specification and robustness, and the detection of biases and data massaging. I read the short book in one quick go, waiting for the results of the French Legislative elections, and found no particular appeal in the litany of examples, historical entries, pitfalls, and models I feel I have already read so many times in the story-telling approach to statistics. (Naked Statistics comes to mind.)

It is not that there is anything terrible with the book, which is partly based on the author’s own experience in a pharmaceutical company, but it does not seem to bring out any novelty for engaging into the study of statistics or for handling data in a more rational fashion. And I do not see which portion of the readership is targeted by the book, which is too allusive for academics and too academic for a general audience, which is not necessarily fascinated by the finer details of the history (and stories) of the field. As in The Lady Tasting Tea, the chapters constitute a collection of vignettes, rather than a coherent discourse leading to a theory or defending an overall argument. Some chapters are rather poor, like the initial chapter explaining the distinction between lies, blunders, and errors through the story of the measure of the distance from Earth to Sun by observing the transit of Venus. Not that the story is uninteresting, far from it! But I find it lacking in connecting with statistics [e.g., the meaning of a “correct” observation is never explained]. Or the chapter on the Princeton robustness study, where little is explained about the nature of the wrong distributions, which end up as specific contaminations impacting mostly the variance. And some examples are hardly convincing, like those on text analysis (Chapters 13, 14, 15), where there is little backup for using Benford’s law on such short datasets. Big data is understood only under the focus of large p, small n, which is small data in my opinion! (Not to mention a minor crime de *lèse-majesté* in calling Pierre-Simon Laplace Simon-Pierre Laplace! I would also have left the *Marquis de* aside as this title came to him during the Bourbon Restoration, despite him having served Napoléon for his entire reign.) And, as mentioned above, the book contains apologetic mathematics, which never ceases to annoy me since apologies are not needed. While the maths formulas are needed.

## not an ASA’s statement on p-values

Posted in Books, Kids, Statistics, University life with tags ASA, p-values, statistical significance, testing of hypotheses, Vladimir Vovk on March 18, 2016 by xi'an

**T**his may be a coincidence, but a few days after the ASA statement got published, Yuri Gurevich and Vladimir Vovk arXived a note on the Fundamentals of p-values. Which actually does not contribute to the debate. The paper is written in a Q&A manner. And defines a sort of peculiar logic related to [some] p-values. A second and more general paper is in the making, which may shed more light on the potential appeal of this formalism…

## JSM 2015 [day #4]

Posted in pictures, Running, Statistics, Travel, University life with tags ASA, bag of little bootstraps, consistency, harmonic mean estimator, JSM 2015, Langevin diffusion, Langevin MCMC algorithm, latent variable, marginal likelihood, MCMC, Monte Carlo Statistical Methods, MrBayes, philogenic trees, R.A. Fisher, Seattle, soectral clustering, spectral gap, STAN, University of Warwick on August 13, 2015 by xi'an

**M**y first session today was Markov Chain Monte Carlo for Contemporary Statistical Applications with a heap of interesting directions in MCMC research! Now, without any possible bias (!), I would definitely nominate Murray Pollock (incidentally from Warwick) as the winner for best slides, funniest presentation, and most enjoyable accent! More seriously, the scalable Langevin algorithm he developed with Paul Fearnhead, Adam Johansen, and Gareth Roberts, is quite impressive in avoiding computing costly likelihoods. With of course caveats on which targets it applies to. Murali Haran showed a new proposal to handle high dimension random effect models by a projection trick that reduces the dimension. Natesh Pillai introduced us (or at least me!) to a spectral clustering that allowed for an automated partition of the target space, itself the starting point to his parallel MCMC algorithm. Quite exciting, even though I do not perceive partitions as an ideal solution to this problem. The final talk in the session was Galin Jones’ presentation of consistency results and conditions for multivariate quantities which is a surprisingly unexplored domain. MCMC is still alive and running!

The second MCMC session of the morning, Monte Carlo Methods Facing New Challenges in Statistics and Science, was equally diverse, with Lynn Kuo’s talk on the HAWK approach, where we discovered that harmonic mean estimators are still in use, e.g., in MrBayes software employed in phylogenetic inference. The proposal to replace this awful estimator that should never be seen again (!) was rather closely related to an earlier solution of ours for marginal likelihood approximation, based there on a partition of the whole space rather than an HPD region in our case… Then, Michael Betancourt brilliantly acted as a proxy for Andrew to present the STAN language, with a flashy trailer he most recently designed. Featuring Andrew as the sole actor. And with great arguments for using it, including the potential to run expectation propagation (as a way of life). *In fine*, Faming Liang proposed a bootstrap subsampling version of the Metropolis-Hastings algorithm, while acknowledging the resulting bias in the limiting distribution.
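For readers who have not met the harmonic mean estimator of the marginal likelihood, here is a minimal sketch of what it computes, on a toy problem of my own choosing rather than anything shown in the session. The assumptions are all mine: observations are i.i.d. N(θ,1), the prior on θ is N(0,τ²), and the function names are hypothetical. In this conjugate setting the posterior is available in closed form, so the estimate can be compared with the exact value obtained from the identity log m(x) = log L(θ) + log π(θ) − log π(θ|x).

```python
import math
import random


def log_likelihood(theta, data):
    """Log-likelihood of i.i.d. N(theta, 1) observations (toy model, an assumption)."""
    n = len(data)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((x - theta) ** 2 for x in data)


def posterior_params(data, tau2):
    """Mean and variance of the normal posterior under a N(0, tau2) prior."""
    v = 1.0 / (len(data) + 1.0 / tau2)
    return v * sum(data), v


def exact_log_marginal(data, tau2):
    """Exact log marginal: log L + log prior - log posterior, evaluated at the posterior mean."""
    m, v = posterior_params(data, tau2)
    log_prior = -0.5 * math.log(2 * math.pi * tau2) - m ** 2 / (2 * tau2)
    log_post = -0.5 * math.log(2 * math.pi * v)  # posterior density at its own mean
    return log_likelihood(m, data) + log_prior - log_post


def harmonic_mean_log_marginal(data, tau2, n_draws, rng):
    """Log of the harmonic mean of likelihood values at exact posterior draws."""
    m, v = posterior_params(data, tau2)
    neg_ll = [-log_likelihood(rng.gauss(m, math.sqrt(v)), data) for _ in range(n_draws)]
    top = max(neg_ll)  # log-sum-exp shift for numerical stability
    log_mean_inverse = top + math.log(sum(math.exp(a - top) for a in neg_ll)) - math.log(n_draws)
    return -log_mean_inverse


rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(20)]
print("exact log marginal:", exact_log_marginal(data, tau2=100.0))
for _ in range(5):
    # independent replications, to see how much the estimate moves between runs
    print("harmonic mean estimate:", harmonic_mean_log_marginal(data, 100.0, 10_000, rng))
```

The replications are there on purpose: the average of inverse likelihoods is dominated by the rare posterior draws with the smallest likelihood, so it is the spread between runs and the gap to the exact value, rather than any single number, that is worth looking at, in particular when τ² grows.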

My first afternoon session was another entry on Statistical Phylogenetics, somewhat continued from yesterday’s session. Making me realise I had not seen a single talk on ABC for the entire meeting! The issues discussed in the session were linked with aligning sequences and comparing many trees. Again in settings where likelihoods can be computed more or less explicitly. Without any expertise in the matter, I wondered at a construction that would turn all trees into realizations of a continuous model. For instance by growing one branch at a time while removing the MRCA root… And maybe using a particle-like method to grow trees. As an aside, Vladimir Minin told me yesterday night about genetic mutations that could switch on and off phenotypes repeatedly across generations… For instance the ability to glow in the dark for species of deep sea fish.

When stating that I did not see a single talk about ABC, I omitted Steve Fienberg’s Fisher Lecture R.A. Fisher and the Statistical ABCs, keeping the *morceau de choix* for the end! Even though of course Steve did not mention the algorithm! A was for *asymptotics*, or ancillarity, B for *Bayesian* (or biducial??), C for *causation* (or cuffiency???)… Among other gems, I appreciated that Steve mentioned my great-grandfather Darmois in connection with exponential families! And the connection with Jon Wellner’s LeCam Lecture from a few days ago. And reminding us that Savage was a Fisher lecturer himself. And that Fisher introduced fiducial distributions quite early. And for defending the Bayesian perspective. Steve also set some challenges like asymptotics for networks, Bayesian model assessment (I liked the notion of stepping out of the model), and randomization when experimenting with networks. And for big data issues. And for personalized medicine, building on his cancer treatment. No trace of the ABC algorithm, obviously, but a wonderful Fisher lecture, also most obviously!! Bravo, Steve, keep thriving!!!

## Ebola virus [and Mr. Bayes]

Posted in Statistics, Travel, University life with tags ASA, Ebola virus, JSM 2014, Malaysian Airlines, philogenic trees, Statistics without Borders, The New York Times, Ukraine on August 12, 2014 by xi'an

**J**ust like after the Malaysian Airlines flight 370 disappearance, the current Ebola virus outbreak makes me feel we are sorely missing an emergency statistical force to react to urgent issues… It would indeed be quite valuable to have a team of statisticians at the ready to quantify risks and posterior probabilities and avoid media approximations. The situations calling for this reactive force abound. A few days ago I was reading about the unknown number of missing pro-West activists in Eastern Ukraine. Maybe statistical societies could join forces to set up such an emergency team?! Whose goals are somewhat different from the great Statistics without Borders…

**A**s a side remark, the above phylogeny is taken from Dudas and Rambaut’s recent paper in PLOS reassessing the family tree of the current Ebola virus(es) acting in Guinea. The tree is found using MrBayes, which delivers a posterior probability of 1 to this filiation! And concluding “that the rooting of this clade using the very divergent other ebolavirus species is very problematic.”