Archive for Annals of Applied Statistics

running shoes

Posted in Books, Running, Statistics on August 12, 2018 by xi'an

A few days ago, when back from my morning run, I spotted a NYT article on Nike shoes that are supposed to bring on average a 4% gain in speed. Meaning for instance a 3 to 4 minute gain in a half-marathon.
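As a back-of-the-envelope check of that conversion (my own arithmetic, not a computation from the NYT article): running 4% faster over a fixed distance divides the finishing time by 1.04, so a 3 to 4 minute gain corresponds to runners who would otherwise finish the half-marathon in roughly 1h18 to 1h44. The function name and the example times below are purely illustrative.

```python
def minutes_saved(finish_time_min, speed_gain=0.04):
    """Minutes saved over a fixed distance when speed increases by `speed_gain`.

    A speed multiplied by (1 + g) means a time divided by (1 + g),
    hence a saving of t * (1 - 1 / (1 + g)).
    """
    return finish_time_min * (1.0 - 1.0 / (1.0 + speed_gain))


if __name__ == "__main__":
    # Illustrative half-marathon times (minutes) without the shoes.
    for t in (78, 90, 104):
        print(f"{t} min without the shoes: about {minutes_saved(t):.1f} min saved")
```

Note that a 4% gain in speed is a slightly smaller gain in time (about 3.85%), which is why the saving on a 90 minute race falls a bit short of 3.6 minutes.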

“Using public race reports and shoe records from Strava, a fitness app that calls itself the social network for athletes, The Times found that runners in Vaporflys ran 3 to 4 percent faster than similar runners wearing other shoes, and more than 1 percent faster than the next-fastest racing shoe.”

What is interesting in this NYT article is that the two journalists who wrote it analysed their own data, taken from Strava, using a statistical model or models (linear regression? non-linear regression? neural net?) to predict the impact of the shoe make against “all” other factors contributing to the overall time, or position, or percentage gain, or yet something else. In most analyses produced in the NYT article, the 4% gain is reproduced (with a 2% gain for female shoe switchers and a 7% gain for slow runners).

“Of course, these observations do not constitute a randomized control trial. Runners choose to wear Vaporflys; they are not randomly assigned them. One statistical approach that seeks to address this uses something called propensity scores, which attempt to control for the likelihood that someone wears the shoes in the first place. We tried this, too. Our estimates didn’t change.”
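To make the propensity score idea concrete, here is a minimal sketch on a made-up dataset, which is in no way the NYT data or the NYT model. It assumes a single binary confounder (fast versus slow runner), estimates the propensity as the share of shoe wearers within each group, and reweights each runner by the inverse of the probability of the shoe choice actually made. All names and numbers are hypothetical, and the toy data is built so that the shoes cut the time by exactly 4% within each group while fast runners are more likely to wear them.

```python
from collections import defaultdict


def make_toy_runners():
    """Hypothetical records: (is_fast_runner, wears_vaporfly, finish_time_min)."""
    runners = []
    runners += [(True, True, 76.8)] * 6     # fast runners with the shoes
    runners += [(True, False, 80.0)] * 2    # fast runners without
    runners += [(False, True, 115.2)] * 2   # slow runners with the shoes
    runners += [(False, False, 120.0)] * 6  # slow runners without
    return runners


def propensity_by_group(runners):
    """Estimated probability of wearing the shoes within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_wearing, n_total]
    for fast, shoe, _ in runners:
        counts[fast][0] += int(shoe)
        counts[fast][1] += 1
    return {group: wearing / total for group, (wearing, total) in counts.items()}


def naive_gain(runners):
    """Relative time gain from a raw comparison of wearers and non-wearers."""
    with_shoes = [t for _, shoe, t in runners if shoe]
    without = [t for _, shoe, t in runners if not shoe]
    mean_with = sum(with_shoes) / len(with_shoes)
    mean_without = sum(without) / len(without)
    return 1.0 - mean_with / mean_without


def ipw_gain(runners):
    """Relative time gain after inverse propensity weighting (normalised weights)."""
    e = propensity_by_group(runners)
    weighted_time = {True: 0.0, False: 0.0}
    total_weight = {True: 0.0, False: 0.0}
    for fast, shoe, t in runners:
        # Weight = 1 / P(observed shoe choice | group).
        w = 1.0 / e[fast] if shoe else 1.0 / (1.0 - e[fast])
        weighted_time[shoe] += w * t
        total_weight[shoe] += w
    mean_with = weighted_time[True] / total_weight[True]
    mean_without = weighted_time[False] / total_weight[False]
    return 1.0 - mean_with / mean_without


if __name__ == "__main__":
    data = make_toy_runners()
    print("naive gain:", round(naive_gain(data), 3))
    print("weighted gain:", round(ipw_gain(data), 3))
```

In this toy case the raw comparison grossly overstates the gain, because wearers are mostly fast runners, while the weighted comparison recovers the 4% built into the data. That the NYT estimates did not move under such an adjustment suggests their other controls already captured most of the selection, at least for the variables they observed.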

The statistical analysis (or analyses) seems rather thorough, from what is reported in the NYT article, with several attempts at controlling for confounders. Still, the data itself is observational, even if it provides a lot of variables to run the analyses, as it only covers runners who use Strava (from 5% in Tokyo to 25% in London!) and who indicate the type of shoes they wear during the race. There is also the issue that the shoes are quite expensive, at $250 a pair, especially if the effect wears out after 100 miles (this was not tested in the study), as I would hesitate to use them unless the race conditions look optimal (and they never do!). There is certainly a new shoes effect on top of that, somewhere between the real impact of a better response and a placebo effect, as shown by a similar effect for many other shoe makes. Hence a moderating impact on the NYT conclusion that these Nike Vaporflys (flies?!) are an “outlier”. But nonetheless a fairly elaborate and careful statistical study that could potentially make it to a top journal like Annals of Applied Statistics!

coauthorship and citation networks

Posted in Books, pictures, R, Statistics, University life on February 21, 2017 by xi'an

As I discovered (!) the Annals of Applied Statistics in my mailbox just prior to taking the local train to Dauphine for the first time in 2017 (!), I started reading it on the way, but did not get any further than the first discussion paper by Pengsheng Ji and Jiashun Jin on coauthorship and citation networks for statisticians. I found the whole exercise intriguing, I must confess, with little to support a whole discussion on the topic. I may have read the paper too superficially as a métro pastime, but to me it sounded more like a post-hoc analysis than a statistical exercise, something like looking at the network, or rather at the output of a software representing networks, and making sense of clumps and sub-networks a posteriori. (In a way this reminded me of my first SAS project at school, on the patterns of vacations in France. It was in 1983, on punched cards. And we spent a while cutting & pasting, in a literal sense, the 80-column graphs produced by SAS on endless listings.)

It may be that part of the interest in the paper is self-centred. I do not think analysing a similar dataset in another field like deconstructionist philosophy or Korean raku would have attracted the same attention. Looking at the clusters and the names on the pictures obviously makes sense, if more at a curiosity level than at a scientific one, as I do not think this brings much in terms of ranking and evaluating research (despite what Bernard Silverman suggests in his preface) or of understanding collaborations (beyond the fact that people in the same subfield or in the same active place like Duke tend to collaborate). Speaking of curiosity, I was quite surprised to spot my name in one network and even more to see that I was part of the “High-Dimensional Data Analysis” cluster, rather than of the “Bayes” cluster. I cannot fathom how I ended up in that theme, as I cannot think of a single paper of mine pertaining to either high dimensions or data analysis [to force the trait just a wee bit!]. Maybe thanks to my joint paper with Peter Mueller. (I tried to check the data itself but could not trace my own papers in the raw datafiles.)

I also wonder what the point is of looking solely at four major journals in the field, missing for instance most of computational statistics and biostatistics, not to mention machine learning or econometrics. This results in a somewhat narrow niche, if obviously recovering the main authors in the [corresponding] field. Some major players in computational stats still make it to the lists, like Gareth Roberts or Håvard Rue, but under the wrong categorisation of spatial statistics.

Climate change in Annals of Applied Statistics

Posted in Statistics, University life on April 15, 2011 by xi'an

A long editorial by Michael Stein on arXiv attracted my attention to an equally long discussion paper in the March 2011 issue of the Annals of Applied Statistics about paleoclimatology and its potential consequences for climate change. I will wait for my hardcopy to arrive by surface mail before going into the paper and discussions, but I was surprised by the high degree of caution and the warnings in this editorial, as if it was trying to buffer incoming criticisms from pro- and anti-global warming groups (that are bound to happen given that climate change is the number one topic on forums of all kinds). It is interesting given that previous issues of Annals of Applied Statistics have also had their share of potentially controversial material, from the JFK assassination, to the lost tomb of Jesus, radiations from portals, and so on. (Which is a fair way of attracting readers as long as the statistical quality is guaranteed, which is the case for AoAS!)