Archive for the Statistics Category

running shoes

Posted in Books, Running, Statistics on August 12, 2018 by xi'an

A few days ago, when back from my morning run, I spotted a NYT article on Nike shoes that are supposed to bring on average a 4% gain in speed. Meaning for instance a 3 to 4 minute gain in a half-marathon.
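As a quick check of that arithmetic, here is a short R sketch. It assumes the 4% applies to average speed, so that the finishing time is divided by 1.04. The three finishing times are hypothetical values picked for illustration, not figures from the article.

```r
# hypothetical half-marathon finishing times, in minutes (illustration only)
half_marathon_min <- c(75, 90, 105)
# a 4% faster average speed divides the finishing time by 1.04
gain <- half_marathon_min * (1 - 1 / 1.04)
round(gain, 1)  # about 2.9, 3.5 and 4.0 minutes
```

Under this reading, a 3 to 4 minute gain corresponds to half-marathons run in roughly 78 to 104 minutes.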

“Using public race reports and shoe records from Strava, a fitness app that calls itself the social network for athletes, The Times found that runners in Vaporflys ran 3 to 4 percent faster than similar runners wearing other shoes, and more than 1 percent faster than the next-fastest racing shoe.”

What is interesting in this NYT article is that the two journalists who wrote it have analysed their own data, taken from Strava. Using a statistical model or models (linear regression? non-linear regression? neural net?) to predict the impact of the shoe make, against “all” other factors contributing to the overall time or position or percentage gain or yet something else. In most analyses produced in the NYT article, the 4% gain is reproduced (with a 2% gain for female shoe switchers and a 7% gain for slow runners).

“Of course, these observations do not constitute a randomized control trial. Runners choose to wear Vaporflys; they are not randomly assigned them. One statistical approach that seeks to address this uses something called propensity scores, which attempt to control for the likelihood that someone wears the shoes in the first place. We tried this, too. Our estimates didn’t change.”
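To make the propensity score idea concrete, here is a minimal R sketch on simulated data. Everything in it is an assumption made for illustration: the single confounder (train), the size of the effects, and the use of inverse probability weighting. It is not the model, the data, or necessarily the propensity score variant used by The Times.

```r
set.seed(1)
n <- 5000
train <- rnorm(n)                    # hypothetical confounder: training volume
p_shoe <- plogis(-1 + train)         # keener runners pick the shoes more often
shoe <- rbinom(n, 1, p_shoe)
# simulated log finishing time, with a true shoe effect of -0.04 (about 4%)
log_time <- log(100) - 0.05 * train - 0.04 * shoe + rnorm(n, sd = 0.05)

# naive comparison, ignoring who chooses to wear the shoes
naive <- unname(coef(lm(log_time ~ shoe))["shoe"])

# propensity score: estimated probability of wearing the shoes
ps <- fitted(glm(shoe ~ train, family = binomial))
# inverse probability weights rebalance the two groups of runners
w <- ifelse(shoe == 1, 1 / ps, 1 / (1 - ps))
ipw <- unname(coef(lm(log_time ~ shoe, weights = w))["shoe"])

c(naive = naive, ipw = ipw)
```

In this toy setting the naive estimate overstates the gain, because better-trained runners wear the shoes more often, while the weighted estimate should land close to the simulated -0.04. When both estimates agree, as reported in the article, the measured confounders are presumably not driving the result.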

The statistical analysis (or analyses) seems rather thorough, from what is reported in the NYT article, with several attempts at controlling for confounders. Still, the data itself is observational, even if providing a lot of variables to run the analyses, as it only covers runners using Strava (from 5% in Tokyo to 25% in London!) and indicating the type of shoes they wear during the race. There is also the issue that the shoes are quite expensive, at $250 a pair, especially if the effect wears out after 100 miles (this was not tested in the study), as I would hesitate to use them unless the race conditions look optimal (and they never do!). There is certainly a new shoes effect on top of that, between the real impact of a better response and a placebo effect. As shown by a similar effect of many other shoe makes. Hence, a moderating impact on the NYT conclusion that these Nike Vaporflys (flies?!) are an “outlier”. But nonetheless a fairly elaborate and careful statistical study that could potentially make it to a top journal like Annals of Applied Statistics!

JSM 2018 [#4½]

Posted in Statistics, University life on August 10, 2018 by xi'an

As I wrote my previous blog entry on JSM2018 before the sessions, I did not have the chance to comment on our mixture session, which I found most interesting!, with new entries on the topic and a great discussion by Bettina Grün. Including the important call for linking weights with the other parameters, as both groups being independent does not make sense when the number of components is uncertain. (Incidentally our paper with Kaniav Kamary and Kate Lee does create a dependence.) The talk by Deborah Kunkel was about anchored mixture estimation, a joint work with Mario Peruggia, another arXival that I had missed.

The notion of anchoring found in this paper is to allocate specific observations to specific components. These observations are thus anchored to these components. Among other things, this modification of the sampling model implies a removal of the unidentifiability problem. Hence formally of the label-switching or lack thereof issue. (Although, as Peter Green repeatedly mentioned, visualising the parameter space as a point process eliminates the issue.) This idea is somewhat connected with the constraint Jean Diebolt and I imposed in our 1990 mixture paper, namely that no component would have less than two observations allocated to it, but imposing which ones are which of course reduces drastically the complexity of the model. Another (related) aspect of anchoring is that the observations that are anchored to the components act as parts of the prior model, modifying the initial priors (which can then become improper as in our 1990 paper). The difficulty of the anchoring approach is to find observations to anchor in an unsupervised setting. The paper proceeds by optimising the allocations, which somewhat turns the prior into a data-dependent prior since all observations are used to set the anchors and then used again for the standard Bayesian processing. In that respect, I would rather follow the sequential procedure developed by Nicolas Chopin and Florian Pelgrin, where the number of components grows by steps with the number of observations.
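To illustrate the mechanics of anchoring, here is a toy Gibbs sampler in R for a two-component Normal mixture with unit variances, in which two observations have their allocations fixed. The simulated data, the priors, and the naive choice of anchors (smallest and largest observations) are all assumptions of this sketch. In particular it does not reproduce the optimised anchor selection of the paper.

```r
set.seed(2018)
# simulated data from a two-component Normal mixture (toy example)
x <- c(rnorm(100, -2, 1), rnorm(100, 2, 1))
n <- length(x)

# naive anchors (an assumption, not the optimised choice of the paper):
# the smallest point is tied to component 1, the largest to component 2
anchor <- rep(NA, n)
anchor[which.min(x)] <- 1
anchor[which.max(x)] <- 2
free <- is.na(anchor)

niter <- 2000
mu <- matrix(0, niter, 2)
mu_cur <- c(-1, 1); p_cur <- 0.5; tau2 <- 100  # N(0, tau2) prior on the means
z <- ifelse(free, 1, anchor)                   # anchored allocations never move

for (t in 1:niter){
  # allocate only the free observations, given the current parameters
  w1 <- p_cur * dnorm(x[free], mu_cur[1], 1)
  w2 <- (1 - p_cur) * dnorm(x[free], mu_cur[2], 1)
  z[free] <- 2 - rbinom(sum(free), 1, w1 / (w1 + w2))
  # conjugate updates of the means and of the weight, given all allocations
  for (k in 1:2){
    nk <- sum(z == k)
    prec <- nk + 1 / tau2
    mu_cur[k] <- rnorm(1, sum(x[z == k]) / prec, sqrt(1 / prec))
  }
  p_cur <- rbeta(1, 1 + sum(z == 1), 1 + sum(z == 2))
  mu[t, ] <- mu_cur
}
colMeans(mu[-(1:500), ])  # posterior means of the two components after burn-in
```

Since the anchored points always contribute to their own component, they act like extra prior information on that component, which is the sense in which the anchors modify the initial priors.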

 

Le Monde puzzle [#1063]

Posted in Books, Kids, R on August 9, 2018 by xi'an

A simple (summertime?!) arithmetic Le Monde mathematical puzzle:

  1. A “powerful integer” is such that all its prime divisors have multiplicity at least 2. Are there two powerful integers in a row, i.e. such that both n and n+1 are powerful?
  2. Are there odd integers n such that n² – 1 is a powerful integer?

The first question can be solved by brute force. Here is some R code that leads to the solution:

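library(numbers) # assumption: primeFactors() is taken from the numbers package
# TRUE when every prime factor of n appears with multiplicity at least 2
isperfz <- function(n){
  divz=primeFactors(n) # prime factors of n, repeated with multiplicity
  facz=unique(divz)
  ordz=rep(0,length(facz))
  for (i in seq_along(facz))
    ordz[i]=sum(divz==facz[i]) # multiplicity of the i-th distinct prime
  return(min(ordz)>1)}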

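lesperf=NULL
for (t in 4:1e5)
  if (isperfz(t)) lesperf=c(lesperf,t) # collect the powerful integers up to 1e5
# keep each n whose successor n+1 is also powerful
twinz=lesperf[which(diff(lesperf)==1)]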

with solutions 8, 288, 675, 9800, 12167.

The second puzzle means rerunning the code only on integers n²-1…

[1] 8
[1] 288
[1] 675
[1] 9800
[1] 235224
[1] 332928
[1] 1825200
[1] 11309768

except that I cannot exceed n²=10⁸. (The Le Monde puzzles will now stop for a month, just like about everything in France!, and then a new challenge will take place. Stay tuned.)
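One possible way to ease the size limit, sketched here in base R, is to use n² – 1 = (n – 1)(n + 1) and factorise the two much smaller terms separately instead of the product. The helper names prime_factors and is_powerful_n2m1 are hypothetical, and trial division stands in for the packaged factorisation used above. The search is restricted to odd n, as stated in the puzzle.

```r
# prime factors of m (with multiplicity) by trial division, base R only
prime_factors <- function(m){
  facs <- c()
  p <- 2
  while (p * p <= m){
    while (m %% p == 0){
      facs <- c(facs, p)
      m <- m / p
    }
    p <- p + 1
  }
  if (m > 1) facs <- c(facs, m)  # remaining cofactor is prime
  facs
}

# n^2 - 1 = (n - 1)(n + 1): pool the prime factors of both terms
is_powerful_n2m1 <- function(n){
  facs <- c(prime_factors(n - 1), prime_factors(n + 1))
  all(table(facs) >= 2)
}

odd_n <- seq(3, 9999, by = 2)  # odd n only, as required by the puzzle
sols <- odd_n[sapply(odd_n, is_powerful_n2m1)]
sols^2 - 1
```

The odd-only search keeps 8, 288 and 9800 (n = 3, 17, 99) but drops 675, which comes from the even value n = 26. Since each factorisation only involves numbers of size n, the upper bound on odd_n can be raised well beyond 10⁴ at moderate cost.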

head position at Warwick stats

Posted in Statistics on August 7, 2018 by xi'an

The Department of Statistics at Warwick seeks a new head to continue to develop and advance the quality of its education and research. The successful candidate will be appointed as a professor on an indefinite basis and will have a strong research and leadership profile. The appointment as Head of Department will be for three years in the first instance, with an option to extend. The next Head will work with this large and diverse community of academics and students, and support collaboration with the wider University. They will represent the Department to public and private audiences, nationally and internationally, and develop networks to promote the work of the Department. The deadline for applicants is 28 September 2018.

 

ICM 2018

Posted in pictures, Statistics, Travel, University life on August 4, 2018 by xi'an

While I am not following the International Congress of Mathematicians which just started in Rio, and even less attending, I noticed an entry on their webpage on my friend and colleague Maria Esteban which I would have liked to repost verbatim but cannot figure out how. (ICM 2018 also features a plenary lecture by Michael Jordan on gradient-based optimisation [which was also Michael’s topic at ISBA 2018] and another one by Sanjeev Arora on the maths of deep learning, two talks broadly related to statistics, which is presumably a première at this highly selective maths conference!)

JSM 2018 [#4]

Posted in Mountains, Statistics, Travel, University life on August 3, 2018 by xi'an

A last ½ day of sessions at JSM2018 in an almost deserted conference centre, with a first session put together by Mario Peruggia and a second on Advances in Bayesian Nonparametric Modeling and Computation for Complex Data. Here are the slides of my talk this morning in the Bayesian mixture estimation session, which I updated last night (Slideshare most absurdly does not let you update versions!).

Since I missed the COPSS Award ceremony for a barbecue with friends on Locarno Beach, I only discovered this morning that the winner this year is Richard Samworth, from Cambridge University, who eminently deserves this recognition, if only because of his contributions to journal editing, as I can attest from my years with JRSS B. Congrats to him as well as to Bin Yu and Susan Murphy for their E.L. Scott and R.A. Fisher Awards!  I also found out from an email to JSM participants that the next edition is in Denver, Colorado, which I visited only once in 1993 on a trip to Fort Collins visiting Kerrie Mengersen and Richard Tweedie. Given the proximity to the Rockies, I am thinking of submitting an invited session on ABC issues, which were not particularly well covered by this edition of JSM. (Feel free to contact me if you are interested in joining the session.)

JSM 2018 [#3]

Posted in Mountains, pictures, Statistics, Travel, University life on August 2, 2018 by xi'an

Third day at JSM2018 and the audience is already much smaller than the previous days! Although it is hard to tell with a humongous conference centre spread between two buildings. And not getting hooked by the tantalising view of the bay, with waterplanes taking off every few minutes…


Still, there were (too) few participants in the two computational statistics (MCMC) sessions I attended in the morning, the first one being organised by James Flegal on different assessments of MCMC convergence. (Although this small audience made the session quite homely!) In his own talk, James developed an interesting version of multivariate ESS that he related to a stopping rule for minimal precision. Vivek Roy also spoke about a multiple importance sampling construction I missed when it came up on arXiv last May.
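For readers unfamiliar with the notion, a multivariate ESS can be sketched in a few lines of base R as n (|Λ|/|Σ|)^(1/p), with Λ the sample covariance of the chain and Σ an estimate of the asymptotic covariance in the Markov chain CLT. The batch means estimator, the batch size of √n, and the toy AR(1) chain below are choices made for this sketch and not necessarily those discussed in the talk.

```r
set.seed(42)
n <- 1e4; p <- 2; rho <- 0.9
# toy "MCMC output": two independent AR(1) chains with autocorrelation rho
x <- matrix(0, n, p)
for (t in 2:n) x[t, ] <- rho * x[t - 1, ] + rnorm(p)

b <- floor(sqrt(n))   # batch size (a common default, assumed here)
a <- floor(n / b)     # number of batches
# matrix of batch means, one row per batch
bm <- t(sapply(1:a, function(k)
  colMeans(x[((k - 1) * b + 1):(k * b), , drop = FALSE])))

Sigma <- b * cov(bm)  # batch means estimate of the asymptotic covariance
Lambda <- cov(x)      # sample covariance of the chain
mess <- n * (det(Lambda) / det(Sigma))^(1 / p)
mess
```

For an AR(1) chain the effective sample size is about n(1 – ρ)/(1 + ρ), around 5% of n here, so the estimate should come out far below the 10⁴ iterations. A stopping rule then amounts to running the chain until the estimated ESS exceeds a threshold set by the desired precision.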

In the second session, Mylène Bédard exposed the construction of and improvement brought by local scaling in MALA, with 20% gain from using non-local tuning. Making me idly muse over whether block sizes in block-Gibbs sampling could also be locally optimised… Then Aaron Smith discussed how HMC should be scaled for optimal performance, under rather idealised conditions and very high dimensions. Mentioning a running time of d, the dimension, to the power ¼. But not addressing the practical question of calibrating scale versus number of steps in the discretised version. (At which time my hands were [sort of] frozen solid thanks to the absurd air conditioning in the conference centre and I had to get out!)