Archive for median

meandering

Posted in Books, Kids, R, Statistics on March 12, 2021 by xi'an

A bit of a misunderstanding from Randall Munroe and then some: the function F returns a triplet, hence G should return a triplet as well, even if the limit does return three identical values. And he should have also included the (infamous) harmonic mean! The subtext (behind the picture) mentions random forest statistics, using every mean one can think of and dropping those that are doing worse, while here all solutions return the same value, hence do not directly discriminate between the averages (and there is no objective function to create the nodes in the trees, etc.).

Here is a test R code including the harmonic mean:

xkcd=function(x)c(mean(x),exp(mean(log(x))),median(x),1/mean(1/x))
xxxkcd=function(x,N=10)ifelse(rep(N==1,4),xkcd(x),xxxkcd(xkcd(x),N-1))
xxxkcd(rexp(11))
[1] 1.018197 1.018197 1.018197 1.018197

double if not exponential

Posted in Books, Kids, Statistics, University life on December 10, 2020 by xi'an

In one of my last quizzes for the year, as the course is about to finish, I asked whether the mean or the median was the MLE for a double exponential sample of odd size, without checking the derivation of the result, as I was under the impression it was a straightforward result, despite being outside exponential families. As my students found it impossible to solve within the allocated 5 minutes, I had a look, could not find an immediate argument (!), and used instead this nice American Statistician note by Robert Norton, based on the derivative (of the sum of absolute deviations from θ) being the number of observations smaller than θ minus the number of observations larger than θ. This leads to the result as well as the useful counter-example of a range of MLE solutions when the number of observations is even.
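For the record, here is a minimal R sketch of that counting argument, which is mine and not taken from Norton's note. It assumes a unit scale, simulates the double exponential sample as a difference of two exponentials, and uses hypothetical helper names (nll, dnll) as well as an arbitrary grid of 1001 points:

```r
# Sketch under stated assumptions: unit scale, simulated data, illustrative names
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rexp(11) - rexp(11)         # difference of two Exp(1) draws is double exponential

# negative log-likelihood in theta, up to an additive constant
nll <- function(theta) sum(abs(x - theta))

# its derivative: #{x_i < theta} - #{x_i > theta}
dnll <- function(theta) sum(x < theta) - sum(x > theta)

# crude grid search for the minimiser, to be compared with the sample median
grid <- seq(min(x), max(x), length.out = 1001)
grid_min <- grid[which.min(sapply(grid, nll))]

c(median = median(x), grid_min = grid_min, derivative_at_median = dnll(median(x)))
```

With an odd sample size, the count is negative to the left of the median and positive to its right, so the median is the unique minimiser. Running the same code with an even sample size (say rexp(10) - rexp(10)) should show dnll equal to zero over the whole interval between the two middle order statistics, which is the range of MLE solutions mentioned above.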

sampling the mean

Posted in Kids, R, Statistics on December 12, 2019 by xi'an

A challenge found on the board of the coffee room at CEREMADE, Université Paris Dauphine:

When sampling with replacement three numbers in {0,1,…,N}, what is the probability that their average is (at least) one of the three?

With a (code-golfed!) brute force solution of

mean(!apply((a<-matrix(sample(0:N,3e6,rep=T),3)),2,mean)-apply(a,2,median))

producing a graph pretty close to 3N/(2(N+1)²) (which coincides with a back-of-the-envelope computation).
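As a sanity check on the simulation, the probability can also be computed exactly for small N by enumerating all (N+1)³ ordered triples. This is only a sketch of mine, with hypothetical function names (exact, approx_formula) and an arbitrary choice of values for N:

```r
# exact probability by brute-force enumeration; only sensible for moderate N
exact <- function(N) {
  g <- expand.grid(a = 0:N, b = 0:N, c = 0:N)
  s <- g$a + g$b + g$c
  # the average equals one of the three values iff 3 * value equals the sum
  mean(3 * g$a == s | 3 * g$b == s | 3 * g$c == s)
}

# the back-of-the-envelope approximation quoted in the post
approx_formula <- function(N) 3 * N / (2 * (N + 1)^2)

N <- c(2, 5, 10, 20, 50)
cbind(N, exact = sapply(N, exact), approx = approx_formula(N))
```

Working with integer sums rather than with mean() avoids any floating-point comparison, at the cost of a table with (N+1)³ rows.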

subway commute distribution [nice graphics]

Posted in Books, pictures, Statistics on July 25, 2019 by xi'an

An infographics entry in the New York Times about the distribution of a commute between two arbitrary subway stations in New York City, including a comparison of the distribution of a similar (?) commute by Tube in London. Showing that in most cases, the tail is thinner in London than in New York City. (Warning: the comparison may switch scales.)

Here is a bit of an outlier:

given that the two distributions hardly overlap and still share a similar median commute time!

a discovery that mean can be impacted by extreme values

Posted in University life on August 6, 2016 by xi'an

A surprising editorial in Nature about the misleading uses of impact factors, since as means they are heavily impacted by extreme values. With the realisation that the mean is not the median for skewed distributions…

To be fair(er), Nature published a subsequent paper this week about publishing additional metrics like the two-year median.