Archive for median

double if not exponential

Posted in Books, Kids, Statistics, University life with tags , , , , , , on December 10, 2020 by xi'an

In one of my last quizzes for the year, as the course is about to finish, I asked whether mean or median was the MLE for a double exponential sample of odd size, without checking for the derivation of the result, as I was under the impression it was a straightforward result. Despite being outside exponential families. As my students found it impossible to solve within the allocated 5 minutes, I had a look, could not find an immediate argument (!), and used instead this nice American Statistician note by Robert Norton based on the derivative being the number of observations smaller than θ minus the number of observations larger than θ.  This leads to the result as well as the useful counter-example of a range of MLE solutions when the number of observations is even.

sampling the mean

Posted in Kids, R, Statistics with tags , , , , , on December 12, 2019 by xi'an

A challenge found on the board of the coffee room at CEREMADE, Université Paris Dauphine:

When sampling with replacement three numbers in {0,1,…,N}, what is the probability that their average is (at least) one of the three?

With a (code-golfed!) brute force solution of

mean(!apply((a<-matrix(sample(0:n,3e6,rep=T),3)),2,mean)-apply(a,2,median))

producing a graph pretty close to 3N/2(N+1)² (which coincides with a back-of-the-envelope computation):temp

subway commute distribution [nice graphics]

Posted in Books, pictures, Statistics with tags , , , , , , on July 25, 2019 by xi'an

An infographics entry in the New York Times about the distribution of a commute between two arbitrary subway stations in New York City, including a comparison of the distribution of a similar (?) commute by Tube in London. Showing that in most cases, the tail is thinner in London than in New York City. (Warning: the comparison may switch scales.)

Here is a bit of an outlier:

given that the two distributions hardly overlap and still share a similar median commute time!

a discovery that mean can be impacted by extreme values

Posted in University life with tags , , , , , , on August 6, 2016 by xi'an

A surprising editorial in Nature about the misleading uses of impact factors, since as means they are heavily impacted by extreme values. With the realisation that the mean is not the median for skewed distributions…

To be fair(er), Nature published a subsequent paper this week about publishing additional metrics like the two-year median.

likelihood-free inference in high-dimensional models

Posted in Books, R, Statistics, University life with tags , , , , , , , , , on September 1, 2015 by xi'an

“…for a general linear model (GLM), a single linear function is a sufficient statistic for each associated parameter…”

Water Tower, Michigan Avenue, Chicago, Oct. 31, 2012The recently arXived paper “Likelihood-free inference in high-dimensional models“, by Kousathanas et al. (July 2015), proposes an ABC resolution of the dimensionality curse [when the dimension of the parameter and of the corresponding summary statistics] by turning Gibbs-like and by using a component-by-component ABC-MCMC update that allows for low dimensional statistics. In the (rare) event there exists a conditional sufficient statistic for each component of the parameter vector, the approach is just as justified as when using a generic ABC-Gibbs method based on the whole data. Otherwise, that is, when using a non-sufficient estimator of the corresponding component (as, e.g., in a generalised [not general!] linear model), the approach is less coherent as there is no joint target associated with the Gibbs moves. One may therefore wonder at the convergence properties of the resulting algorithm. The only safe case [in dimension 2] is when one of the restricted conditionals does not depend on the other parameter. Note also that each Gibbs step a priori requires the simulation of a new pseudo-dataset, which may be a major imposition on computing time. And that setting the tolerance for each parameter is a delicate calibration issue because in principle the tolerance should depend on the other component values. Continue reading