that the median cannot be a sufficient statistic

Posted in Kids, Statistics, University life on November 14, 2014 by xi'an

When reading an entry on The Chemical Statistician stating that a sample median could often be a choice for a sufficient statistic, my attention was caught, as I had never thought a median could be sufficient. After thinking a wee bit more about it, and even posting a question on Cross Validated without getting an immediate answer, I came to the conclusion that medians (and other quantiles) cannot be sufficient statistics for arbitrary (large enough) sample sizes (a condition that excludes the obvious cases of one and two observations, where the sample median equals the sample mean).

In the case when the support of the distribution does not depend on the unknown parameter θ, we can invoke the Darmois-Pitman-Koopman theorem, namely that, if a sufficient statistic of fixed dimension exists for all sample sizes, the density of the observations is necessarily of the exponential family form,

$\exp\{ \theta T(x) - \psi(\theta) \}h(x)$

to conclude that, if the natural sufficient statistic

$S=\sum_{i=1}^n T(x_i)$

is minimal sufficient, then the median is a function of S, which is impossible since modifying an extreme in the n>2 observations modifies S but not the median.
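As a quick numerical illustration of that last step, here is a minimal R sketch, assuming a model where $T(x)=x$ (e.g., a normal mean model) so that S is the plain sum. The sample values are made up for illustration: inflating the largest of n=5 observations changes S while leaving the median untouched.

```r
# Illustrative values only; T(x) = x is assumed (e.g., a normal mean model)
x <- c(-1.2, -0.3, 0.1, 0.8, 2.5)

# Same sample, except that the largest observation is pushed further out
y <- x
y[which.max(y)] <- max(x) + 10

# The natural sufficient statistic S = sum of T(x_i) differs between samples
S_x <- sum(x)
S_y <- sum(y)

# ... while the sample median is identical
med_x <- median(x)
med_y <- median(y)

print(c(S_x = S_x, S_y = S_y))
print(c(med_x = med_x, med_y = med_y))
```

Since the two samples share a median but not S, the median cannot carry the information contained in S.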

In the other case when the support does depend on the unknown parameter θ, we can consider the case when

$f(x|\theta) = h(x) \mathbb{I}_{A_\theta}(x) \tau(\theta)$

where the set indexed by θ is the support of f. In that case, the factorisation theorem implies that

$\prod_{i=1}^n \mathbb{I}_{A_\theta}(x_i)$

is a 0-1 function of the sample median. Adding a further observation y⁰ which does not modify the median then leads to a contradiction since it may be in or outside the support set.
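To make this concrete, here is a small R sketch that takes the uniform $\mathfrak{U}(0,\theta)$ distribution as an assumed example, so that $A_\theta=(0,\theta)$, $h(x)=1$ and $\tau(\theta)=1/\theta$. The function name `unif_lik` and the sample values are hypothetical. The two samples share the same median, yet their likelihoods at θ=1 differ because one of them has an observation outside the support.

```r
# Likelihood of an i.i.d. U(0, theta) sample: theta^(-n) times the
# product of the indicators that each x_i lies in (0, theta)
unif_lik <- function(theta, x) {
  theta^(-length(x)) * prod(x > 0 & x < theta)
}

x <- c(0.2, 0.5, 0.8)   # all observations inside (0, 1)
y <- c(0.2, 0.5, 1.5)   # same median, largest observation outside (0, 1)

median(x) == median(y)  # the medians coincide
unif_lik(1, x)          # positive likelihood at theta = 1
unif_lik(1, y)          # zero likelihood at theta = 1
```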

Incidentally, as an aside, when looking for examples, I played with the distribution

$\dfrac{1}{2}\mathfrak{U}(0,\theta)+\dfrac{1}{2}\mathfrak{U}(\theta,1)$

which has θ as its theoretical median if not mean. In this example, not only is the sample median not sufficient (the only sufficient statistic is the order statistic, and rightly so since the support is fixed and the distributions are not in an exponential family), but the MLE is also different from the sample median. Here is an example with n=30 observations, the sienna bar being the sample median:
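A comparison of this kind can be roughly reproduced in R. The sketch below is not the code behind the original figure: the value θ=0.3, the seed, the function name `loglik`, and the crude grid search are all assumptions made for illustration. It relies on the mixture density being $1/2\theta$ on $(0,\theta)$ and $1/2(1-\theta)$ on $(\theta,1)$.

```r
set.seed(1)               # arbitrary seed, for reproducibility only
n <- 30
theta0 <- 0.3             # assumed "true" value, chosen for illustration

# Simulate from the mixture: each observation falls below theta0 w.p. 1/2
below <- runif(n) < 0.5
x <- ifelse(below, runif(n, 0, theta0), runif(n, theta0, 1))

# Log-likelihood: the density is 1/(2 theta) on (0, theta)
# and 1/(2 (1 - theta)) on (theta, 1)
loglik <- function(theta, x) {
  k <- sum(x < theta)     # number of observations below theta
  -length(x) * log(2) - k * log(theta) - (length(x) - k) * log(1 - theta)
}

# Crude grid maximisation (the likelihood jumps at each observation)
grid <- seq(0.001, 0.999, by = 0.001)
ll <- sapply(grid, loglik, x = x)
mle_grid <- grid[which.max(ll)]

c(median = median(x), mle_grid = mle_grid)
```

Plotting `ll` against `grid` and adding `abline(v = median(x))` would give a picture in the same spirit, with the grid resolution limiting how precisely the maximiser is located.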

austerity in MCMC land (#2)

Posted in R, Statistics on April 29, 2013 by xi'an

After reading the arXiv paper by Korattikara, Chen and Welling, I wondered about the expression of the acceptance step of the Metropolis-Hastings algorithm as a mean of log-likelihoods over the sample. More specifically, the long sleepless nights at the hospital led me to ponder the rather silly question of the impact of replacing the mean by the median. I thus tried running a Metropolis-Hastings algorithm with the substitute and it (of course!) led to a nonsensical answer, as shown by the above graph. The true posterior is the one for a normal model and the histogram indicates a lack of convergence of the Markov chain to this posterior even though it does converge to some posterior. Here is the R code for this tiny experiment:

#data generation
N=100
x=rnorm(N)

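#MH steps, with the mean of the log-likelihoods replaced by their median
T=10^5
theta=rep(0,T)
#pointwise log-likelihoods at the starting value theta[1]=0
curlike=dnorm(x,log=TRUE)
for (t in 2:T){
  #random walk proposal with scale 0.1
  prop=theta[t-1]+.1*rnorm(1)
  proplike=dnorm(x,mean=prop,log=TRUE)
  u=runif(1)
  #log(u) minus the log-ratio of the N(0,10^2) prior densities
  bound=log(u)-dnorm(prop,sd=10,log=TRUE)+
    dnorm(theta[t-1],sd=10,log=TRUE)
  #acceptance test on medians instead of means
  if (median(proplike)-median(curlike)>bound/N){
    theta[t]=prop;curlike=proplike
  } else { theta[t]=theta[t-1]}
}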
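For comparison, here is a hedged sketch of the same sampler with the summary statistic passed as an argument, so that the mean version (which is the genuine Metropolis-Hastings acceptance step, since N times the difference of means is the log-likelihood ratio) can be run next to the median version. The helper `mh`, the seed, and the shorter chain length are assumptions of this sketch, not part of the original experiment.

```r
# Hypothetical helper: same random-walk sampler, with the summary of the
# pointwise log-likelihoods passed as an argument (mean or median)
mh <- function(x, niter, stat = mean) {
  N <- length(x)
  theta <- rep(0, niter)
  curlike <- dnorm(x, mean = theta[1], log = TRUE)
  for (t in 2:niter) {
    prop <- theta[t - 1] + 0.1 * rnorm(1)
    proplike <- dnorm(x, mean = prop, log = TRUE)
    # log(u) minus the log-ratio of the N(0, 10^2) prior densities
    bound <- log(runif(1)) - dnorm(prop, sd = 10, log = TRUE) +
      dnorm(theta[t - 1], sd = 10, log = TRUE)
    if (stat(proplike) - stat(curlike) > bound / N) {
      theta[t] <- prop
      curlike <- proplike
    } else {
      theta[t] <- theta[t - 1]
    }
  }
  theta
}

set.seed(42)                       # arbitrary seed
x <- rnorm(100)
chain_mean <- mh(x, 2e4, stat = mean)
chain_med  <- mh(x, 2e4, stat = median)

# Exact posterior under the N(0, 10^2) prior and unit-variance normal likelihood
post_mean <- sum(x) / (length(x) + 1 / 100)
post_sd   <- 1 / sqrt(length(x) + 1 / 100)

c(exact = post_mean, mean_chain = mean(chain_mean), median_chain = mean(chain_med))
c(exact = post_sd, mean_chain = sd(chain_mean), median_chain = sd(chain_med))
```

Comparing the two printed rows gives a numerical counterpart to the histogram: the mean-based chain should track the exact normal posterior, while nothing guarantees that the median-based chain does.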