## that the median cannot be a sufficient statistic

Posted in Kids, Statistics, University life on November 14, 2014 by xi'an

An entry on The Chemical Statistician stating that a sample median could often be chosen as a sufficient statistic attracted my attention, as I had never thought a median could be sufficient. After thinking a wee bit more about it, and even posting a question on Cross Validated without getting an immediate answer, I came to the conclusion that medians (and other quantiles) cannot be sufficient statistics for arbitrary (large enough) sample sizes, a condition that excludes the obvious cases of one and two observations, where the sample median equals the sample mean.

In the case when the support of the distribution does not depend on the unknown parameter θ, we can invoke the Darmois-Pitman-Koopman theorem, namely that, if a sufficient statistic of fixed dimension exists for all sample sizes, the density of the observations is necessarily of the exponential family form,

$\exp\{ \theta T(x) - \psi(\theta) \}h(x)$

to conclude that, if the natural sufficient statistic

$S=\sum_{i=1}^n T(x_i)$

is minimal sufficient, then, were the median sufficient, S would be a function of the median. This is impossible, since modifying an extreme observation among the n>2 observations can modify S but not the median.
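A toy numerical check of this perturbation argument, assuming a normal location model where T(x)=x, so that S is simply the sample sum (the sample values are made up for illustration):

```python
from statistics import median

def natural_statistic(xs, T=lambda x: x):
    # S = sum_i T(x_i); the default T(x) = x corresponds to a normal location model (assumption)
    return sum(T(x) for x in xs)

sample = [0.3, 1.2, 2.5, 3.1, 4.8]     # n = 5 > 2, made-up values
perturbed = sample[:-1] + [10.0]        # push the largest observation further out

print(median(sample), median(perturbed))  # both medians equal 2.5
print(natural_statistic(sample), natural_statistic(perturbed))  # the two sums differ
```

Since the two samples share their median but not their value of S, no function of the median alone can return S.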

In the other case when the support does depend on the unknown parameter θ, we can consider the case when

$f(x|\theta) = h(x) \mathbb{I}_{A_\theta}(x) \tau(\theta)$

where the set $A_\theta$ indexed by θ is the support of f. In that case, the factorisation theorem implies that

$\prod_{i=1}^n \mathbb{I}_{A_\theta}(x_i)$

is a 0-1 function of the sample median. Replacing an extreme observation with a value y⁰ that does not modify the median then leads to a contradiction, since y⁰ may lie in or outside the support set.
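A toy check of why this product cannot be a function of the median, assuming the U(0,θ) model, i.e. h(x)=1, $A_\theta=(0,\theta)$ and τ(θ)=1/θ (the samples and the value θ=3 are arbitrary illustrations): two samples of the same size share their median, yet the product of indicators differs at the same θ.

```python
from statistics import median

def indicator_product(xs, theta):
    # prod_i I_{A_theta}(x_i) with A_theta = (0, theta), i.e. the U(0, theta) model (assumption)
    return int(all(0.0 < x < theta for x in xs))

a = [0.5, 1.0, 2.0]
b = [0.5, 1.0, 4.0]   # same median as a, but a larger maximum
theta = 3.0

print(median(a) == median(b))                                     # True
print(indicator_product(a, theta), indicator_product(b, theta))   # 1 0
```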

As an aside, when looking for examples, I played with the distribution

$\dfrac{1}{2}\mathfrak{U}(0,\theta)+\dfrac{1}{2}\mathfrak{U}(\theta,1)$

which has θ as its theoretical median, though not as its mean (the mean being (2θ+1)/4). In this example, not only is the sample median not sufficient (the minimal sufficient statistic is the order statistic, and rightly so since the support is fixed and the distributions do not belong to an exponential family), but the MLE also differs from the sample median. Here is an example with n=30 observations, the sienna bar being the sample median:
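A minimal sketch of how such a comparison could be reproduced. The true value θ=0.3, the seed and the grid maximisation are my own assumptions, not the code behind the figure. The density equals 1/(2θ) below θ and 1/(2(1-θ)) above it, so the log-likelihood only depends on how many observations fall below θ.

```python
import math
import random
from statistics import median

def sample_mixture(theta, n, rng):
    # draw n values from 1/2 U(0, theta) + 1/2 U(theta, 1)
    xs = []
    for _ in range(n):
        if rng.random() < 0.5:
            xs.append(rng.uniform(0.0, theta))
        else:
            xs.append(rng.uniform(theta, 1.0))
    return xs

def log_likelihood(theta, xs):
    # density is 1/(2 theta) below theta and 1/(2 (1 - theta)) above it
    k = sum(1 for x in xs if x < theta)
    return -k * math.log(2 * theta) - (len(xs) - k) * math.log(2 * (1 - theta))

def grid_mle(xs, grid_size=10_000):
    # crude maximisation over a regular grid strictly inside (0, 1)
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    return max(grid, key=lambda t: log_likelihood(t, xs))

rng = random.Random(2014)          # arbitrary seed
xs = sample_mixture(theta=0.3, n=30, rng=rng)
print("sample median:", median(xs))
print("grid MLE:", grid_mle(xs))
```

Between two consecutive order statistics the count k is constant and the log-likelihood is convex in θ, so its supremum sits at (or right next to) one of the observations; a finer grid, or evaluating next to each observation, would sharpen the estimate.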

## marauders of the lost sciences

Posted in Books, Statistics, University life on October 26, 2014 by xi'an

The editors of a new blog entitled Marauders of the Lost Sciences (Learn from the giants) sent me an email to signal the start of this blog with a short excerpt from a giant in maths or stats posted every day:

There is a new blog I wanted to tell you about which excerpts one interesting or classic paper or book a day from the mathematical sciences. We plan on daily posting across the range of mathematical fields and at any level, but about 20-30% of the posts in queue are from statistics.

The goal is to entice people to read the great works of old.

The first post today was from an old paper by Fisher applying Group Theory to the design of experiments.

Interesting concept, which will hopefully generate comments putting the quoted passage into context. Somewhat connected to my Reading Statistical Classics posts. Incidentally, the course behind them should take place in the end, since more students registered, after I had feared it would not run this year with only two students registered! (I am unsure about the references behind the title of that blog, besides Spielberg’s Raiders of the Lost Ark and Norman’s Marauders of Gor… I just hope Statistics does not qualify as a lost science!)

## unicode in LaTeX

Posted in Books, Linux, Statistics, University life on October 9, 2014 by xi'an

As I was hurriedly trying to cram several ‘Og posts into a conference paper (!), I looked around for a way of including Unicode characters straight away. And found this solution on StackExchange:

```latex
\usepackage[mathletters]{ucs}
\usepackage[utf8x]{inputenc}
```

which just suited me fine!

## bridging the gap between machine learning and statistics

Posted in pictures, Statistics, Travel, University life on May 10, 2014 by xi'an

Today in Warwick, I had a very nice discussion with Michael Betancourt on many statistical and computational issues, but at one point in the conversation we came upon the trouble of bridging the gap between the machine learning and statistics communities. While a conference like AISTATS certainly contributes to this, it does not reach the bulk of the statistics community. Since, in Reykjavik, we had discussed the corresponding difficulty for authors of publishing a longer and “more” statistical paper in a “more” statistical journal once the central idea has appeared in machine learning conference proceedings like NIPS or AISTATS, we came up with the following idea: a special fast track in a mainstream statistics journal for a subset of those papers (selected, for instance, by a tailor-made committee in the original conference), or an annual survey of the top machine learning conference proceedings rewritten in a “more” statistical way (again selected by an ad hoc committee), would help, without costing machine learners too much of an extra effort to switch to another style. From there, we enlarged the suggestion to enlisting a sufficient number of (diverse) bloggers at each major conference to produce quick but sufficiently informative entries on their epiphany talks (if any), possibly supported by the conference organisers or the sponsoring societies. (I am always happy to welcome any guest blogger at conferences I attend!)

## Valen in Le Monde

Posted in Books, Statistics, University life on November 21, 2013 by xi'an

Valen Johnson made the headlines in Le Monde last week. (More precisely, on the scientific blog Passeur de Sciences. Thanks, Julien, for the pointer!) The post carried the alarming title (translated from French) “A study questions one major tool of the scientific approach”. The reason for this French fame is Valen’s recent paper in PNAS, Revised standards for statistical evidence, where he puts forward his uniformly most powerful Bayesian tests (recently discussed on the ‘Og) to argue against the standard 0.05 significance level and in favour of “the 0.005 or 0.001 level of significance.”

“…many statisticians have noted that P values of 0.05 may correspond to Bayes factors that only favor the alternative hypothesis by odds of 3 or 4–1…” V. Johnson, PNAS
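To see where odds of “3 or 4 to 1” can come from, here is a sketch under my own assumptions (a one-sided z-test of a normal mean with known variance, not the paper's own computations): in that setting the uniformly most powerful Bayesian test with evidence threshold γ rejects when z > sqrt(2 log γ), so matching it to a size-α test gives γ = exp(z_α²/2).

```python
import math
from statistics import NormalDist

def umpbt_evidence_threshold(alpha):
    # one-sided z-test, known variance (assumption): the UMPBT rejects when
    # z > sqrt(2 log gamma), hence gamma = exp(z_alpha^2 / 2) for a size-alpha test
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return math.exp(z_alpha ** 2 / 2.0)

for alpha in (0.05, 0.005, 0.001):
    print(alpha, round(umpbt_evidence_threshold(alpha), 1))
```

Under this calibration, 0.05 corresponds to odds of roughly 4 to 1, while 0.005 and 0.001 correspond to odds in the tens and in the low hundreds, respectively.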

While I do plan to discuss the PNAS paper later (and possibly write a comment letter to PNAS with Andrew), I find it interesting that it made the headlines within days of its (early edition) publication: the argument for replacing .05 with .001 to increase the proportion of reproducible studies is both simple and convincing for a scientific journalist. If only the issue with p-values and statistical testing could be that simple… For instance, the above quote from Valen is reproduced as “an [alternative] hypothesis that stands right below the significance level has in truth only 3 to 5 chances to 1 to be true”, the “truth” popping out of nowhere. (If you read French, the 300+ comments on the blog are also worth their weight in jellybeans…)

## machine learning as buzzword

Posted in Books, Kids on November 12, 2013 by xi'an

In one of his posts, my friend Larry mentioned that popular posts had to mention the Bayes/frequentist opposition in the title… I think mentioning machine learning is also a good buzzword to increase the traffic! I did spot this phenomenon last week when publishing my review of Kevin Murphy’s Machine Learning: the number of views and visitors jumped by at least 50%, exceeding the (admittedly modest) 1000 bar on two consecutive days. Interestingly, the number of copies of Machine Learning (sold via my amazon associate link) did not follow this trend: so far, I only spotted a few copies sold, in similar amounts to the number of copies of Spatio-temporal Statistics I reviewed the week before. Or most books I review, positively or negatively! (However, I did spot a correlated increase in overall amazon associate orderings and brazenly attributed the order of a Lego robotics set to a “machine learner”! And as of yesterday the ‘Og’s readers massively ordered 152 236 copies of the latest edition of Andrew’s Bayesian Data Analysis. Thanks!)

## please do not like my posts!!!

Posted in pictures on September 7, 2012 by xi'an

This is a very minor inconvenience with WordPress (or just the blogging world) so do not read any further as it does not matter in the slightest! Indeed, for every post that I write, a few complete strangers tag it with a “like” mention, for which I get a notification. Given that those strangers have their own blog, mostly related to photography, I suspect this is an indirect way to induce more visits to their own site… Even though I do not see how anyone but me is aware of those “like”s…