Archive for blogging
The WordPress.com stats helper monkeys prepared a 2014 annual report for the ‘Og…
.. and among the collected statistics for 2014, what I found most amazing are the three accesses from Greenland and the one access from Afghanistan!
Click here to see the complete report. (Assuming you have nothing better to do on Boxing Day…)
When I read an entry on The Chemical Statistician stating that a sample median could often be a choice for a sufficient statistic, it caught my attention as I had never thought a median could be sufficient. After thinking a wee bit more about it, and even posting a question on Cross Validated without getting an immediate answer, I came to the conclusion that medians (and other quantiles) cannot be sufficient statistics for arbitrary (large enough) sample sizes (a condition that excludes the obvious cases of one & two observations, where the sample median equals the sample mean).
In the case when the support of the distribution does not depend on the unknown parameter θ, we can invoke the Darmois-Pitman-Koopman theorem, namely that the density of the observations is necessarily of the exponential family form,
$f(x\mid\theta) = h(x)\,\exp\{\theta\cdot T(x) - \psi(\theta)\}$
to conclude that, if the median were sufficient and the natural sufficient statistic
$S = \sum_{i=1}^n T(x_i)$
is minimal sufficient, then S is a function of the median, which is impossible since modifying an extreme in the n>2 observations modifies S but not the median.
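The perturbation argument above can be checked numerically. This is only a minimal sketch, assuming T(x) = x so that S is the plain sum of the observations (an illustrative choice, not one taken from the post); `natural_statistic` is a hypothetical helper name.

```python
from statistics import median


def natural_statistic(sample, T=lambda x: x):
    """Sum of T(x_i); T(x) = x is an assumed, illustrative choice."""
    return sum(T(x) for x in sample)


# A sample of n = 5 > 2 observations (arbitrary illustrative values).
sample = [0.3, 1.1, 2.0, 2.7, 4.2]

# Move only the largest observation further out.
perturbed = sample[:-1] + [sample[-1] + 10.0]

# The median ignores the change in the extreme value...
same_median = median(sample) == median(perturbed)

# ...while the sum S does not, so S cannot be a function of the median.
same_S = natural_statistic(sample) == natural_statistic(perturbed)

print("median unchanged:", same_median)
print("S unchanged:", same_S)
```

Two samples sharing a median but giving different values of S are enough to rule out S being a function of the median.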
In the other case when the support does depend on the unknown parameter θ, we can consider the case when
$f(x\mid\theta) = h(x)\,\mathbb{I}_{A_\theta}(x)\,\tau(\theta)$
where the set $A_\theta$ indexed by θ is the support of f. In that case, the factorisation theorem implies that
$\prod_{i=1}^n \mathbb{I}_{A_\theta}(x_i)$
is a 0-1 function of the sample median. Adding a further observation y⁰ which does not modify the median then leads to a contradiction since it may be in or outside the support set.
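The support argument can be illustrated with a toy family. This is a sketch under an assumption of mine, namely a Uniform(0, θ) model whose support (0, θ) moves with θ; neither the family nor the helper `joint_support_indicator` comes from the post.

```python
from statistics import median


def joint_support_indicator(sample, theta):
    """1 if every observation lies in (0, theta), else 0.

    This is the product of indicators for an assumed Uniform(0, theta) model.
    """
    return int(all(0.0 < x < theta for x in sample))


# Two samples with the same median but different maxima.
x = [0.2, 0.5, 0.9]
y = [0.2, 0.5, 1.7]
theta = 1.0

median_x, median_y = median(x), median(y)
ind_x = joint_support_indicator(x, theta)
ind_y = joint_support_indicator(y, theta)

# Same median, different indicator values: the joint indicator
# cannot be written as a function of the sample median alone.
print(median_x == median_y, ind_x, ind_y)
```

The third observation plays the role of the observation that leaves the median untouched while falling inside or outside the support.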
Incidentally, as an aside, when looking for examples, I played with the distribution
which has θ as its theoretical median, if not as its mean. In this example, not only is the sample median not sufficient (the only sufficient statistic is the order statistic, and rightly so since the support is fixed and the distributions are not in an exponential family), but the MLE also differs from the sample median. Here is an example with n=30 observations, the sienna bar being the sample median:
The editors of a new blog entitled Marauders of the Lost Sciences (Learn from the giants) sent me an email to signal the start of this blog with a short excerpt from a giant in maths or stats posted every day:
There is a new blog I wanted to tell you about which excerpts one interesting or classic paper or book a day from the mathematical sciences. We plan on daily posting across the range of mathematical fields and at any level, but about 20-30% of the posts in queue are from statistics. The goal is to entice people to read the great works of old. The first post today was from an old paper by Fisher applying Group Theory to the design of experiments.
Interesting concept, which will hopefully generate comments to put the quoted passage into context. Somewhat connected to my Reading Statistical Classics posts. Which, incidentally, should take place this year in the end since more students registered, after I first (and sadly) wrote that they would not, as only two students had registered! (I am unsure about the references behind the title of that blog, besides Spielberg’s Raiders of the Lost Ark and Norman’s Marauders of Gor… I just hope Statistics does not qualify as a lost science!)
As I was hurriedly trying to cram several ‘Og posts into a conference paper (!), I looked around for a way of including Unicode characters straight away. And found this solution on StackExchange:
which just suited me fine!
Today in Warwick, I had a very nice discussion with Michael Betancourt on many statistical and computational issues, but at one point in the conversation we came upon the trouble of bridging the gap between the machine learning and statistics communities. While a conference like AISTATS is certainly contributing to this, it does not reach the main bulk of the statistics community. Since, in Reykjavik, we had discussed the corresponding difficulty of people publishing a longer and “more” statistical paper in a “more” statistical journal once the central idea was published in a machine learning conference proceeding like NIPS or AISTATS, we had this idea that creating a special fast-track in a mainstream statistics journal for a subset of those papers, using for instance a tailor-made committee in that original conference, or creating an annual survey of the top machine learning conference proceedings rewritten in a “more” statistical way (and once again selected by an ad hoc committee), would help, at not too much of a cost, in inducing machine learners to make the extra effort of switching to another style. From there, we enlarged the suggestion to enlisting a sufficient number of (diverse) bloggers in each major conference towards producing quick but sufficiently informative entries on their epiphany talks (if any), possibly supported by the conference organisers or the sponsoring societies. (I am always happy to welcome any guest blogger in conferences I attend!)