## more concentration, everywhere

Posted in R, Statistics on January 25, 2019 by xi'an

Although it may sound like an excessive notion of optimality, one can hope to obtain an estimator δ of a unidimensional parameter θ that is always closer to θ than any other estimator. In distribution if not almost surely, meaning that the cdf of (δ-θ) is steeper than for other estimators enjoying the same cdf at zero (for instance ½, to make them all median-unbiased). When I saw this question on X validated, I thought of the Cauchy location example, where there is no uniformly optimal estimator, albeit a large collection of unbiased ones. But a simulation experiment shows that the MLE does better than the competition. At least better than three (above), then four, of them (since I also tried the Pitman estimator via Christian Hennig’s smoothmest R package). The differences to the MLE empirical cdf make it clearer below (with tomato for a score correction, gold for the Pitman estimator, sienna for the 38% trimmed mean, and blue for the median).

I wonder about a general theory along these lines. There is a vague similarity with Pitman nearness or closeness, but without the paradoxes induced by this criterion. It is more in the spirit of stochastic dominance, which may be achievable for location invariant and mean unbiased estimators…
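For readers who want to replay this kind of experiment, here is a minimal sketch in Python (not the R code behind the post). It rests on several assumptions of mine: a unit-scale Cauchy sample with θ=0, a "38% trimmed mean" that drops 38% of the sample at each end, a score correction taken as one Fisher-scoring step from the median, and an MLE approximated by iterating that step. The Pitman estimator is left out, and all function names are hypothetical.

```python
import math
import random
import statistics


def cauchy_sample(rng, n, theta=0.0):
    """Draw n Cauchy(theta, 1) variates by inverting the cdf."""
    return [theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def score_step(x, theta):
    """One Fisher-scoring step for the Cauchy location likelihood.

    The score is sum 2(x_i - theta) / (1 + (x_i - theta)^2) and the Fisher
    information of a unit-scale Cauchy sample of size n is n / 2.
    """
    score = sum(2.0 * (xi - theta) / (1.0 + (xi - theta) ** 2) for xi in x)
    return theta + 2.0 * score / len(x)


def approx_mle(x, steps=25):
    """Approximate the MLE by iterating score steps from the median (assumption)."""
    theta = statistics.median(x)
    for _ in range(steps):
        theta = score_step(x, theta)
    return theta


def trimmed_mean(x, trim=0.38):
    """Mean of the sample after dropping a fraction `trim` at each end (assumption)."""
    k = int(trim * len(x))
    core = sorted(x)[k:len(x) - k]
    return sum(core) / len(core)


ESTIMATORS = {
    "median": statistics.median,
    "score correction": lambda x: score_step(x, statistics.median(x)),
    "approximate MLE": approx_mle,
    "38% trimmed mean": trimmed_mean,
}


def simulate_errors(n=25, reps=2000, seed=1):
    """Return {name: list of errors (estimate - theta)} with theta = 0."""
    rng = random.Random(seed)
    errors = {name: [] for name in ESTIMATORS}
    for _ in range(reps):
        x = cauchy_sample(rng, n)
        for name, est in ESTIMATORS.items():
            errors[name].append(est(x))
    return errors


def ecdf(values, t):
    """Empirical cdf of `values` evaluated at t."""
    return sum(v <= t for v in values) / len(values)


if __name__ == "__main__":
    errs = simulate_errors()
    for name, e in errs.items():
        # A steeper cdf around zero means more mass in (-t, t].
        print(name, [round(ecdf(e, t) - ecdf(e, -t), 3) for t in (0.1, 0.3, 0.5)])
```

Comparing the printed masses across estimators, for each t, is a crude numerical stand-in for comparing the steepness of the cdfs of (δ-θ) around zero.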

## Pitman closeness renewal?

Posted in Statistics, University life on July 26, 2012 by xi'an

As noticed there a few months ago, the Pitman closeness criterion for comparing estimators (through the probability

$P_\theta(|\delta-\theta|<|\delta'-\theta|)$

which should be larger than .5 for the first estimator to be deemed “better” or “Pitman closer”) has been “resuscitated” by Canadian researchers. In 1993, I wrote a JASA (discussion) paper along with Gene Hwang and Bill Strawderman pointing out the many inconsistencies of this criterion as a decision tool.  It was entitled “Is Pitman Closeness a Reasonable Criterion?” (The answer was in the question, right?!)
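As a concrete reading of this probability, here is a hedged Monte Carlo sketch in Python. The choice of estimators (sample mean versus sample median), the normal model, the sample size, and the helper names are all illustrative assumptions of mine, not taken from any of the papers mentioned. Both estimators are computed on the same simulated sample, so their dependence is part of the estimate.

```python
import random
import statistics


def pitman_closeness(est1, est2, sampler, theta, reps=4000, seed=0):
    """Monte Carlo estimate of P_theta(|est1(X) - theta| < |est2(X) - theta|).

    Both estimators are evaluated on the same simulated sample X, so any
    dependence between them is accounted for.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        x = sampler(rng)
        if abs(est1(x) - theta) < abs(est2(x) - theta):
            wins += 1
    return wins / reps


def normal_sampler(n, theta):
    """Hypothetical helper: returns a function drawing n N(theta, 1) variates."""
    return lambda rng: [rng.gauss(theta, 1.0) for _ in range(n)]


if __name__ == "__main__":
    theta = 2.0
    draw = normal_sampler(11, theta)
    p = pitman_closeness(statistics.mean, statistics.median, draw, theta)
    print("estimated P(mean closer than median):", p)
```

An estimate above .5 would make the first estimator “Pitman closer” in the above sense, which says nothing about how much closer it is.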

In an arXiv posting today, Jozani, Balakrishnan, and Davies propose new characterisations for comparing (in this sense) symmetrically distributed estimators. There is nothing wrong with this mathematical exercise, obviously. However, the approach still seems to suffer from the same decisional inconsistencies as in the past:

1. the results in the paper (see, e.g., Lemmas 1 and 2) only apply to independent estimators, which is rather unrealistic (to the point that the authors apply them to dependent estimators, the sample median X[n/2] versus a fixed index observation, e.g. X3, and again at the end of the paper in the comparison of several order statistics). Having independent estimators to compare is a rather rare situation, as one tries to make the most of a given sample;
2. the setup is highly dependent on considering a single (one-dimensional) location parameter, and the results do not apply to more general settings (except location-scale cases with scale parameters known to some extent, see Lemma 5);
3. some results (see Remark 4) allow one to find a whole range of estimators dominating a given (again independent) estimator δ’, but they do not give a ranking of those estimators, except in the weak sense of having the above probability maximal in one of the estimators δ (Lemma 9). This is due to the independence constraint on the comparison. There is therefore no possibility (in this setting) of obtaining an estimator that is the “Pitman closest estimator of θ”, as claimed by the authors in the final section of their paper.

Once again, I have nothing against these derivations, which are mostly correct, but I simply argue here that they cannot constitute a competitor to standard decision theory.

## on Pitman closeness

Posted in Statistics, University life on November 15, 2011 by xi'an

I came by happenstance upon this talk, “Some Pitman Closeness Properties Pertinent to Symmetric Populations”, to be given by Mohammad Jozani at the University of Manitoba next week, and it rekindled my former (if negative) interest in Pitman nearness (or closeness). This criterion, which originated in a 1937 paper of E.J.G. Pitman, compares two estimators in the light of the probability of one being closer (to the “truth”) than the other,

$\text{Pr}_\theta(|\hat\theta_1(X)-\theta|<|\hat\theta_2(X)-\theta|)$

and there was a brief interest in the method at the end of the 1980’s, culminating with Keating and Mason’s book on the topic.

In a 1993 JASA paper I wrote with Gene Hwang and Bill Strawderman, entitled “Is Pitman Closeness a Reasonable Criterion?”, we demonstrated that, in many respects, this criterion was not appropriate for comparing estimators. For instance, the comparison was not transitive, two estimators with the same marginal distribution could sometimes be ranked, a Bayes estimator could not be properly derived, some counter-intuitive orderings could be exhibited, etc. This was an exciting (and fun) paper to write as it was only made of (counter)examples. (Hence our answer to the above question was a definitive no.) Judging from the abstract to the talk,

> In this talk, we focus on Pitman closeness probabilities when the estimators are symmetrically distributed about the unknown parameter θ. We first consider two symmetric estimators θ¹ and θ² and obtain necessary and sufficient conditions for θ¹ to be Pitman closer to the common median θ than θ². We then establish some properties in the context of estimation under Pitman closeness criterion. We define a Pitman closeness probability which measures the frequency with which an individual order statistic is Pitman closer to θ than some symmetric estimator. We show that, for symmetric populations, the sample median is Pitman closer to the population median than any other symmetrically distributed estimator of θ. Finally, we discuss the use of Pitman closeness probabilities in the determination of an optimal ranked set sampling scheme (denoted by RSS) for the estimation of the population median when the underlying distribution is symmetric. We show that the best RSS scheme from symmetric populations in the sense of Pitman closeness is the median and randomized median RSS for the cases of odd and even sample sizes, respectively.

it sounds like the authors have relaunched research in this area, hence that our 1993 definitive conclusion against the use of the criterion was not definitive for everyone… (I could not find a trace of the corresponding paper through Google, but I would be interested in reading the recent research on the topic! Even though the result about the “optimality” of the sample median reminds me of earlier results, with the related drawback that this optimality is incompatible with the sufficiency principle.)
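To make the lack of transitivity mentioned above concrete, here is a toy construction in Python. It is my own illustration and is not taken from the 1993 paper. With θ=0, three estimators take the values (1,2,3), (2,3,1) or (3,1,2) with probability 1/3 each. They all share the same marginal distribution on {1,2,3}, and yet each one is Pitman closer than the next with probability 2/3, in a cycle.

```python
from fractions import Fraction

# Toy joint distribution (an illustrative construction, not taken from the
# 1993 paper): theta = 0 and the three estimators take the values below with
# probability 1/3 each, so each has the same marginal law on {1, 2, 3}.
OUTCOMES = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
THETA = 0


def closeness(i, j):
    """Exact P(|delta_i - theta| < |delta_j - theta|) under the toy law."""
    wins = sum(abs(o[i] - THETA) < abs(o[j] - THETA) for o in OUTCOMES)
    return Fraction(wins, len(OUTCOMES))


if __name__ == "__main__":
    for i, j in [(0, 1), (1, 2), (2, 0)]:
        print(f"P(delta_{i + 1} closer than delta_{j + 1}) =", closeness(i, j))
```

Since δ1 beats δ2, δ2 beats δ3, and δ3 beats δ1, no “best” estimator can be extracted from the three pairwise comparisons.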

## Statistical Inference

Posted in Books, Statistics, University life on November 16, 2010 by xi'an

Following the publication of several papers on the topic of integrated evidence (about competing models), Murray Aitkin has now published a book entitled Statistical Inference and I have now finished reading it. While I appreciate the effort made by Murray Aitkin to place his theory within a coherent Bayesian framework, I remain unconvinced of the said coherence, for reasons exposed below.

The main chapters of the book are Chapter 2 about the “Integrated Bayes/likelihood approach” and Chapter 4 about the “Unified analysis of finite populations”, Chapter 7 also containing a new proposal about “Goodness of fit and model diagnostics”. Chapter 1 is a nice introduction to frequentist, likelihood and Bayesian approaches to inference and the four remaining chapters are applications of Murray Aitkin’s principles to various models. The style of the book is quite pleasant although slightly discursive in what I (a Frenchman!) would qualify as an English style in that it is often relying on intuition to develop concepts. I also think that the argument of being close to the frequentist decision (aka the p-value) too often serves as a justification in the book (see, e.g., page 43 “the p-value has a direct interpretation as a posterior probability”). As an aside, Murray Aitkin is a strong believer in plotting cdfs rather than densities to provide information about a distribution and hence cdf plots abound throughout the book. (I counted 82 pictures of them.) While the book contains a helpful array of examples and datasets, the captions of the (many) figures are too terse for my taste: The figures are certainly not self-contained and even with the help of the main text they do not always make complete sense.