Archive for complete statistics

on completeness

Posted in Books, Kids, Statistics on November 19, 2020 by xi'an

Another X validated question that proved a bit of a challenge, enough for me to return to its resolution on consecutive days. The question was about the completeness of the natural sufficient statistic associated with a sample from the shifted exponential distribution

f(x;\theta) = \frac{1}{\theta^2}\exp\{-\theta^{-2}(x-\theta)\}\mathbb{I}_{x>\theta}

[weirdly called negative exponential in the question] meaning the (minimal) sufficient statistic is made of the first order statistic and of the sample sum (or average), or equivalently

T=(X_{(1)},\sum_{i=2}^n \{X_{(i)}-X_{(1)}\})

Finding the joint distribution of T is rather straightforward, as the first component is again a drifted Exponential and the second a Gamma variate with n-1 degrees of freedom and scale θ². (Devroye’s Bible can be invoked, since the Gamma distribution follows from his section on Exponential spacings, p.211.) While the derivation of a function with constant expectation is straightforward for the alternate exponential distribution

f(x;\theta) = \frac{1}{\theta}\exp\{-\theta^{-1}(x-\theta)\}\mathbb{I}_{x>\theta}

since the ratio of the components of T has a fixed distribution, it proved harder for the current case, as I was seeking a parameter-free transform. When attempting to explain the difficulty on my office board, I realised I was seeking the wrong property, since a constant expectation was enough. Removing the dependence on θ was simpler and led to

\mathbb E_\theta\left[\frac{X_{(1)}}{Y}-\frac{\Gamma(n-2)}{\Gamma(n-3/2)}Y^{-1/2}\right]=\frac{\Gamma(n-2)}{n\Gamma(n-1)}

albeit only one version of a transform with fixed expectation; subtracting that constant yields a non-zero function of T with zero expectation for all θ, hence the incompleteness of T. This also led me to wonder at the range of possible functions of θ one could use as scale and still retrieve incompleteness of T. Any power of θ should work, but what about exp(θ²) or sin²(θ³), i.e., functions for which there exists no unbiased estimator…?
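As a sanity check, the constant expectation above can be verified by a quick Monte Carlo sketch (the values of θ and n below are arbitrary choices of mine):

```python
import numpy as np
from math import gamma

# Monte Carlo check that the expectation above is constant in theta
rng = np.random.default_rng(0)
theta, n, reps = 1.5, 10, 200_000
# shifted exponential: X = theta + theta^2 * E with E ~ Exp(1)
x = theta + theta**2 * rng.exponential(size=(reps, n))
x1 = x.min(axis=1)                      # first order statistic X_(1)
y = x.sum(axis=1) - n * x1              # sum of the X_(i) - X_(1), a Gamma(n-1, theta^2)
h = x1 / y - gamma(n - 2) / gamma(n - 1.5) * y**-0.5
print(h.mean())                          # close to Gamma(n-2)/(n Gamma(n-1)) = 1/80
print(gamma(n - 2) / (n * gamma(n - 1)))
```

The empirical mean stays at 1/(n(n-2)) whatever the value of θ used in the simulation, in agreement with the displayed identity.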

arbitrary non-constant function [nonsensical]

Posted in Statistics on November 6, 2020 by xi'an

When looking for properties of the negative exponential distribution, in connection with an X validated question, I came across this nonsensical paper, starting with a truncated and drifted exponential distribution being defined as a negative exponential distribution, including a nonsensical bound a in (1.1), followed by an equally nonsensical characterisation of the distribution, including a theorem with a useless function Φ… Unsurprisingly, the publisher (SCIRP) of the Open Journal of Statistics is part of Beall’s list of [potential, possible, or probable] predatory publishers. (A list that is now maintained by Scholarly Open Access.)

best unbiased estimator of θ² for a Poisson model

Posted in Books, Kids, pictures, Statistics, Travel, University life on May 23, 2018 by xi'an

A mostly traditional question on X validated about the “best” [minimum variance] unbiased estimator of θ² from a Poisson P(θ) sample leads to the Rao-Blackwell solution

\mathbb{E}[X_1X_2|\underbrace{\sum_{i=1}^n X_i}_S=s] = -\frac{s}{n^2}+\frac{s^2}{n^2}=\frac{s(s-1)}{n^2}

and a similar estimator could be constructed for θ³, θ⁴, … with the interesting limitation that this procedure seemingly stops at the power equal to the number of observations (minus one?). But, since the expectation of any power of the sufficient statistic S [with distribution P(nθ)] is a polynomial in θ, there is de facto no limitation. More interestingly, there is no unbiased estimator of negative powers of θ in this context, while this neat comparison on Wikipedia (borrowed from the great book of counter-examples by Romano and Siegel, 1986, selling for a mere $180 on amazon!) shows why looking for an unbiased estimator of exp(-2θ) is particularly foolish: the only solution is (-1) to the power S [for a single observation]. (There is however a first way to circumvent the difficulty if having access to an arbitrary number of generations from the Poisson, since the Forsythe–von Neumann algorithm allows for an unbiased estimation of exp(-F(x)). And, as a second way, as remarked by Juho Kokkala below, a sample of at least two Poisson observations leads to a more coherent best unbiased estimator.)
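Both claims are easy to check by simulation; the sketch below (with my own arbitrary choices of θ and n) verifies that S(S-1)/n² is unbiased for θ² and that, for a single observation X, the absurd estimator (-1) to the power X is indeed unbiased for exp(-2θ):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 5, 400_000
# Rao-Blackwellised estimator S(S-1)/n^2, with S ~ P(n theta)
s = rng.poisson(theta, size=(reps, n)).sum(axis=1)
est = s * (s - 1) / n**2
print(est.mean(), theta**2)             # empirical mean close to theta^2

# single-observation "estimator" of exp(-2 theta): (-1)^X
x = rng.poisson(theta, size=reps)
print(((-1.0) ** x).mean(), np.exp(-2 * theta))  # means agree, but the estimator only takes values -1 and 1
```

The second print illustrates why unbiasedness alone is a poor criterion here: the estimator is unbiased yet never falls inside (0,1), where exp(-2θ) lives.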

an improvable Rao–Blackwell improvement, inefficient maximum likelihood estimator, and unbiased generalized Bayes estimator

Posted in Books, Statistics, University life on February 2, 2018 by xi'an

In my quest (!) for examples of location problems with no UMVU estimator, I came across a neat paper by Tal Galili [of R Bloggers fame!] and Isaac Meilijson presenting somewhat paradoxical properties of classical estimators in the case of a Uniform U((1-k)θ,(1+k)θ) distribution, when 0<k<1 is known. For this model, the minimal sufficient statistic is the pair made of the smallest and of the largest observations, L and U. Since this pair is not complete, the Lehmann–Scheffé theorem does not apply and Rao–Blackwellisation does not produce a single, hence optimal, estimator. The best linear unbiased combination [in terms of its variance] of L and U is derived in this paper, although this does not produce the uniformly minimum variance unbiased estimator, which does not exist in this case. (And I do not understand the remark that

“Any unbiased estimator that is a function of the minimal sufficient statistic is its own Rao–Blackwell improvement.”

as this hints at an infinite sequence of improvements.) While the MLE is inefficient in this setting, the Pitman [best equivariant] estimator is both Bayes [against the scale Haar measure] and unbiased, while experimentally dominating the above linear combination. The authors also argue that, since “generalized Bayes rules need not be admissible”, there is no guarantee that the Pitman estimator is admissible (under squared error loss). But given that this is a uni-dimensional scale estimation problem, I doubt very much there is a Stein effect occurring in this case.
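The flavour of the comparison can be reproduced with a small simulation. The estimators below, the sample mean and the midrange (L+U)/2, are simpler stand-ins for the paper's optimal linear combination of L and U (both are unbiased for θ by symmetry), and k, θ, n are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, k, n, reps = 3.0, 0.5, 20, 100_000
# samples from U((1-k) theta, (1+k) theta)
x = rng.uniform((1 - k) * theta, (1 + k) * theta, size=(reps, n))
mean_est = x.mean(axis=1)                          # unbiased, variance O(1/n)
midrange = (x.min(axis=1) + x.max(axis=1)) / 2     # unbiased, variance O(1/n^2)
print(mean_est.mean(), midrange.mean())            # both close to theta
print(mean_est.var(), midrange.var())              # midrange has the smaller variance
```

Even this crude function of (L,U) beats the sample mean in variance, which is the point of looking at linear combinations of the extremes in the first place.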

best unbiased estimators

Posted in Books, Kids, pictures, Statistics, University life on January 18, 2018 by xi'an

A question that came out on X validated today kept me busy for most of the day! It relates to an earlier question on the best unbiased nature of a maximum likelihood estimator, to which I pointed out the simple case of the Normal variance, where the maximum likelihood estimate is not unbiased (but improves upon the unbiased estimator in mean square error). Here, the question is whether or not the maximum likelihood estimator of a location parameter, when corrected for its bias, is the best unbiased estimator (in the sense of minimal variance). The question is quite interesting in that it links to the mathematical statistics of the 1950s, of Charles Stein, Erich Lehmann, Henry Scheffé, and Debabrata Basu. For instance, if there exists a complete sufficient statistic for the problem, then there exists a best unbiased estimator of the location parameter, by virtue of the Lehmann–Scheffé theorem (it is also a consequence of Basu’s theorem). And this existence is pretty limited in that, outside the two exponential families with location parameter, there is no other distribution meeting this condition, I believe. However, even if there is no complete sufficient statistic, there may still exist best unbiased estimators, as shown by Bondesson. But Lehmann and Scheffé in their magisterial 1950 Sankhya paper exhibit a counter-example, namely the U(θ-1,θ+1) distribution, since no non-constant function of θ allows for a best unbiased estimator.

Looking in particular at the location parameter of a Cauchy distribution, I realised that the Pitman best equivariant estimator is unbiased as well [for all location problems] and hence dominates the (equivariant) maximum likelihood estimator, which is unbiased in this symmetric case. However, as detailed in a nice paper of Gabriela Freue on this problem, I further discovered that there is no uniformly minimal variance estimator and no uniformly minimal variance unbiased estimator! (And that the Pitman estimator enjoys a closed form expression, as opposed to the maximum likelihood estimator.) This sounds a bit paradoxical but simply means that there exist different unbiased estimators whose variance functions are not ordered, and hence not comparable, both among themselves and with the variance of the Pitman estimator.
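Closed form aside, the Pitman estimator is also easy to evaluate numerically, as the posterior mean of the location under the flat (Haar) measure. The grid-based sketch below (sample size, grid range, and number of replications are arbitrary choices of mine) checks its unbiasedness for the Cauchy location by simulation:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 0.0, 5, 4000
t = np.linspace(-25, 25, 2001)   # integration grid for the flat-prior posterior

def pitman(x):
    # posterior weights over the grid: product of Cauchy densities, up to constants
    logw = -np.log1p((x[:, None] - t[None, :]) ** 2).sum(axis=0)
    w = np.exp(logw - logw.max())
    return (t * w).sum() / w.sum()   # posterior mean = Pitman estimator

est = np.array([pitman(rng.standard_cauchy(n) + theta) for _ in range(reps)])
print(est.mean())                    # close to theta: the estimator is unbiased
```

By equivariance and the symmetry of the Cauchy density, the average of the replicated estimates settles at the true location, as the Riemann-sum approximation confirms.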