## Darmois, Koopman, and Pitman

Posted in Books, Statistics on November 15, 2017 by xi'an

When [X’ed] seeking a simple proof of the Pitman-Koopman-Darmois lemma [that exponential families are the only types of distributions with constant support allowing for a fixed dimension sufficient statistic], I came across a 1962 Stanford technical report by Don Fraser containing a short proof of the result. A proof that I do not fully understand, as it relies on the notion that the likelihood function itself is a minimal sufficient statistic.

## done! [#1]

Posted in Kids, pictures, University life on January 16, 2016 by xi'an

After spending a few hours grading my 127 exams for most nights of this week, I am finally done with it! One of the exam questions was the simulation of XY when (X,Y) is a bivariate normal vector with correlation ρ, following the trick described in an X validated question asked a few months ago, namely that

XY≡R{cos(πU)+ρ}

but no one managed to establish this representation. And, as usual, some students got confused between parameters θ and observations x when writing a posterior density, since the density of the prior was defined in the exam with the dummy x, thereby recovering the prior as the posterior. Nothing terrible and nothing exceptional with this cohort of undergraduates. And now I still have to go through my second pile of exams for the graduate course I taught on Bayesian computational tools…
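As a quick numerical sanity check of the representation above, here is a minimal sketch. The post does not spell out the distributions of R and U, so the code assumes that R is a standard exponential Exp(1) variate, that U is uniform on (0,1), and that X and Y both have unit variance; the function names and the value ρ=0.5 are illustrative choices only.

```python
import math
import random

def product_direct(rho, rng):
    """One draw of XY, with (X, Y) standard bivariate normal with correlation rho."""
    x = rng.gauss(0.0, 1.0)
    # Y = rho*X + sqrt(1 - rho^2)*Z has unit variance and correlation rho with X
    y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return x * y

def product_trick(rho, rng):
    """One draw of R{cos(pi U) + rho}, ASSUMING R ~ Exp(1) and U ~ U(0,1)."""
    r = rng.expovariate(1.0)
    u = rng.random()
    return r * (math.cos(math.pi * u) + rho)

def mean_and_variance(sample):
    """Empirical mean and (biased) variance of a list of floats."""
    m = sum(sample) / len(sample)
    v = sum((s - m) ** 2 for s in sample) / len(sample)
    return m, v

rng = random.Random(2016)
rho, n = 0.5, 200_000
direct = [product_direct(rho, rng) for _ in range(n)]
trick = [product_trick(rho, rng) for _ in range(n)]

direct_mean, direct_var = mean_and_variance(direct)
trick_mean, trick_var = mean_and_variance(trick)
# Under the stated assumptions both samples should have mean rho
# and variance 1 + rho^2, up to Monte Carlo error.
print(direct_mean, trick_mean)
print(direct_var, trick_var)
```

Matching the first two moments is of course no proof of the identity in distribution, only a check that the assumed ingredients are not obviously off.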

## minimaxity of a Bayes estimator

Posted in Books, Kids, Statistics, University life on February 2, 2015 by xi'an

Today, while in Warwick, I spotted on Cross Validated a question involving “minimax” in the title and hence could not help but look at it! The way I first understood the question (and immediately replied to it) was to check whether or not the standard Normal average (reduced to the single Normal observation by sufficiency considerations) is a minimax estimator of the normal mean under an interval zero-one loss defined by

$\mathcal{L}(\mu,\hat{\mu})=\mathbb{I}_{|\mu-\hat\mu|>L}=\begin{cases}1 &\text{if }|\mu-\hat\mu|>L\\ 0&\text{if }|\mu-\hat{\mu}|\le L\\ \end{cases}$

where L is a positive tolerance bound. I had not seen this problem before, even though it sounds quite standard. In this setting, the identity estimator, i.e., the normal observation x, is indeed minimax as (a) it is a generalised Bayes estimator for this loss function under the constant prior (Bayes estimators under this loss are given by the centre of an equal posterior interval) and (b) it can be shown to be a limit of proper Bayes estimators and its Bayes risk is also the limit of the corresponding Bayes risks. (This is a most traditional way of establishing minimaxity for a generalised Bayes estimator.) However, this was not the question asked on the forum, as the book by Zacks it referred to stated that the standard Normal average maximised the minimal coverage, which amounts to minimising the maximal risk under the above loss. With the strange inversion of parameter and estimator in the minimax risk:

$\sup_\mu\inf_{\hat\mu} R(\mu,\hat{\mu})\text{ instead of } \inf_{\hat\mu}\sup_\mu R(\mu,\hat{\mu})$

which makes the first bound equal to 0 by equating estimator and mean μ. Note however that I cannot access the whole book and hence may miss some restriction or other subtlety that would account for this unusual definition. (As an aside, note that Cross Validated has a protection against serial upvoting, so voting up or down at once a large chunk of my answers on that site does not impact my “reputation”!)
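For the record, the risk of the identity estimator under the interval loss is easy to compute: for a single observation x from N(μ,1), it is P(|x-μ|>L)=2Φ(-L), whatever the value of μ. The sketch below checks this constancy by simulation; the unit variance, the function names, the tolerance L=1, and the three values of μ are assumptions made for illustration.

```python
import math
import random

def normal_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exact_risk(tolerance):
    """Risk 2*Phi(-L) of the estimator x under the interval zero-one loss."""
    return 2.0 * normal_cdf(-tolerance)

def monte_carlo_risk(mu, tolerance, n, rng):
    """Frequency of |x - mu| > L over n draws of x ~ N(mu, 1) (unit variance assumed)."""
    misses = sum(abs(rng.gauss(mu, 1.0) - mu) > tolerance for _ in range(n))
    return misses / n

rng = random.Random(2015)
tolerance = 1.0
# The simulated risk should not depend on mu, up to Monte Carlo error.
risks = {mu: monte_carlo_risk(mu, tolerance, 100_000, rng)
         for mu in (-3.0, 0.0, 5.0)}
print(exact_risk(tolerance), risks)
```

A constant risk function is what makes the limiting Bayes argument work, since the maximal risk of x then coincides with its (generalised) Bayes risk.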

Posted in Kids, pictures, Statistics, University life on January 19, 2015 by xi'an

Now that my grading is over, I can reflect on the unexpected difficulties in the mathematical statistics exam. I knew that the first question in the multiple choice exercise, borrowed from Cross Validated, was going to be quasi-impossible and indeed only one student out of 118 managed to find the right solution. More surprisingly, most students did not manage to solve the (absence of) MLE when observing that n unobserved exponential Exp(λ) variates were larger than a fixed bound δ. I was also amazed that they did poorly on a N(0,σ²) setup, failing to see that

$\mathbb{E}[\mathbb{I}(X_1\le -1)] = \Phi(-1/\sigma)$

and to determine an unbiased estimator that can be improved by Rao-Blackwellisation. No student reached the conditioning part. A rather frequent mistake was more understandable, given the limited exposure they had to Bayesian statistics: many confused parameter λ with observation x in the prior, writing

$\pi(\lambda|x) \propto \lambda \exp\{-\lambda x\} \times x^{a-1} \exp\{-bx\}$

instead of

$\pi(\lambda|x) \propto \lambda \exp\{-\lambda x\} \times \lambda^{a-1} \exp\{-b\lambda\}$

hence could not derive a proper posterior.
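With the prior written in λ, the posterior kernel is λ^a exp{-(b+x)λ}, that is, a Gamma(a+1, b+x) distribution with mean (a+1)/(b+x). The following sketch compares this closed form with a crude numerical integration of the product of likelihood and prior; the values of x, a, and b, the integration range, and the function names are arbitrary choices for illustration.

```python
import math

def unnormalised_posterior(lam, x, a, b):
    """Exp(lam) likelihood of a single observation x times the Gamma(a, b) prior kernel."""
    return lam * math.exp(-lam * x) * lam ** (a - 1) * math.exp(-b * lam)

def posterior_mean_numeric(x, a, b, upper=20.0, steps=20_000):
    """Posterior mean by the trapezoid rule on [0, upper] (requires a >= 1 here)."""
    h = upper / steps
    grid = [i * h for i in range(steps + 1)]
    weights = [unnormalised_posterior(lam, x, a, b) for lam in grid]

    def trapezoid(values):
        # interior points get weight h, the two endpoints h/2
        return h * (sum(values) - 0.5 * (values[0] + values[-1]))

    mass = trapezoid(weights)
    first_moment = trapezoid([lam * w for lam, w in zip(grid, weights)])
    return first_moment / mass

x, a, b = 1.5, 2.0, 1.0  # illustrative values, not taken from the exam
numeric_mean = posterior_mean_numeric(x, a, b)
closed_form_mean = (a + 1.0) / (b + x)  # mean of a Gamma(a+1, b+x) distribution
print(numeric_mean, closed_form_mean)
```

The students' version, with x in place of λ in the prior, is constant in λ apart from the likelihood term, which is why no posterior update of the prior parameters comes out of it.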

## which parameters are U-estimable?

Posted in Books, Kids, Statistics, University life on January 13, 2015 by xi'an

Today (01/06) was a double epiphany in that I realised that one of my long-time beliefs about unbiased estimators did not hold. Indeed, when checking on Cross Validated, I found this question: For which distributions is there a closed-form unbiased estimator for the standard deviation? And the presentation includes the normal case for which indeed there exists an unbiased estimator of σ, namely

$\frac{\Gamma(\{n-1\}/{2})}{\Gamma({n}/{2})}2^{-1/2}\sqrt{\sum_{i=1}^n(x_i-\bar{x})^2}$

which derives directly from the chi-square distribution of the sum of squares divided by σ². When thinking further about it, if a posteriori!, it is now fairly obvious given that σ is a scale parameter. Better, any power of σ can be similarly estimated in an unbiased manner, since

$\mathbb{E}\left[\left\{\sum_{i=1}^n(x_i-\bar{x})^2\right\}^{\alpha/2}\right] \propto\sigma^\alpha\,.$

And this property extends to all location-scale models.
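The unbiasedness of the normal-case estimator of σ given above can be checked by simulation. Here is a minimal sketch, where the function name, the sample size n=5, and the values of μ and σ are arbitrary choices for illustration.

```python
import math
import random

def unbiased_sigma(sample):
    """Gamma((n-1)/2) / Gamma(n/2) * 2^(-1/2) * sqrt(sum of squared deviations)."""
    n = len(sample)
    mean = sum(sample) / n
    sum_of_squares = sum((x - mean) ** 2 for x in sample)
    # ratio of Gamma functions computed on the log scale for numerical stability
    log_ratio = math.lgamma((n - 1) / 2.0) - math.lgamma(n / 2.0)
    return math.exp(log_ratio) * math.sqrt(sum_of_squares / 2.0)

rng = random.Random(2015)
mu, sigma, n, replications = 1.0, 2.0, 5, 50_000
estimates = [unbiased_sigma([rng.gauss(mu, sigma) for _ in range(n)])
             for _ in range(replications)]
average_estimate = sum(estimates) / replications
# The average over replications should be close to sigma, for any mu.
print(average_estimate, sigma)
```

Using `math.lgamma` rather than `math.gamma` is only a precaution against overflow for large n.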

So how on Earth was I so convinced that there was no unbiased estimator of σ?! I think it stems from reading too quickly a result in, I think, Lehmann and Casella, a result due to Peter Bickel and Erich Lehmann stating that, for a convex family of distributions F, there exists an unbiased estimator of a functional q(F) (for a sample size n large enough) if and only if q(αF+(1-α)G) is a polynomial in 0≤α≤1. Because of this, I had this [wrong!] impression that only polynomials of the natural parameters of exponential families can be estimated by unbiased estimators… Note that Bickel’s and Lehmann’s theorem does not apply to the problem here because the collection of Gaussian distributions is not convex (a mixture of Gaussians is not a Gaussian).

This leaves open the question as to which transforms of the parameter(s) are unbiasedly estimable (or U-estimable) for a given parametric family, like the normal N(μ,σ²). I checked in Lehmann’s first edition earlier today and could not find an answer, besides the definition of U-estimability. Not only is the question interesting per se, but the answer could also correct my long-standing impression that unbiasedness is a rare event, i.e., that the collection of transforms of the model parameter that are U-estimable is a very small subset of the whole collection of transforms.

Posted in Kids, pictures, Statistics, University life on January 11, 2015 by xi'an

## 10 Little’s simple ideas

Posted in Books, Statistics, University life on July 17, 2013 by xi'an

“I still feel that too much of academic statistics values complex mathematics over elegant simplicity — it is necessary for a research paper to be complicated in order to be published.” Roderick Little, JASA, p.359

Roderick Little wrote his Fisher lecture, recently published in JASA, around ten simple ideas for statistics. Its title is “In praise of simplicity not mathematistry! Ten simple powerful ideas for the statistical scientist”. While this title is rather antagonistic, blaming mathematical statistics for the rise of mathematistry in the field (a term borrowed from Fisher, who also invented the adjective ‘Bayesian’), the paper focuses on those 10 ideas and very little on why there is (would be) too much mathematics in statistics:

1. Make outcomes univariate
2. Bayes rule, for inference under an assumed model
3. Calibrated Bayes, to keep inference honest
4. Embrace well-designed simulation experiments
5. Distinguish the model/estimand, the principles of estimation, and computational methods
6. Parsimony — seek a good simple model, not the “right” model
7. Model the Inclusion/Assignment and try to make it ignorable
8. Consider dropping parts of the likelihood to reduce the modeling part
9. Potential outcomes and principal stratification for causal inference
10. Statistics is basically a missing data problem

“The mathematics of problems with infinite parameters is interesting, but with finite sample sizes, I would rather have a parametric model. “Mathematistry” may eschew parametric models because the asymptotic theory is too simple, but they often work well in practice.” Roderick Little, JASA, p.365

Both those rules and the illustrations that abound in the paper reflect Little’s research focus and obviously apply to his model in a fairly coherent way. However, while a mostly parametric model user myself, I fear the rejection of non-parametric techniques is far too radical. It is more and more my conviction that we cannot handle the full complexity of a realistic structure in a standard Bayesian manner and that we have to give up on the coherence and completeness goals at some point… Using non-parametrics and/or machine learning on some bits and pieces then makes sense, even though it hurts elegance and simplicity.

“However, fully Bayes inference requires detailed probability modeling, which is often a daunting task. It seems worth sacrificing some Bayesian inferential purity if the task can be simplified.” Roderick Little, JASA, p.366

I will not discuss those ideas in detail, as some of them make complete sense to me (like Bayesian statistics laying its assumptions in the open) and others remain obscure (e.g., causality) or with limited applicability. It is overall a commendable Fisher lecture that focuses on methodology and the practice of statistical science, rather than on theory. I however do not see the reason why maths should be blamed for this state of the field. Nor why mathematical statistics journals like AoS would carry some responsibility in the lack of further applicability in other fields. Students of statistics do need a strong background in mathematics and I fear we are losing ground in this respect, at least judging by the growing difficulty in finding measure theory courses abroad for our exchange undergraduates from Paris-Dauphine. (I also find the model misspecification aspects mostly missing from this list.)