Archive for mathematical statistics

10 Little’s simple ideas

Posted in Books, Statistics, University life with tags , , , , , , , , on July 17, 2013 by xi'an

“I still feel that too much of academic statistics values complex mathematics over elegant simplicity — it is necessary for a research paper to be complicated in order to be published.” Roderick Little, JASA, p.359

Roderick Little wrote his Fisher lecture, recently published in JASA, around ten simple ideas for statistics. Its title is “In praise of simplicity not mathematistry! Ten simple powerful ideas for the statistical scientist”. While this title is rather antagonistic, blaming mathematical statistics for the rise of mathematistry in the field (a term borrowed from Fisher, who also invented the adjective ‘Bayesian’), the paper focus on those 10 ideas and very little on why there is (would be) too much mathematics in statistics:

  1. Make outcomes univariate
  2. Bayes rule, for inference under an assumed model
  3. Calibrated Bayes, to keep inference honest
  4. Embrace well-designed simulation experiments
  5. Distinguish the model/estimand, the principles of estimation, and computational methods
  6. Parsimony — seek a good simple model, not the “right” model
  7. Model the Inclusion/Assignment and try to make it ignorable
  8. Consider dropping parts of the likelihood to reduce the modeling part
  9. Potential outcomes and principal stratification for causal inferenc
  10. Statistics is basically a missing data problem

“The mathematics of problems with infinite parameters is interesting, but with finite sample sizes, I would rather have a parametric model. “Mathematistry” may eschew parametric models because the asymptotic theory is too simple, but they often work well in practice.” Roderick Little, JASA, p.365

Both those rules and the illustrations that abund in the paper are reflecting upon Little’s research focus and obviously apply to his model in a fairly coherent way. However, while a mostly parametric model user myself, I fear the rejection of non-parametric techniques is far too radical. It is more and more my convinction that we cannot handle the full complexity of a realistic structure in a standard Bayesian manner and that we have to give up on the coherence and completeness goals at some point… Using non-parametrics and/or machine learning on some bits and pieces then makes sense, even though it hurts elegance and simplicity.

“However, fully Bayes inference requires detailed probability modeling, which is often a daunting task. It seems worth sacrifycing some Bayesian inferential purity if the task can be simplified.” Roderick Little, JASA, p.366

I will not discuss those ideas in detail, as some of them make complete sense to me (like Bayesian statistics laying its assumptions in the open) and others remain obscure (e.g., causality) or with limited applicability. It is overall a commendable Fisher lecture that focus on methodology and the practice of statistical science, rather than on theory. I however do not see the reason why maths should be blamed for this state of the field. Nor why mathematical statistics journals like AoS would carry some responsibility in the lack of further applicability in other fields.  Students of statistics do need a strong background in mathematics and I fear we are losing ground in this respect, at least judging by the growing difficulty in finding measure theory courses abroad for our exchange undergradutes from Paris-Dauphine. (I also find the model misspecification aspects mostly missing from this list.)

mathematical statistics books with Bayesian chapters [incomplete book reviews]

Posted in Statistics, University life, Books with tags , , , , , , , , on July 9, 2013 by xi'an

I received (in the same box) two mathematical statistics books from CRC Press, Understanding Advanced Statistical Methods by Westfall and Henning, and Statistical Theory A Concise Introduction by Abramovich and Ritov. For review in CHANCE. While they are both decent books for teaching mathematical statistics at undergraduate borderline graduate level, I do not find enough of a novelty in them to proceed to a full review. (Given more time, I could have changed my mind about the first one.) Instead, I concentrate here on their processing of the Bayesian paradigm, which takes a wee bit more than a chapter in either of them. (And this can be done over a single métro trip!) The important following disclaimer applies: comparing both books is highly unfair in that it is only because I received them together. They do not necessarily aim at the same audience. And I did not read the whole of either of them.

First, the concise Statistical Theory  covers the topic in a fairly traditional way. It starts with a warning about the philosophical nature of priors and posteriors, which reflect beliefs rather than frequency limits (just like likelihoods, no?!). It then introduces priors with the criticism that priors are difficult to build and assess. The two classes of priors analysed in this chapter are unsurprisingly conjugate priors (which hyperparameters have to be determined or chosen or estimated in the empirical Bayes heresy [my words!, not the authors’]) and “noninformative (objective) priors”.  The criticism of the flat priors is also traditional and leads to the  group invariant (Haar) measures, then to Jeffreys non-informative priors (with the apparent belief that Jeffreys only handled the univariate case). Point estimation is reduced to posterior expectations, confidence intervals to HPD regions, and testing to posterior probability ratios (with a warning about improper priors). Bayes rules make a reappearance in the following decision-theory chapter, as providers of both admissible and minimax estimators. This is it, as Bayesian techniques are not mentioned in the final “Linear Models” chapter. As a newcomer to statistics, I think I would be as bemused about Bayesian statistics as when I got my 15mn entry as a student, because here was a method that seemed to have a load of history, an inner coherence, and it was mentioned as an oddity in an otherwise purely non-Bayesian course. What good could this do to the understanding of the students?! So I would advise against getting this “token Bayesian” chapter in the book

“You are not ignorant! Prior information is what you know prior to collecting the data.” Understanding Advanced Statistical Methods (p.345)

Second, Understanding Advanced Statistical Methods offers a more intuitive entry, by justifying prior distributions as summaries of prior information. And observations as a mean to increase your knowledge about the parameter. The Bayesian chapter uses a toy but very clear survey examplew to illustrate the passage from prior to posterior distributions. And to discuss the distinction between informative and noninformative priors. (I like the “Ugly Rule of Thumb” insert, as it gives a guideline without getting too comfy about it… E.g., using a 90% credible interval is good enough on p.354.) Conjugate priors are mentioned as a result of past computational limitations and simulation is hailed as a highly natural tool for analysing posterior distributions. Yay! A small section discusses the purpose of vague priors without getting much into details and suggests to avoid improper priors by using “distributions with extremely large variance”, a concept we dismissed in Bayesian Core! For how large is “extremely large”?!

“You may end up being surprised to learn in later chapters (..) that, with classical methods, you simply cannot perform the types of analyses shown in this section (…) And that’s the answer to the question, “What good is Bayes?””Understanding Advanced Statistical Methods (p.345)

Then comes the really appreciable part, a section entitled “What good is Bayes?”—it actually reads “What Good is Bayes?” (p.359), leading to a private if grammatically poor joke since I.J. Good was one of the first modern Bayesians, working with Turing at Bletchley Park…—  The authors simply skip the philosophical arguments to give the reader a showcase of examples where the wealth of the Bayesian toolbox: logistic regression, VaR (Value at Risk), stock prices, drug profit prediction. Concluding with arguments in favour of the frequentist methods: (a) not requiring priors, (b) easier with generic distributifrequentistons, (c) easier to understand with simulation, and (d) easier to validate with validation. I do not mean to get into a debate about those points as my own point is that the authors are taking a certain stand about the pros and cons of the frequentist/Bayesian approaches and that they are making their readers aware of it. (Note that the Bayesian chapter comes before the frequentist chapter!) A further section is “Comparing the Bayesian and frequentist paradigms?” (p.384), again with a certain frequentist slant, but again making the distinctions and similarities quite clear to the reader. Of course, there is very little (if any) about Bayesian approaches in the next chapters but this is somehow coherent with the authors’ perspective. Once more, a perspective that is well-spelled and comprehensible for the reader. Even the novice statistician. In that sense, having a Bayesian chapter inside a general theory book makes sense.  (The second book has a rather detailed website, by the way! Even though handling simulations in Excel and drawing graphs in SAS could be dangerous to your health…)

n-1,n,n+1, who [should] care?!

Posted in Statistics, University life with tags , , on February 5, 2013 by xi'an

Terry Speed wrote a column in the latest IMS Bulletin (the one I received a week ago) about the choice of the denominator in the variance estimator. That is, should s² involve n (number of observations), n-1 (degrees of freedom), n+1 or anything else in its denominator? I find the question more interesting than the answer (sorry, Terry!) as it demonstrates quite forcibly that there is not a single possible choice for this estimator of the variance but that instead the “optimal” estimator is determined by the choice of the optimality criterion: this makes for a wonderful (if rather formal) playground for a class on decision theoretic statistics. And I often use it on my students. Non-Bayesian mathematical statistics courses often give the impression that there is a natural (single) estimator, when this estimator is based on an implicit choice of an optimality criterion. (This issue is illustrated in the books of Chang and of Vasishth and Broe I discussed earlier. As well as by the Stein effect, of course.) I thus deem it worthwhile to impress upon all users of statistics that there is no such single optimal choice, that unbiasedness is not a compulsory property—just as well since most parameters cannot be estimated in an unbiased manner!—, and that there is room for a subjective choice of a “best” estimator, as paradoxical as it may sound to non-statisticians.

A whistle-stop tour of statistics

Posted in Books, pictures, Statistics with tags , , , , on March 9, 2012 by xi'an

In a consequent package I got from CRC Press last month for CHANCE, there was this short book by Brian S. Everitt, A whistle-stop tour of statistics… Nice cover! The book is like an introductory undergraduate statistics course, except that it is much much terser and shorter, using 200 pages in A5 format with plenty of pictures. (It could also have been called a primer or a guidebook.) The table of contents is as follows

Some basics and describing data
Probability
Estimation
Inference
Analysis of variance models
Linear regression models
Logistic regression and the generalized linear model
Survival analysis
Longitudinal data and their analysis
Multivariate data and their analysis

There is nothing fundamentally wrong with the book, except that… I cannot fathom its purpose! Nor its readership. Again, it is way too short and terse to be used in an undergraduate course or for self-study. And it does not bring a new light on those standard topics when compared with most of introductory statistics books, being mostly traditional (even though it briefly mentions Bayesian inference on pp. 88-89). While the book is itself a summary of statistical methodology, it still finds room for a summary of the current notions at the end of each chapter. So I thus remain completely puzzled by the point in publishing A whistle-stop tour of statistics..!

Follow

Get every new post delivered to your Inbox.

Join 721 other followers