Archive for admissibility

mathematical statistics books with Bayesian chapters [incomplete book reviews]

Posted in Books, Statistics, University life with tags , , , , , , , , on July 9, 2013 by xi'an

I received (in the same box) two mathematical statistics books from CRC Press, Understanding Advanced Statistical Methods by Westfall and Henning, and Statistical Theory A Concise Introduction by Abramovich and Ritov. For review in CHANCE. While they are both decent books for teaching mathematical statistics at undergraduate borderline graduate level, I do not find enough of a novelty in them to proceed to a full review. (Given more time, I could have changed my mind about the first one.) Instead, I concentrate here on their processing of the Bayesian paradigm, which takes a wee bit more than a chapter in either of them. (And this can be done over a single métro trip!) The important following disclaimer applies: comparing both books is highly unfair in that it is only because I received them together. They do not necessarily aim at the same audience. And I did not read the whole of either of them.

First, the concise Statistical Theory  covers the topic in a fairly traditional way. It starts with a warning about the philosophical nature of priors and posteriors, which reflect beliefs rather than frequency limits (just like likelihoods, no?!). It then introduces priors with the criticism that priors are difficult to build and assess. The two classes of priors analysed in this chapter are unsurprisingly conjugate priors (which hyperparameters have to be determined or chosen or estimated in the empirical Bayes heresy [my words!, not the authors']) and “noninformative (objective) priors”.  The criticism of the flat priors is also traditional and leads to the  group invariant (Haar) measures, then to Jeffreys non-informative priors (with the apparent belief that Jeffreys only handled the univariate case). Point estimation is reduced to posterior expectations, confidence intervals to HPD regions, and testing to posterior probability ratios (with a warning about improper priors). Bayes rules make a reappearance in the following decision-theory chapter, as providers of both admissible and minimax estimators. This is it, as Bayesian techniques are not mentioned in the final “Linear Models” chapter. As a newcomer to statistics, I think I would be as bemused about Bayesian statistics as when I got my 15mn entry as a student, because here was a method that seemed to have a load of history, an inner coherence, and it was mentioned as an oddity in an otherwise purely non-Bayesian course. What good could this do to the understanding of the students?! So I would advise against getting this “token Bayesian” chapter in the book

“You are not ignorant! Prior information is what you know prior to collecting the data.” Understanding Advanced Statistical Methods (p.345)

Second, Understanding Advanced Statistical Methods offers a more intuitive entry, by justifying prior distributions as summaries of prior information. And observations as a mean to increase your knowledge about the parameter. The Bayesian chapter uses a toy but very clear survey examplew to illustrate the passage from prior to posterior distributions. And to discuss the distinction between informative and noninformative priors. (I like the “Ugly Rule of Thumb” insert, as it gives a guideline without getting too comfy about it… E.g., using a 90% credible interval is good enough on p.354.) Conjugate priors are mentioned as a result of past computational limitations and simulation is hailed as a highly natural tool for analysing posterior distributions. Yay! A small section discusses the purpose of vague priors without getting much into details and suggests to avoid improper priors by using “distributions with extremely large variance”, a concept we dismissed in Bayesian Core! For how large is “extremely large”?!

“You may end up being surprised to learn in later chapters (..) that, with classical methods, you simply cannot perform the types of analyses shown in this section (…) And that’s the answer to the question, “What good is Bayes?””Understanding Advanced Statistical Methods (p.345)

Then comes the really appreciable part, a section entitled “What good is Bayes?”—it actually reads “What Good is Bayes?” (p.359), leading to a private if grammatically poor joke since I.J. Good was one of the first modern Bayesians, working with Turing at Bletchley Park…—  The authors simply skip the philosophical arguments to give the reader a showcase of examples where the wealth of the Bayesian toolbox: logistic regression, VaR (Value at Risk), stock prices, drug profit prediction. Concluding with arguments in favour of the frequentist methods: (a) not requiring priors, (b) easier with generic distributifrequentistons, (c) easier to understand with simulation, and (d) easier to validate with validation. I do not mean to get into a debate about those points as my own point is that the authors are taking a certain stand about the pros and cons of the frequentist/Bayesian approaches and that they are making their readers aware of it. (Note that the Bayesian chapter comes before the frequentist chapter!) A further section is “Comparing the Bayesian and frequentist paradigms?” (p.384), again with a certain frequentist slant, but again making the distinctions and similarities quite clear to the reader. Of course, there is very little (if any) about Bayesian approaches in the next chapters but this is somehow coherent with the authors’ perspective. Once more, a perspective that is well-spelled and comprehensible for the reader. Even the novice statistician. In that sense, having a Bayesian chapter inside a general theory book makes sense.  (The second book has a rather detailed website, by the way! Even though handling simulations in Excel and drawing graphs in SAS could be dangerous to your health…)

beware, nefarious Bayesians threaten to take over frequentism using loss functions as Trojan horses!

Posted in Books, pictures, Statistics with tags , , , , , , , , , , , , on November 12, 2012 by xi'an

“It is not a coincidence that textbooks written by Bayesian statisticians extol the virtue of the decision-theoretic perspective and then proceed to present the Bayesian approach as its natural extension.” (p.19)

“According to some Bayesians (see Robert, 2007), the risk function does represent a legitimate frequentist error because it is derived by taking expectations with respect to [the sampling density]. This argument is misleading for several reasons.” (p.18)

During my R exam, I read the recent arXiv posting by Aris Spanos on why “the decision theoretic perspective misrepresents the frequentist viewpoint”. The paper is entitled “Why the Decision Theoretic Perspective Misrepresents Frequentist Inference: ‘Nuts and Bolts’ vs. Learning from Data” and I found it at the very least puzzling…. The main theme is the one caricatured in the title of this post, namely that the decision-theoretic analysis of frequentist procedures is a trick brought by Bayesians to justify their own procedures. The fundamental argument behind this perspective is that decision theory operates in a “for all θ” referential while frequentist inference (in Spanos’ universe) is only concerned by one θ, the true value of the parameter. (Incidentally, the “nuts and bolt” refers to the only case when a decision-theoretic approach is relevant from a frequentist viewpoint, namely in factory quality control sampling.)

“The notions of a risk function and admissibility are inappropriate for frequentist inference because they do not represent legitimate error probabilities.” (p.3)

“An important dimension of frequentist inference that has not been adequately appreciated in the statistics literature concerns its objectives and underlying reasoning.” (p.10)

“The factual nature of frequentist reasoning in estimation also brings out the impertinence of the notion of admissibility stemming from its reliance on the quantifier ‘for all’.” (p.13)

One strange feature of the paper is that Aris Spanos seems to appropriate for himself the notion of frequentism, rejecting the choices made by (what I would call frequentist) pioneers like Wald, Neyman, “Lehmann and LeCam [sic]“, Stein. Apart from Fisher—and the paper is strongly grounded in neo-Fisherian revivalism—, the only frequentists seemingly finding grace in the eyes of the author are George Box, David Cox, and George Tiao. (The references are mostly to textbooks, incidentally.) Modern authors that clearly qualify as frequentists like Bickel, Donoho, Johnstone, or, to mention the French school, e.g., Birgé, Massart, Picard, Tsybakov, none of whom can be suspected of Bayesian inclinations!, do not appear either as satisfying those narrow tenets of frequentism. Furthermore, the concept of frequentist inference is never clearly defined within the paper. As in the above quote, the notion of “legitimate error probabilities” pops up repeatedly (15 times) within the whole manifesto without being explicitely defined. (The closest to a definition is found on page 17, where the significance level and the p-value are found to be legitimate.) Aris Spanos even rejects what I would call the von Mises basis of frequentism: “contrary to Bayesian claims, those error probabilities have nothing to to do with the temporal or the physical dimension of the long-run metaphor associated with repeated samples” (p.17), namely that a statistical  procedure cannot be evaluated on its long term performance… Continue reading

Back from Philly

Posted in R, Statistics, Travel, University life with tags , , , , , , , , , on December 21, 2010 by xi'an

The conference in honour of Larry Brown was quite exciting, with lots of old friends gathered in Philadelphia and lots of great talks either recollecting major works of Larry and coauthors or presenting fairly interesting new works. Unsurprisingly, a large chunk of the talks was about admissibility and minimaxity, with John Hartigan starting the day re-reading Larry masterpiece 1971 paper linking admissibility and recurrence of associated processes, a paper I always had trouble studying because of both its depth and its breadth! Bill Strawderman presented a new if classical minimaxity result on matrix estimation and Anirban DasGupta some large dimension consistency results where the choice of the distance (total variation versus Kullback deviance) was irrelevant. Ed George and Susie Bayarri both presented their recent work on g-priors and their generalisation, which directly relate to our recent paper on that topic. On the afternoon, Holger Dette showed some impressive mathematics based on Elfving’s representation and used in building optimal designs. I particularly appreciated the results of a joint work with Larry presented by Robert Wolpert where they classified all Markov stationary infinitely divisible time-reversible integer-valued processes. It produced a surprisingly small list of four cases, two being trivial.. The final talk of the day was about homology, which sounded a priori rebutting, but Robert Adler made it extremely entertaining, so much that I even failed to resent the powerpoint tricks! The next morning, Mark Low gave a very emotional but also quite illuminating about the first results he got during his PhD thesis at Cornell (completing the thesis when I was using Larry’s office!). Brenda McGibbon went back to the three truncated Poisson papers she wrote with Ian Johnstone (via gruesome 13 hour bus rides from Montréal to Ithaca!) and produced an illuminating explanation of the maths at work for moving from the Gaussian to the Poisson case in a most pedagogical and enjoyable fashion. Larry Wasserman explained the concepts at work behind the lasso for graphs, entertaining us with witty acronyms on the side!, and leaving out about 3/4 of his slides! (The research group involved in this project produced an R package called huge.) Joe Eaton ended up the morning with a very interesting result showing that using the right Haar measure as a prior leads to a matching prior, then showing why the consequences of the result are limited by invariance itself. Unfortunately, it was then time for me to leave and I will miss (in both meanings of the term) the other half of the talks. Especially missing Steve Fienberg’s talk for the third time in three weeks! Again, what I appreciated most during those two days (besides the fact that we were all reunited on the very day of Larry’s birthday!) was the pain most speakers went to to expose older results in a most synthetic and intuitive manner… I also got new ideas about generalising our parallel computing paper for random walk Metropolis-Hastings algorithms and for optimising across permutation transforms.

New arXiv postings

Posted in Statistics with tags , , , , , on March 17, 2010 by xi'an

No time today to read those as I am preparing for the course this afternoon but there are two interesting new entries on arXiv, one by Madeleine Thompson and Radford Neal on covariance-adaptive slice sampling

We describe two slice sampling methods for taking multivariate steps using the crumb framework. These methods use the gradients at rejected proposals to adapt to the local curvature of the log-density surface, a technique that can produce much better proposals when parameters are highly correlated. We evaluate our methods on four distributions and compare their performance to that of a non-adaptive slice sampling method and a Metropolis method. The adaptive methods perform favourably on low-dimensional target distributions with highly-correlated parameters.

and one by Brian Shea and Galin Jones on consequential evaluation of default priors

We consider evaluating improper priors in a formal Bayes setting according to consequences of their use. This approach bridges the frequentist concern of evaluating a decision rule and the Bayesian concern of evaluating a prior. We generalize Eaton’s method, which exploits a connection between admissibility and a Markov chain defined by the sampling distribution and posterior. This generalization leads us to introduce the idea of \varPhi-admissibility, itself a generalization of strong admissibility. To illustrate the method, we establish \varPhi-admissibility conditions for a family of priors on multivariate normal means.

There have been very few extensions of Eaton’s (1992) great characterisation of admissible Bayes procedures so this sounds quite exciting!


Get every new post delivered to your Inbox.

Join 558 other followers