On Friday, I am giving a talk on ABC at Université Laval, in the old city of Québec. While on my way to the 14w5125 workshop on scalable Bayesian computation at BIRS, Banff. I have not visited Laval since the late 1980′s (!) even though my last trip to Québec (the city) was in 2009, when François Perron took me there for ice-climbing and skiing after a seminar in Montréal… (This trip, I will not stay long enough in Québec, alas. Keeping my free day-off for another attempt at ice-climbing near Banff.) Here are slides I have used often in the past year, but this may be the last occurrence as we are completing a paper on the topic with my friends from Montpellier.
Archive for empirical Bayes methods
This was a most busy and profitable week in Warwick as, in addition to meeting with local researchers and students on a wide range of questions and projects, giving an extended seminar to MASDOC students, attending as many seminars as humanly possible (!), and preparing a 5k race by running in the Warwickshire countryside (in the dark and in the rain), I received the visits of Kerrie Mengersen, Judith Rousseau and Jean-Michel Marin, with whom I made some progress on papers we are writing together. In particular, Jean-Michel and I wrote the skeleton of a paper we (still) plan to submit to COLT 2014 next week. And Judith, Kerrie and I drafted new if paradoxical aconnections between empirical likelihood and model selection. Jean-Michel and Judith also gave talks at the CRiSM seminar, Jean-Michel presenting the latest developments on the convergence of our AMIS algorithm, Judith summarising several papers on the analysis of empirical Bayes methods in non-parametric settings.
(Following my earlier discussion of his paper, Xiaoquan Wen sent me this detailed reply.)
I think it is appropriate to start my response to your comments by introducing a little bit of the background information on my research interest and the project itself: I consider myself as an applied statistician, not a theorist, and I am interested in developing theoretically sound and computationally efficient methods to solve practical problems. The FDR project originated from a practical application in genomics involving hypothesis testing. The details of this particular application can be found in this published paper, and the simulations in the manuscript are also designed for a similar context. In this application, the null model is trivially defined, however there exist finitely many alternative scenarios for each test. We proposed a Bayesian solution that handles this complex setting quite nicely: in brief, we chose to model each possible alternative scenario parametrically, and by taking advantage of Bayesian model averaging, Bayes factor naturally ended up as our test statistic. We had no problem in demonstrating the resulting Bayes factor is much more powerful than the existing approaches, even accounting for the prior (mis-)modeling for Bayes factors. However, in this genomics application, there are potentially tens of thousands of tests need to be simultaneously performed, and FDR control becomes necessary and challenging. Continue reading
Today, I attended a “miniworkshop” on Bayesian nonparametrics in Paris (Université René Descartes, now located in an intensely renovated area near the Grands Moulins de Paris), in connection with one of the ANR research grants that support my research, BANHDITS in the present case. Reflecting incidentally that it was the third Monday in a row that I was at a meeting listening to talks (after Hong Kong and Newcastle)… The talks were as follows
9h30 – 10h15 : Dominique Bontemps/Sébastien Gadat
Bayesian point of view on the Shape Invariant Model
10h15 – 11h : Pierpaolo De Blasi
Posterior consistency of nonparametric location-scale mixtures for multivariate density estimation
11h30 – 12h15 : Jean-Bernard Salomond
General posterior contraction rate Theorem in inverse problems.
12h15 – 13h : Eduard Belitser
On lower bounds for posterior consistency (I)
14h30 – 15h15 : Eduard Belitser
On lower bounds for posterior consistency (II)
15h15 – 16h : Judith Rousseau
Posterior concentration rates for empirical Bayes approaches
16h – 16h45 : Elisabeth Gassiat
Nonparametric HMM models
While most talks were focussing on contraction and consistency rates, hence far from my current interests, both talk by Judith and Elisabeth held more appeal to me. Judith gave conditions for an empirical Bayes nonparametric modelling to be consistent, with examples taken from Peter Green’s mixtures of Dirichlet, and Elisabeth concluded with a very generic result on the consistent estimation of a finite hidden Markov model. (Incidentally, the same BANHDITS grant will also support the satellite meeting on Bayesian non-parametric at MCMSki IV on Jan. 09.)
I received (in the same box) two mathematical statistics books from CRC Press, Understanding Advanced Statistical Methods by Westfall and Henning, and Statistical Theory A Concise Introduction by Abramovich and Ritov. For review in CHANCE. While they are both decent books for teaching mathematical statistics at undergraduate borderline graduate level, I do not find enough of a novelty in them to proceed to a full review. (Given more time, I could have changed my mind about the first one.) Instead, I concentrate here on their processing of the Bayesian paradigm, which takes a wee bit more than a chapter in either of them. (And this can be done over a single métro trip!) The important following disclaimer applies: comparing both books is highly unfair in that it is only because I received them together. They do not necessarily aim at the same audience. And I did not read the whole of either of them.
First, the concise Statistical Theory covers the topic in a fairly traditional way. It starts with a warning about the philosophical nature of priors and posteriors, which reflect beliefs rather than frequency limits (just like likelihoods, no?!). It then introduces priors with the criticism that priors are difficult to build and assess. The two classes of priors analysed in this chapter are unsurprisingly conjugate priors (which hyperparameters have to be determined or chosen or estimated in the empirical Bayes heresy [my words!, not the authors']) and “noninformative (objective) priors”. The criticism of the flat priors is also traditional and leads to the group invariant (Haar) measures, then to Jeffreys non-informative priors (with the apparent belief that Jeffreys only handled the univariate case). Point estimation is reduced to posterior expectations, confidence intervals to HPD regions, and testing to posterior probability ratios (with a warning about improper priors). Bayes rules make a reappearance in the following decision-theory chapter, as providers of both admissible and minimax estimators. This is it, as Bayesian techniques are not mentioned in the final “Linear Models” chapter. As a newcomer to statistics, I think I would be as bemused about Bayesian statistics as when I got my 15mn entry as a student, because here was a method that seemed to have a load of history, an inner coherence, and it was mentioned as an oddity in an otherwise purely non-Bayesian course. What good could this do to the understanding of the students?! So I would advise against getting this “token Bayesian” chapter in the book…
“You are not ignorant! Prior information is what you know prior to collecting the data.” Understanding Advanced Statistical Methods (p.345)
Second, Understanding Advanced Statistical Methods offers a more intuitive entry, by justifying prior distributions as summaries of prior information. And observations as a mean to increase your knowledge about the parameter. The Bayesian chapter uses a toy but very clear survey examplew to illustrate the passage from prior to posterior distributions. And to discuss the distinction between informative and noninformative priors. (I like the “Ugly Rule of Thumb” insert, as it gives a guideline without getting too comfy about it… E.g., using a 90% credible interval is good enough on p.354.) Conjugate priors are mentioned as a result of past computational limitations and simulation is hailed as a highly natural tool for analysing posterior distributions. Yay! A small section discusses the purpose of vague priors without getting much into details and suggests to avoid improper priors by using “distributions with extremely large variance”, a concept we dismissed in Bayesian Core! For how large is “extremely large”?!
“You may end up being surprised to learn in later chapters (..) that, with classical methods, you simply cannot perform the types of analyses shown in this section (…) And that’s the answer to the question, “What good is Bayes?”"Understanding Advanced Statistical Methods (p.345)
Then comes the really appreciable part, a section entitled “What good is Bayes?”—it actually reads “What Good is Bayes?” (p.359), leading to a private if grammatically poor joke since I.J. Good was one of the first modern Bayesians, working with Turing at Bletchley Park…— The authors simply skip the philosophical arguments to give the reader a showcase of examples where the wealth of the Bayesian toolbox: logistic regression, VaR (Value at Risk), stock prices, drug profit prediction. Concluding with arguments in favour of the frequentist methods: (a) not requiring priors, (b) easier with generic distributifrequentistons, (c) easier to understand with simulation, and (d) easier to validate with validation. I do not mean to get into a debate about those points as my own point is that the authors are taking a certain stand about the pros and cons of the frequentist/Bayesian approaches and that they are making their readers aware of it. (Note that the Bayesian chapter comes before the frequentist chapter!) A further section is “Comparing the Bayesian and frequentist paradigms?” (p.384), again with a certain frequentist slant, but again making the distinctions and similarities quite clear to the reader. Of course, there is very little (if any) about Bayesian approaches in the next chapters but this is somehow coherent with the authors’ perspective. Once more, a perspective that is well-spelled and comprehensible for the reader. Even the novice statistician. In that sense, having a Bayesian chapter inside a general theory book makes sense. (The second book has a rather detailed website, by the way! Even though handling simulations in Excel and drawing graphs in SAS could be dangerous to your health…)