## the signal and the noise

Posted in Books, Statistics with tags , , , , , , , , , , , on February 27, 2013 by xi'an

It took me a while to get Nate Silver’s the signal and the noise: why so many predictions fail – but some don’t (hereafter s&n) and another while to read it (blame A Memory of Light!).

“Bayes and Price are telling Hume, don’t blame nature because you are too daft to understand it.” s&n, p.242

I find s&n highly interesting and it is rather refreshing to see the Bayesian approach so passionately promoted by a former poker player, as betting and Dutch book arguments have often been used as arguments in favour of this approach. While it works well for some illustrations in the book, like poker and the stock market, as well as political polls and sports, I prefer more decision-theoretic motivations for topics like weather prediction, sudden epidemics, global warming or terrorism. Of course, this passionate aspect makes s&n open to criticisms, like this one by Marcus and Davis in The New Yorker about seeing everything through the Bayesian lens. The chapter on Bayes and Bayes’ theorem (Chapter 8) is a wee bit caricatural in this regard. Indeed, Silver sees too much in Bayes’ Essay, to the point of mistakenly attributing to Bayes a discussion of Hume’s sunrise problem. (The only mention of the problem occurs in the Appendix, which was written by Price—like possibly the whole of the Essay!—and P.S. Laplace is the one who applied Bayesian reasoning to it, leading to Laplace’s succession rule.) The criticisms of frequentism are also slightly over-the-levee: they are mostly directed at inadequate models that a Bayesian analysis would process just as poorly. (Some critics argue, on the contrary, that Bayesian analysis depends too much on the model being “right”! Or on the availability of a fully-specified model.) Seeing frequentism as restricted to “collecting data among just a sample of the population rather than the whole population” (p.252) certainly does not do justice to the breadth of frequentism.
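
For the record, the succession rule Laplace derived for the sunrise problem—under a uniform prior on the sunrise probability $\theta$, with $s$ sunrises observed over $n$ days—is the posterior predictive probability

$P(X_{n+1}=1|x_1,\ldots,x_n)=\int_0^1 \theta\,\pi(\theta|x_1,\ldots,x_n)\,\text{d}\theta=\dfrac{s+1}{n+2}$

which follows directly from the Beta$(s+1,n-s+1)$ posterior induced by the uniform prior.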

“Prediction serves a very central role in hypothesis testing, for instance, and therefore in all of science.” s&n, p.230

The book is written in a fairly enjoyable style, highly personal (no harm with that) and, apart from superlativising (!) everyone making a relevant appearance—which seems the highest common denominator of all those pop’sci’ books I end up reviewing so very often!, maybe this is something like Rule #1 in Scientific Writing 101 courses: “make the scientists sound real, turn ’em into real people”—I find it rather well-organised as it brings the reader from facts (prediction usually does poorly) to the possibility of higher quality prediction (by acknowledging prior information, accepting uncertainty, using all items of information available, further accepting uncertainty, &tc.). I am not sure the reader is any the wiser by the end of the book on how one should improve one’s prediction tools, but there is at least a warning about the low quality of most predictions and predictive tools that should linger in the reader’s ears… I enjoyed very much the chapter on chess, esp. the core about Kasparov’s misreading of the computer’s reasons for a poor move (no further spoiler!), although I felt it was not much connected to the rest of the book.

In his review, Larry Wasserman argues that the defence Silver makes of his procedure is more frequentist than Bayesian, because it rests on calibration and long-term performance. Well… Having good calibration properties does not settle whether a procedure is Bayesian or frequentist; it simply means it makes efficient use of the available information. Anyway, I agree (!) with Larry on the point that Silver somehow confuses “Bayesian inference” with “using Bayes’ theorem”. Or puts too much meaning in the use of Bayes’ theorem, not unlike the editors of Science & Vie a few months ago. To push Larry’s controversial statement a wee further, I would even wonder whether the book has anything to do with inference. Indeed, in the end, I find s&n rather uninformative about statistical modelling and even more (or less!) about model checking. The only “statistical” model that is truly discussed over the book is the power law distribution, applied to earthquakes and terrorist attack fatalities. This is not a helpful model in that (a) it does not explain anything, as it does not make use of covariates or side information, and (b) it has no predictive power, especially in the tails. On the first point, concluding that Israel’s approach to counter-terrorism is successful because it “is the only country that has been able to bend” the power-law curve (p.442) sounds rather hasty. I’d like to see the same picture for Iraq, say. Actually, I found one in this arXiv paper. And it looks about the same for Afghanistan (Fig.4). On the second point, the modelling is poor at handling extreme values (which are the ones of interest in both cases) and cannot face change-points or lack of stationarity, an issue not sufficiently covered in s&n in my opinion. The difficulty with modelling volatile concepts like the stock market, the next presidential election or the moves of your poker opponents is that there is no physical, immutable law at play. Things can change from one instant to the next.
Unpredictably. Esp. in the tails.
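
To illustrate the point about the tails, here is a minimal simulation sketch (my own toy example, not from the book): even when the power law is the true model, maximum-likelihood fits on moderate samples yield wildly varying estimates of extreme-tail probabilities.

```python
import math
import random

# Toy illustration (not from s&n): even when a power law is the true model,
# plug-in estimates of extreme-tail probabilities vary widely across samples.
# Survival function assumed: P(X > x) = (xmin / x)^alpha for x >= xmin.

def pareto_sample(alpha, xmin, n, rng):
    # Inverse-CDF sampling: X = xmin * U^(-1/alpha), with U in (0, 1]
    return [xmin * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def pareto_mle(xs, xmin):
    # Maximum-likelihood estimate of the tail exponent alpha
    return len(xs) / sum(math.log(x / xmin) for x in xs)

rng = random.Random(42)
alpha, xmin, n = 2.5, 1.0, 200
tail_probs = []
for _ in range(50):
    xs = pareto_sample(alpha, xmin, n, rng)
    a_hat = pareto_mle(xs, xmin)
    tail_probs.append(10.0 ** (-a_hat))   # estimated P(X > 10 * xmin)

# True value is 10^(-2.5), about 0.0032; the estimates spread widely around it
print(min(tail_probs), max(tail_probs))
```

With only 200 observations the estimated probability of a ten-fold extreme event typically varies by a factor of several across replications, which is the practical content of “no predictive power in the tails”.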

There are plenty of graphs in s&n, which is great, but not all of them reach the Tufte quality level. For instance, Figure 11-1 about the “average time U.S. common stock was held” contains six pie charts corresponding to six decades, each with the average time and a percentage that presumably compares the holding time with the 1950s figure. The graph is not mentioned in the text. (I will not mention Figure 8-2!) I also spotted a minuscule typo (`probabalistic’) on Figure 10-2A.

Maybe one last and highly personal remark about the chapter on poker (feel free to skip!): while I am a very poor card player, I do not mind playing cards (and losing) with my kids. However, I simply do not understand the rationale of playing poker. If there is no money at stake, the game does not seem to make sense since every player can keep bluffing until the end of time. And if there is money at stake, I find the whole notion unethical. This is a zero-sum game, so the money comes from someone else’s pocket (or more likely someone else’s retirement plan or someone else’s kids’ college savings plan). Not much different from the way the stock market behaves nowadays… (Incidentally, this chapter unexpectedly did not discuss the performance of computer poker programs at all, even though the number of possibilities is very small and they should thus be fairly efficient.)

## the BUGS Book [guest post]

Posted in Books, R, Statistics with tags , , , , , , , , , , on February 25, 2013 by xi'an

(My colleague Jean-Louis Fouley, now at I3M, Montpellier, kindly agreed to write a review on the BUGS book for CHANCE. Here is the review, en avant-première! Watch out, it is fairly long and exhaustive! References will be available in the published version. The additions of book covers with BUGS in the title and of the corresponding Amazon links are mine!)

If ever a book was much desired in the world of statistics, it is surely this one. Many people have been expecting it for more than 20 years, ever since the WinBUGS software came into use. Therefore, the tens of thousands of users of WinBUGS are indebted to the leading team of the BUGS project (D Lunn, C Jackson, N Best, A Thomas and D Spiegelhalter) for having eventually succeeded in finalizing the writing of this book and for making sure that the long-held expectations are not dashed.

As well explained in the Preface, the BUGS project, initiated at Cambridge, was a very ambitious one and at the forefront of the MCMC movement that revolutionized the development of Bayesian statistics in the early 1990s, after the pioneering publication of Gelfand and Smith on Gibbs sampling.

This book comes out after several textbooks have already been published in the area of computational Bayesian statistics using BUGS and/or R (Gelman and Hill, 2007; Marin and Robert, 2007; Ntzoufras, 2009; Congdon, 2003, 2005, 2006, 2010; Kéry, 2010; Kéry and Schaub, 2011 and others). It is neither a theoretical book on foundations of Bayesian statistics (e.g. Bernardo and Smith, 1994; Robert, 2001) nor an academic textbook on Bayesian inference (Gelman et al., 2004; Carlin and Louis, 2008). Instead, it reflects very well the aims and spirit of the BUGS project and is meant to be a manual “for anyone who would like to apply Bayesian methods to real-world problems”.

In spite of its appearance, the book is not elementary. On the contrary, it addresses most of the critical issues faced by statisticians who want to apply Bayesian statistics in a clever and autonomous manner. Although very dense, its typically fluid British style of exposition, based on real examples and simple arguments, helps the reader to digest without too much pain such ingredients as regression and hierarchical models, model checking and comparison, and all kinds of more sophisticated modelling approaches (spatial, mixture, time series, nonlinear with differential equations, nonparametric, etc.).

The book consists of twelve chapters and three appendices specifically devoted to BUGS (A: syntax; B: functions and C: distributions) which are very helpful for practitioners. The book is illustrated with numerous examples. The exercises are well presented and explained, and the corresponding code is made available on a website.

## on using the data twice…

Posted in Books, Statistics, University life with tags , , , , , , , , on January 13, 2012 by xi'an

As I was writing my next column for CHANCE, I decided to include a methodology box about “using the data twice”. Here is the draft. (The second part is reproduced verbatim from an earlier post on Error and Inference.)

Several aspects of the books covered in this CHANCE review [i.e., Bayesian ideas and data analysis, and Bayesian modeling using WinBUGS] face the problem of “using the data twice”. What does that mean? Nothing really precise, actually. The accusation of “using the data twice” found in the Bayesian literature can be thrown at most procedures exploiting the Bayesian machinery without actually being Bayesian, i.e., which cannot be derived from the posterior distribution. For instance, the integrated likelihood approach in Murray Aitkin’s Statistical Inference avoids the difficulties related with improper priors $\pi_i$ by first using the data $x$ to construct (proper) posteriors $\pi_i(\theta_i|x)$ and then using the data again in a Bayes factor

$\int_{\Theta_1}f_1(x|\theta_1) \pi_1(\theta_1|x)\,\text{d}\theta_1\bigg/ \int_{\Theta_2}f_2(x|\theta_2)\pi_2(\theta_2|x)\,\text{d}\theta_2$

as if the posteriors were priors. This obviously solves the impropriety difficulty (see, e.g., The Bayesian Choice), but it creates a statistical procedure outside the Bayesian domain, hence requiring a separate validation since the usual properties of Bayesian procedures do not apply. Similarly, the whole empirical Bayes approach falls under this category, even though some empirical Bayes procedures are asymptotically convergent. The pseudo-marginal likelihood of Geisser and Eddy (1979), used in Bayesian ideas and data analysis, is defined by
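
As a toy numerical check of what one term of Aitkin’s ratio computes (my own sketch, not Aitkin’s code): for a single observation $x\sim\mathcal{N}(\theta,1)$ under a flat prior, the posterior is $\theta|x\sim\mathcal{N}(x,1)$ and the integrated likelihood $\int f(x|\theta)\pi(\theta|x)\,\text{d}\theta$ has the closed form $1/2\sqrt{\pi}$, which Monte Carlo sampling from the posterior recovers:

```python
import math
import random

# Toy check (mine, not Aitkin's): for a single observation x ~ N(theta, 1)
# under a flat prior, the posterior is theta | x ~ N(x, 1), and the
# "integrated likelihood" E[f(x|theta)] under this posterior has closed form
# 1 / (2 * sqrt(pi)), about 0.2821. Note how x enters twice: once in the
# posterior sampled from, once in the likelihood being averaged.

def normal_pdf(z, mu=0.0, sd=1.0):
    return math.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

rng = random.Random(0)
x = 1.3                                              # the single observation
draws = [rng.gauss(x, 1.0) for _ in range(100_000)]  # theta ~ pi(theta | x)
integrated = sum(normal_pdf(x, mu=t) for t in draws) / len(draws)

exact = 1.0 / (2.0 * math.sqrt(math.pi))
print(integrated, exact)   # both close to 0.2821
```

The quantity is perfectly well-defined; the issue discussed above is that it is not a marginal likelihood under any genuine prior, since the “prior” already depends on $x$.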

$\hat m(x) = \prod_{i=1}^n f_i(x_i|x_{-i})$

through the marginal posterior likelihoods. While it also allows for improper priors, it does use the same data in each term of the product and, again, it is not a Bayesian procedure.
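
For concreteness, here is a minimal sketch (a toy of mine, not from either book) of the Geisser–Eddy quantity for the normal mean model with flat prior, where each leave-one-out predictive density is available in closed form:

```python
import math

# Minimal sketch (my own toy, not from the books under review): the
# Geisser-Eddy pseudo-marginal likelihood  m^(x) = prod_i f(x_i | x_{-i})
# for x_i ~ N(theta, 1) with a flat prior on theta, using the closed-form
# leave-one-out predictive  x_i | x_{-i} ~ N( mean(x_{-i}), 1 + 1/(n-1) ).
# Each x_i appears on both sides of its own conditional: the repeated use
# of the data criticised in the text.

def normal_pdf(z, mu, var):
    return math.exp(-0.5 * (z - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def log_pseudo_marginal(xs):
    n = len(xs)
    total = sum(xs)
    log_m = 0.0
    for xi in xs:
        loo_mean = (total - xi) / (n - 1)          # mean of x_{-i}
        log_m += math.log(normal_pdf(xi, loo_mean, 1.0 + 1.0 / (n - 1)))
    return log_m

xs = [0.2, -0.5, 1.1, 0.7, -0.3, 0.0]
print(log_pseudo_marginal(xs))   # log of the product of the n predictive terms
```

An observation far from the rest of the sample is poorly predicted by its leave-one-out posterior and drags the product down, which is why the quantity is popular as a cross-validatory model score despite not being a Bayesian marginal likelihood.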

Once again, from first principles, a Bayesian approach should use the data only once, namely when constructing the posterior distribution on every unknown component of the model(s). Based on this all-encompassing posterior, all inferential aspects should be the consequences of a sequence of decision-theoretic steps in order to select optimal procedures. This is the ideal setting while, in practice, relying on a sequence of posterior distributions is often necessary, each posterior being a consequence of earlier decisions, which makes it the result of a multiple (improper) use of the data… For instance, the process of Bayesian variable selection is in principle clean from the sin of “using the data twice”: one simply computes the posterior probability of each of the variable subsets and that is all. However, in a case involving many (many) variables, there are two difficulties: one is about building the prior distributions for all possible models, a task that needs to be automatised to some extent; another is about exploring the set of potential models. First, resorting to projection priors as in the intrinsic solution of Pérez and Berger (2002, Biometrika, a most valuable article!), while unavoidable and a “least worst” solution, means switching priors/posteriors based on earlier acceptances/rejections, i.e. on the data. Second, the path of models truly explored by a computational algorithm [which will be a minuscule subset of the set of all models] will depend on the models rejected so far, either when relying on a stepwise exploration or when using a random walk MCMC algorithm. Although this is not crystal clear (there is actually plenty of room for supporting the opposite view!), it could be argued that the data is thus used several times in this process…
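
To make the “clean” version concrete, here is a small sketch (my own construction, using the standard exp(−BIC/2) approximation to marginal likelihoods rather than any specific prior): with p=3 candidate predictors, one can still enumerate all 2³ subsets and normalise in one pass, with no exploration path and hence no repeated use of the data.

```python
import itertools
import numpy as np

# Toy sketch (my construction, not from the post): exhaustive Bayesian
# variable selection on p = 3 candidate predictors, approximating each
# model's marginal likelihood by exp(-BIC/2) and normalising over all 2^p
# subsets. With equal prior model probabilities this yields (approximate)
# posterior model probabilities in a single clean step.

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(size=n)   # only the first predictor matters

def bic(subset):
    # Gaussian linear model with intercept; k = number of fitted coefficients
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]
    rss = np.sum((y - Z @ beta) ** 2)
    return n * np.log(rss / n) + Z.shape[1] * np.log(n)

subsets = [s for r in range(p + 1) for s in itertools.combinations(range(p), r)]
weights = np.array([np.exp(-0.5 * bic(s)) for s in subsets])
probs = weights / weights.sum()
best = subsets[int(np.argmax(probs))]
print(best, float(probs.max()))   # subsets containing predictor 0 dominate
```

With many variables this enumeration becomes infeasible, which is exactly where the stepwise or MCMC exploration discussed above, and its possible re-use of the data, enters the picture.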

## Another handbook chapter

Posted in Books, Statistics with tags , , , , , on February 11, 2010 by xi'an

As I have received over the past semester half a dozen requests for contributing chapters to different handbooks, I wrote several rather similar introductions to Bayesian statistics and/or computational statistics. Here is one for a Handbook of Statistical Systems Biology edited by D. Balding, M. Stumpf, and M. Girolami, to be published by Wiley. It is mostly inspired by the second chapter of Bayesian Core, so it is not particularly novel. If I find some extra time within the coming months, I will also include a section on nonparametric Bayes… Before that, I also have to write a revised version of my chapter Bayesian Computational Methods in the Handbook of Computational Statistics (selling at an outrageous price, like most handbooks!), edited by J. Gentle, W. Härdle and Y. Mori.