Statistical modeling and computation [book review]
Dirk Kroese (from UQ, Brisbane) and Joshua Chan (from ANU, Canberra) have just published a book entitled Statistical Modeling and Computation, distributed by Springer-Verlag (I cannot tell which series it is part of from the cover or frontpages…). The book is intended mostly for an undergrad audience (or for graduate students with no probability or statistics background). Given this intended audience, Statistical Modeling and Computation is fairly standard: it reviews probability basics, the principles of statistical inference, and classical parametric models. In a third part, the authors cover “advanced models” such as generalised linear models, time series and state-space models. The specificity of the book lies in the inclusion of simulation methods, in particular MCMC methods, and in its illustrations by Matlab code boxes. (The code is available on the companion website, along with R translations.) It thus has a lot in common with our Bayesian Essentials with R, meaning that I am neither the most appropriate nor the least biased reviewer for this book.
Indeed, over the years, I have become uncomfortable with the notions (a) that a statistics textbook should start with an introduction to probability theory and (b) that it should cover different paradigms in a neutral way. I already posted about this last point in a partial review of two CRC Press books. It indeed sounds paradoxical (to me) to present those different paradigms (including the Bayesian approach) to students and to let them deal with the dilemma of picking one versus the other, since they lack the maturity and perspective to do so. I am also afraid that the duality between maximum likelihood and Bayesian analysis will induce students to pick the former as simpler, since the motivations for conducting a Bayesian analysis will be lost on them. For instance, in the advanced models part, maximum likelihood estimation is always carried out, while the Bayesian alternative is relegated to the problems section. (As to the first point, this morning I gave an R exam to third-year students where some were unable to provide a formal definition of the median of a distribution, despite their three or four years of training in probability theory!)
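For reference, here is one standard formal definition (a textbook fact, not a quote from the book under review): a median of the distribution of a real random variable X with cdf F is any value m satisfying the two inequalities below, and the generalised inverse of F at 1/2 always provides one such value.

```latex
m \text{ is a median of } X
\iff
\mathbb{P}(X \le m) \ge \tfrac12 \ \text{ and } \ \mathbb{P}(X \ge m) \ge \tfrac12
\iff
F(m) \ge \tfrac12 \ \text{ and } \ F(m^-) \le \tfrac12,
\qquad \text{e.g. } m = F^{-1}(\tfrac12) = \inf\{x : F(x) \ge \tfrac12\}.
```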
“In the Bayesian approach the parameter vector is considered to be random.” (p.121)
One thing I also criticised in earlier posts is the above presentation of Bayesian analysis. I object to this phrasing because it confuses the parameter itself with the tool of using a probability distribution to summarise our knowledge/feelings/a prioris about a parameter that is fixed (and unknown) for the data at hand (in the idealised setting of a true model). Random parameters are those found in random effect models (and even then, for the data at hand, there is only one realisation of those parameters!) The book also introduces the notion of a Bayesian likelihood function (p.228), which “differs slightly from that in classical statistics”. The only difference I can spot is in the interpretation: both functions of (θ,x) are numerically the same. Overall, the chapter on Bayesian inference does not spend much time on prior specification. There is a section on conjugate priors that does not mention the issue of picking the hyperparameters. Improper priors are introduced as limits of proper priors and as conveying “the least amount of information about [the parameters]” (p.236), but the difficulty in using improper priors for hypothesis testing is not mentioned. Both Chib’s method and the Savage-Dickey density ratio are suggested for the approximation of marginal likelihoods.
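To make the last two points concrete, here is a minimal Python sketch. It assumes the simplest conjugate setting, which is my choice and not an example from the book: x_1,…,x_n iid N(θ, σ²) with σ² known, testing H0: θ = 0 against H1: θ ~ N(0, τ²). The Bayes factor B01 is computed twice, once by the Savage-Dickey density ratio (posterior density of θ at 0 divided by prior density at 0) and once from the exact marginal likelihoods of the sufficient statistic x̄. The two values coincide. As τ² grows, which is the limit leading to the improper flat prior, B01 increases without bound, so the test ends up favouring H0 whatever the data. The function names are illustrative only.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def savage_dickey_bf01(xbar, n, sigma2, tau2):
    """B01 for H0: theta = 0 vs H1: theta ~ N(0, tau2), x_i ~ N(theta, sigma2) iid.

    Savage-Dickey: B01 = pi(theta = 0 | x) / pi(theta = 0) under H1.
    """
    post_var = 1.0 / (n / sigma2 + 1.0 / tau2)   # conjugate normal posterior variance
    post_mean = post_var * n * xbar / sigma2     # conjugate normal posterior mean
    return normal_pdf(0.0, post_mean, post_var) / normal_pdf(0.0, 0.0, tau2)


def exact_bf01(xbar, n, sigma2, tau2):
    """Same Bayes factor as a ratio of marginal densities of xbar:
    N(0, sigma2/n) under H0 and N(0, tau2 + sigma2/n) under H1."""
    return normal_pdf(xbar, 0.0, sigma2 / n) / normal_pdf(xbar, 0.0, tau2 + sigma2 / n)


if __name__ == "__main__":
    xbar, n, sigma2 = 0.3, 20, 1.0   # hypothetical summary of a sample
    for tau2 in (1.0, 100.0, 1e4):
        print(tau2, savage_dickey_bf01(xbar, n, sigma2, tau2), exact_bf01(xbar, n, sigma2, tau2))
```

Using x̄ alone is enough here because the rest of the likelihood does not depend on θ and cancels in the ratio.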
“It may be too time-consuming, or simply not feasible, to obtain such replications. An alternative is to resample the original data.” (p.205)
Somewhat in agreement with my own R course, the Monte Carlo chapter starts with the empirical cdf and the bootstrap. I found the explanation of the empirical cdf (p.198) confusing in that the authors distinguish between a deterministic version and a stochastic version of the empirical cdf: in a statistical setting, there is no deterministic version! The chapter naturally includes a mention of the Kolmogorov-Smirnov test, as well as a short section on density estimation, which recommends using the “theta KDE” method of Botev, Grotowski and Kroese (2010, AoS), without defining it, though. The bootstrap is introduced through the empirical cdf, except for the above quote, which I find puzzling: when all we observe is a sample from an unknown cdf, producing further iid replications from this cdf is not merely time-consuming but impossible. As in my course, bootstrapping linear regression is given as an example. However, the authors suggest resampling the pairs (xi,yi), while I tell my students to resample from the estimated residuals, since those estimate the iid errors of the model. The MCMC section introduces the Metropolis-Hastings and Gibbs sampling algorithms (which are used in the later chapters), but fails to warn about the need to calibrate the former.
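The two resampling schemes for regression can be contrasted in a short Python sketch. It assumes a simple linear model y_i = a + b x_i + e_i with iid errors. The data are simulated and the function names are hypothetical, not the book's code. The residual bootstrap keeps the design fixed and resamples the estimated residuals. The pairs bootstrap resamples (x_i, y_i) jointly, which treats the design itself as random.

```python
import random
import statistics


def ols(xs, ys):
    """Least-squares intercept and slope for y = a + b x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def residual_bootstrap_slopes(xs, ys, n_boot, rng):
    """Fixed design: resample fitted residuals, rebuild y* = a + b x + e*, refit."""
    a, b = ols(xs, ys)
    # OLS residuals with an intercept sum to zero, so no recentring is needed
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    slopes = []
    for _ in range(n_boot):
        ystar = [a + b * x + rng.choice(resid) for x in xs]
        slopes.append(ols(xs, ystar)[1])
    return slopes


def pairs_bootstrap_slopes(xs, ys, n_boot, rng):
    """Resample (x_i, y_i) pairs with replacement and refit."""
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # degenerate resample: slope undefined, skip it
            continue
        slopes.append(ols(bx, [ys[i] for i in idx])[1])
    return slopes


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [i / 10 for i in range(30)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]  # simulated data
    print("slope estimate:", ols(xs, ys)[1])
    print("residual bootstrap s.e.:", statistics.stdev(residual_bootstrap_slopes(xs, ys, 2000, rng)))
    print("pairs bootstrap s.e.:", statistics.stdev(pairs_bootstrap_slopes(xs, ys, 2000, rng)))
```

Under the iid-error model simulated here, the two standard errors are typically of similar magnitude. The difference lies in what is treated as random: only the errors in the residual version, the whole (x, y) sample in the pairs version.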
In conclusion, and despite the rather gruff tone of the above, this book is a fairly decent quick introduction to the practice of statistical analysis, which manages to reach models as complex as stochastic volatility models in a few hundred pages. It could thus be helpful when teaching an undergraduate statistics class for non-specialists.