## Probability and Bayesian modeling [book review]

Posted in Books, Kids, R, Statistics, University life on March 26, 2020 by xi'an

Probability and Bayesian modeling is a textbook by Jim Albert [whose reply is included at the end of this entry] and Jingchen Hu that CRC Press sent me for review in CHANCE. (The book is also freely available in bookdown format.) The level of the textbook is definitely introductory, as it dedicates its first half to probability concepts (with no measure theory involved), mostly focusing on counting and finite sample space models. The second half moves to Bayesian inference(s), with a strong reliance on JAGS for the processing of more realistic models. And on R vignettes for the simplest cases (where I discovered R commands I was unaware of, like dplyr::mutate()!).

As a preliminary warning about my biases, I am always reserved about mixing introductions to probability theory and to (Bayesian) statistics in the same book, as I feel they should be kept separate to avoid confusion, as for instance between histograms and densities, or between (theoretical) expectation and (empirical) mean. I therefore fail to relate to the pace and tone adopted in the book which, in my opinion, seems to dwell on overly simple examples [far too often concerned with food or baseball] while skipping over the concepts and background theory. For instance, introducing the concept of subjective probability as early as page 6 is laudable, but I doubt it will engage fresh readers when describing it as a measurement of one’s “belief about the truth of an event”, then stressing that “[to] make any kind of measurement, one needs a tool like a scale or ruler”. Overall, I have no particularly focused criticism of the probability part, except for the discrete vs continuous imbalance. (With the Poisson distribution not covered in the Discrete Distributions chapter. And the “bell curve” making a weird and unrigorous appearance there.) Galton’s board (no mention found of the quincunx) could have been better exploited towards a physical definition of a prior, following Stephen Stigler’s analysis, by adding a second level. Or turned into an R coding exercise. In the continuous distributions chapter, I would have seen the cdf coming before the pdf, rather than the opposite. And I disliked the notion that a Normal distribution is supported by a histogram of (marathon) running times, i.e., values lower-bounded by 122 (at the moment). Or later (in Chapter 8) by Roger Federer’s serving times. Incidentally, a fun typo on p.191, at least fun for LaTeX users, as
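The suggested coding exercise takes only a few lines in any language; here is a quick sketch (in Python rather than R, with a made-up number of rows and balls) of a Galton board, plus the second level read, after Stigler, as a physical prior layer compounded with a second layer of randomness:

```python
import random

rng = random.Random(42)

def quincunx(n_rows=10, n_balls=10_000):
    """Galton board: each ball takes n_rows independent left/right steps."""
    return [sum(rng.choice((0, 1)) for _ in range(n_rows)) for _ in range(n_balls)]

# Single board: landing bins follow a Binomial(n_rows, 1/2), hence the bell shape.
bins1 = quincunx()

# Stigler-style second level: the landing bin of the first board becomes the
# entry point of a second board, physically compounding a "prior" layer with
# a second layer of randomness.
bins2 = [b + sum(rng.choice((0, 1)) for _ in range(10)) for b in quincunx()]
```

A histogram of `bins2` is visibly flatter than that of `bins1`, the convolution of the two levels spreading the distribution out.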

$f_{Y\ mid X}$

with an extra space between ‘\’ and ‘mid’! (I also noticed several occurrences of the unavoidable “the the” typo in the last chapters.) The simulation from a bivariate Normal distribution is hidden behind a customised R function sim_binom() when it could have been easily described as a two-stage hierarchy. And no comment is made on the fact that a sample from Y−1.5X can be directly derived from the joint sample. (Too unconscious a statistician?)
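For the record, the two-stage hierarchy hiding behind such a simulation is short enough to spell out; a sketch (in Python rather than R, with parameter values invented for illustration) that also shows the Y−1.5X sample coming for free from the joint sample:

```python
import random, math

rng = random.Random(1)
mu_x, mu_y, sd_x, sd_y, rho = 0.0, 0.0, 1.0, 1.0, 0.6   # made-up values

# Two-stage hierarchy: draw X marginally, then Y from its conditional given X.
xs, ys = [], []
for _ in range(50_000):
    x = rng.gauss(mu_x, sd_x)
    y = rng.gauss(mu_y + rho * sd_y / sd_x * (x - mu_x),
                  sd_y * math.sqrt(1 - rho ** 2))
    xs.append(x)
    ys.append(y)

# A sample of Y - 1.5 X is derived directly from the joint sample,
# with no further simulation needed.
zs = [y - 1.5 * x for x, y in zip(xs, ys)]
```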

When moving to Bayesian inference, a large section is spent on very simple models like estimating a proportion or a mean, covering both discrete and continuous priors. And strongly focusing on conjugate priors, despite warnings that they do not necessarily reflect prior information or prior belief. With some debatable recommendations for “large” prior variances as weakly informative, or (worse) for Exp(1) as a reference prior for the sample precision in the linear model (p.415). But also covering Bayesian model checking, either via the prior predictive (hence Bayes factors) or the posterior predictive (with no mention of using the data twice). A sufficient statistic for the Normal model only appears as a marginalia. In the Normal model checking section, an estimate of the posterior density of the mean is used without (apparent) explanation.

“It is interesting to note the strong negative correlation in these parameters. If one assigned informative independent priors on [the two parameters], these prior beliefs would be counter to the correlation between the two parameters observed in the data.”
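Since the conjugate case for a proportion drives much of this part of the book, a minimal sketch of the Beta-Binomial update (in Python, with invented counts) may help fix ideas:

```python
# Conjugate Beta-Binomial update for a proportion (illustrative numbers only).
a, b = 1.0, 1.0          # uniform Beta(1, 1) prior
y, n = 12, 30            # observed successes out of n trials (made up)

a_post, b_post = a + y, b + n - y          # posterior is Beta(a + y, b + n - y)
post_mean = a_post / (a_post + b_post)     # shrinks y/n toward the prior mean

print(post_mean)   # 13/32 = 0.40625
```

The posterior mean 0.40625 sits between the sample proportion 0.4 and the prior mean 0.5, the prior acting as two extra pseudo-observations.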

For the same reasons of having to cut on mathematical validation and rigour, Chapter 9 on MCMC does not explain why MCMC algorithms converge outside of the finite state space case. The proposal in the algorithmic representation is a Uniform one, since larger-dimension problems are handled by either Gibbs or JAGS. The recommendations about running MCMC do not include how many iterations one “should” run (or other common queries on Stack eXchange), albeit they do include the sensible advice of running multiple chains and of comparing simulated predictive samples with the actual data as a model check. However, the MCMC chapter very quickly and inevitably turns into commented JAGS code, which I presume would require more from the students than just reading the available code, like consulting the JAGS manual. Chapter 10 is mostly a series of examples of Bayesian hierarchical modeling, with illustrations of the shrinkage effect like the one on the book cover. Chapter 11 covers simple linear regression with some mentions of weakly informative priors, although in a BUGS spirit of using large [enough?!] variances: “If one has little information about the location of a regression parameter, then the choice of the prior guess is not that important and one chooses a large value for the prior standard deviation. So the regression intercept and slope are each assigned a Normal prior with a mean of 0 and standard deviation equal to the large value of 100.” (p.415). Regardless of the scale of y? Standardisation is covered later in the chapter (with the use of the R function scale()) as part of constructing more informative priors, although this sounds more like data-dependent priors to me, in the sense that the scale and location are summarily estimated by empirical means from the data.
The above quote also strikes me as potentially confusing to the students, as it does not spell out at all how to design a joint distribution on the linear regression coefficients that would translate the concentration of these coefficients along ȳ=β₀+β₁x̄. Chapter 12 expands the setting to multiple regression and generalised linear models, mostly consisting of examples. It however suggests using cross-validation for model checking and then advocates DIC (the deviance information criterion) “to approximate a model’s out-of-sample predictive performance” (p.463), if only because it is covered in JAGS, the definition of the criterion being relegated to the last page of the book. Chapter 13 concludes with two case studies, the (often used) Federalist Papers analysis and a baseball career hierarchical model. Which may sound far-reaching considering the modest prerequisites the book started with.
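For readers wanting to peek behind the JAGS curtain, the uniform-proposal random-walk Metropolis scheme of Chapter 9, together with the multiple-chains recommendation, can be sketched in a few lines (Python rather than JAGS; the target and tuning constants are mine, for illustration only):

```python
import random, math

def metropolis(log_target, start, n_iter=20_000, half_width=1.0, seed=0):
    """Random-walk Metropolis with a Uniform(-half_width, half_width) proposal."""
    rng = random.Random(seed)
    x, lp = start, log_target(start)
    chain = []
    for _ in range(n_iter):
        prop = x + rng.uniform(-half_width, half_width)
        lp_prop = log_target(prop)
        # Accept with probability min(1, target(prop) / target(x)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        chain.append(x)
    return chain

# Standard Normal target (log density up to an additive constant), with three
# chains started at dispersed values as a basic convergence check.
log_target = lambda x: -0.5 * x * x
chains = [metropolis(log_target, start=s, seed=i)
          for i, s in enumerate((-3.0, 0.0, 3.0))]
```

Comparing the three chains after discarding a burn-in period is the informal convergence check the book recommends; the symmetric uniform proposal makes the acceptance ratio reduce to a plain target ratio.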

In conclusion of this rambling [lazy Sunday] review, this is not a textbook I would have the opportunity to use in Paris-Dauphine, but I can easily conceive of its adoption for students with limited maths exposure. As such it offers a decent entry to the use of Bayesian modelling, supported by a specific software (JAGS), and rightly stresses model checking and comparison with pseudo-observations. Provided the course is reinforced with a fair amount of computer labs and projects, the book can indeed properly introduce students to Bayesian thinking, hopefully leading them to seek more advanced courses on the topic.

Update: Jim Albert sent me the following precisions after this review got on-line:

Thanks for your review of our recent book.  We had a particular audience in mind, specifically undergraduate American students with some calculus background who are taking their first course in probability and statistics.  The traditional approach (which I took many years ago) teaches some probability one semester and then traditional inference (focusing on unbiasedness, sampling distributions, tests and confidence intervals) in the second semester.  There didn’t appear to be any Bayesian books at that calculus-based undergraduate level and that motivated the writing of this book.  Anyway, I think your comments were certainly fair and we’ve already made some additions to our errata list based on your comments.
[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE. As appropriate for a book about Chance!]

## chance call for book reviewers

Posted in Statistics on May 14, 2019 by xi'an

Since I have been unable to find local reviewers for my CHANCE review column of the above recent CRC Press books, namely

## Statistics and Health Care Fraud & Measuring Crime [ASA book reviews]

Posted in Books, Statistics on May 7, 2019 by xi'an

From the recently started ASA book series on statistical reasoning in science and society (of which I already reviewed a sequel to The Lady Tasting Tea), a short book, Statistics and Health Care Fraud, which I read at the doctor’s while waiting for my appointment, with no chance of cheating! While it made me realise that there is a significant amount of health care fraud in the US, of which I had never thought before (!), with possibly specific statistical features to the problem, besides the use of extreme value theory, I did not find much insight there on the techniques used to detect these frauds, besides the accumulation of Florida and Texas examples. As such this is a very light introduction to the topic, whose intended audience remains unclear to me. It stops short of making a case for statistics and modelling against more machine-learning options. And does not seem to mention false positives… That is, the inevitable occurrence of some doctors or hospitals being above the median costs! (A point I remember David Spiegelhalter making a long while ago, during a memorable French statistical meeting in Pau.) The book also illustrates the use of a free auditing software called RAT-STATS for multistage sampling, which apparently does not go beyond selecting claims at random according to their amount, without learning from past data. (I also wonder if criminals can reduce their chances of being caught by using this software.)

A second book on the “same” topic!, Measuring Crime, I read, not waiting at the police station, but while flying to Venezia. As indicated by the title, this is about measuring crime, with a lot of emphasis on surveys and censuses and the potential measurement errors at the different levels of surveying or censusing… Again very little on statistical methodology, apart from questioning the data, the mode of surveying, crossing different sources, and establishing the impact of the way questions are stated. But also little on bias and the impact of predictive policing and crime-prevention AIs, as discussed in Weapons of Math Destruction and in some of Kristin Lum’s papers. Except for the almost obligatory reference to Minority Report. The book also concludes with a history chapter centred on Edith Abbott, who set the bases for serious crime data collection in the 1920s.

[And the usual disclaimer applies, namely that this bicephalic review is likely to appear later in CHANCE, in my book reviews column.]

## estimation exam [best of]

Posted in Books, Kids, Statistics on January 29, 2019 by xi'an

Yesterday, I received a few copies of our CRC Press Handbook of Mixture Analysis, while grading the 160 copies of my mathematical statistics exam. Among the few goodies, I noticed the always popular magical equality

$\mathbb{E}[1/T]=1/\mathbb{E}[T]$

that must have been used in so many homeworks and exam solutions by now that it should become a folk theorem. More innovative is the argument that E[1/min{X₁,X₂,…}] does not exist for iid U(0,θ) variables because the minimum is the only one among the order statistics with the ability to touch zero. Another universal shortcut was the completeness conclusion that, when the integral

$\int_0^\theta \varphi(x) x^k \text{d}x$

was zero for all θ’s, then φ had to be equal to zero, with no further argument (only one student thought of taking the derivative). Plus a growing inability in the cohort to differentiate even simple functions… (At least most students got the bootstrap right, as exemplified by their R code.) And three stars to the student who thought of completely gluing down his anonymisation tag, on every one of his five sheets!, making identification indeed impossible, except by elimination of the 159 other names.
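The magical equality is easily debunked by simulation, Jensen’s inequality giving E[1/T] > 1/E[T] for any non-degenerate positive T; a quick check (in Python, with toy distributions of my own choosing) that also illustrates why E[1/min{X₁,X₂,…}] fails to exist for Uniforms:

```python
import random

rng = random.Random(7)

# E[1/T] versus 1/E[T] for T ~ Uniform(1, 3): Jensen's inequality makes the
# first strictly larger (log(3)/2 ≈ 0.549 against exactly 1/2).
ts = [rng.uniform(1.0, 3.0) for _ in range(100_000)]
mean_inv = sum(1.0 / t for t in ts) / len(ts)
inv_mean = len(ts) / sum(ts)
print(mean_inv, inv_mean)

# For X_i iid Uniform(0, 1), the minimum of a sample keeps touching zero, so
# E[1/min] is infinite and running averages of 1/min never stabilise.
mins = [min(rng.random() for _ in range(5)) for _ in range(100_000)]
```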

## a book and two chapters on mixtures

Posted in Books, Statistics, University life on January 8, 2019 by xi'an

The Handbook of Mixture Analysis is now out! After a few years of planning, contacts, meetings, discussions about notations, interactions with authors, further interactions with late authors, repeated editing towards homogenisation, and a final professional edit last summer, this collection of nineteen chapters involved thirty-five contributors. I am grateful to all participants in this piece of work, especially to Sylvia Frühwirth-Schnatter for being a driving force in the project and for achieving a much higher degree of homogeneity in the book than I expected. I would also like to thank Rob Calver and Lara Spieker of CRC Press for their boundless patience through the many missed deadlines and their overall support.

Two chapters which I co-authored are now available as arXived documents:

along with other chapters

## surprises in probability [book review]

Posted in Books, Statistics, Travel on November 20, 2018 by xi'an

A very short book (128 pages, but with a very high price!) I received from CRC Press is Henk Tijms’ Surprises in Probability (Seventeen Short Stories). Henk Tijms is an emeritus professor of econometrics at the Vrije Universiteit in Amsterdam and he wrote these seventeen pieces either for the Dutch Statistical Society magazine or for a blog he ran for the NYT. (The video of A Night in Casablanca is only connected to this blog through Chico mimicking the word surprise as soup+rice.)

The author mentions that the book can be useful for teachers, and indeed this is a collection of surprising probability results, surprising in the sense that the numerical probabilities are not necessarily intuitive. Most illustrations involve betting of one sort or another, with only basic (combinatorial) probability distributions involved. Readers should not worry about even this basic probability background since most statements are exposed without a proof. Most examples are very classical, from the prisoner’s problem, to the Monty Hall paradox, to the birthday problem, to Benford’s distribution of digits, to the gambler’s ruin, the gambler’s fallacy, and the St Petersburg paradox, to the secretary problem and stopping rules. The most advanced notion is that of (finite state space) Markov chains. Martingales are only mentioned in connection with pseudo-probabilist schemes for winning the lottery, for which (our very own!) Jeff Rosenthal makes an appearance, thanks to his uncovering of the Ontario Lottery scam!
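Most of these classics are one-liners to check by simulation, which could supplement the missing proofs; for instance a sketch of the Monty Hall paradox (in Python; the game mechanics are the standard ones, the seed and sample size mine):

```python
import random

def monty_hall(switch, n_games=100_000, seed=3):
    """Empirical winning frequency over n_games of the Monty Hall game."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # The host opens a goat door distinct from the pick, so switching
        # amounts to taking the one remaining closed door: it wins exactly
        # when the first pick was wrong.
        wins += (pick != car) if switch else (pick == car)
    return wins / n_games

print(monty_hall(switch=True), monty_hall(switch=False))
```

Switching wins about two thirds of the time, staying about one third, the surprise the book trades on.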

“In no other branch of mathematics is it so easy for experts to blunder as in probability theory.”  Martin Gardner

A few stories have entries about Bayesian statistics, with mentions made of the O.J. Simpson, Sally Clark and Lucia de Berk miscarriages of justice, although these mentions make the connection most tenuous. Simulation is also mentioned as a manner of achieving approximations to more complex probabilities. But not to the point of discussing surprises about simulation, which could have been the case with the simulation of rare events.

Ten most beautiful probability formulas (Story 10) reminded me of Ian Stewart’s 17 Equations that Changed the World. Obviously at another scale and in a much less convincing way. To wit: the Normal (or Gauss) density, Bayes’ formula, the gambler’s ruin formula, the square-root formula (meaning that the standard deviation of a mean decreases as 1/√n), Kelly’s betting formula (?), the asymptotic law of distribution of the prime numbers (??), another square-root formula for the one-dimensional random walk, the newsboy formula (?), the Pollaczek–Khintchine formula (?), and the waiting-time formula. I am not sure I would have included any of these…
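The square-root formula at least is easy to exhibit by simulation; a sketch (in Python, with sample sizes of my own choosing) showing the standard deviation of a sample mean halving when the sample size is quadrupled:

```python
import random, math

rng = random.Random(11)

def sd_of_mean(n, n_reps=20_000):
    """Empirical standard deviation of the mean of n Uniform(0, 1) draws."""
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(n_reps)]
    m = sum(means) / n_reps
    return math.sqrt(sum((x - m) ** 2 for x in means) / n_reps)

# Theory: the sd of the mean is sqrt(1/12)/sqrt(n), so quadrupling n halves it.
s25, s100 = sd_of_mean(25), sd_of_mean(100)
print(s25, s100)
```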

All in all, this is a nice if unsurprising database for illustrations and possibly exercises in elementary probability courses, although it will require some work from the instructor to link the statements to their proofs. As one would expect from blog entries. But it makes for nice reading, especially while travelling, and I hope some fellow traveller will pick the book up from where I left it in Mexico City airport.

## ABC in print

Posted in Books, pictures, Statistics, University life on September 5, 2018 by xi'an

The CRC Press Handbook of ABC is now out, after a rather long delay [the first version of our model choice chapter was written in 2015!] due to some late contributors. Which is why I did not spot it at JSM 2018. As announced a few weeks ago, our Handbook of Mixture Analysis is soon to be published as well. (Not that I necessarily advocate the individual purchase of these costly volumes!, especially given that most chapters are available on-line.)