Bayes Rules! [book review]

Posted in Books, Kids, Mountains, pictures, R, Running, Statistics, University life on July 5, 2022 by xi'an

Bayes Rules! is a new introductory textbook on Applied Bayesian Model(l)ing, written by Alicia Johnson (Macalester College), Miles Ott (Johnson & Johnson), and Mine Dogucu (University of California Irvine). The textbook was sent to me by CRC Press for review. It is available (free) online as a website and has a github site, as well as a bayesrules R package. (Which reminds me that the R packages for both of our own books, bayess and mcsm, have gone obsolete on CRAN! And that I should find time to figure out the issue and upgrade them…)

As far as I can tell [from abroad and from only teaching students with a math background], Bayes Rules! seems to be catering to early (US) undergraduate students with very little exposure to mathematical statistics or probability, as it introduces basic probability notions like pmf, joint distribution, and Bayes’ theorem (as well as Greek letters!) and shies away from integration or algebra (a covariance matrix occurs on page 437 with a lot […]). For instance, the Normal-Normal conjugacy derivation is considered a “mouthful” (page 113). The exposition is somewhat stretched along the 500+ pages as a result, imho, which is presumably a feature shared with most textbooks at this level, and, accordingly, the exercises and quizzes are more about intuition and reproducing the contents of the chapter than about technique. In fact, I did not spot any mention of sufficiency, consistency, posterior concentration (almost made on page 113), improper priors, ergodicity, irreducibility, &tc., while other notions are not precisely defined, like ESS, weakly informative (page 234) or vague priors (page 77), prior information [which makes the negative answer to the quiz “All priors are informative” (page 90) rather confusing], R-hat, density plot, scaled likelihood, and more.

As an alternative to “technical derivations”, Bayes Rules! centres on intuition and simulation (yay!) via its bayesrules R package, itself relying on rstan. Learning from example (as R code is always provided), the book proceeds through conjugate priors, MCMC (Metropolis-Hastings) methods, regression models, and hierarchical regression models. Quite impressive given the limited prerequisites set by the authors. (I appreciated the representations of the prior-likelihood-posterior, especially in the sequential case.)
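To make the sequential case concrete, here is a minimal sketch (in Python rather than the book's R, and not taken from the book or its package) of conjugate Beta-Binomial updating. The prior Beta(2, 2) and the three batches of data are made-up values for illustration only. The point is that updating batch by batch, with each posterior serving as the next prior, ends at the same posterior as a single update on the pooled data.

```python
from fractions import Fraction

def beta_binomial_update(a, b, successes, failures):
    """Return the Beta(a', b') posterior parameters after Binomial data."""
    return a + successes, b + failures

# Hypothetical prior and data, chosen only for illustration.
prior = (2, 2)
batches = [(3, 7), (6, 4), (1, 9)]  # (successes, failures) per batch

# Sequential route: each posterior becomes the prior for the next batch.
a, b = prior
sequential_path = []
for s, f in batches:
    a, b = beta_binomial_update(a, b, s, f)
    sequential_path.append((a, b))
sequential_posterior = sequential_path[-1]

# Batch route: pool all the data and update once.
total_s = sum(s for s, _ in batches)
total_f = sum(f for _, f in batches)
batch_posterior = beta_binomial_update(*prior, total_s, total_f)

# Exact posterior mean a / (a + b) of the final Beta distribution.
posterior_mean = Fraction(sequential_posterior[0], sum(sequential_posterior))

print("sequential path:", sequential_path)
print("batch posterior:", batch_posterior)
print("posterior mean:", posterior_mean)
```

The order of the batches does not matter either, since the update only adds counts to the two Beta parameters.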

Regarding the “hot tip” (page 108) that the posterior mean always stands between the prior mean and the data mean, this should be made conditional on a conjugate setting and a mean parameterisation. Defining MCMC as a method that produces a sequence of realisations that are not from the target makes a point, except of course that there are settings where the realisations are from the target, for instance after a renewal event. Tuning MCMC should remain a partial mystery to readers after reading Chapter 7 as the Goldilocks principle is quite vague. Similarly, the derivation of the hyperparameters in a novel setting (not covered by the book) should prove a challenge, even though the readers are encouraged to “go forth and do some Bayes things” (page 509).
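On the tuning point, a small experiment may make the Goldilocks principle less vague. The sketch below is my own illustration, not code from the book: a random-walk Metropolis sampler on a standard Normal target, with a Uniform(-step, step) proposal, run for three arbitrary step sizes. All names (`random_walk_metropolis`, `step`, the seed) are assumptions made for the example.

```python
import math
import random

def log_target(x):
    """Log-density of a standard Normal target, up to an additive constant."""
    return -0.5 * x * x

def random_walk_metropolis(n_iter, step, seed=0, start=0.0):
    """Run random-walk Metropolis; return the chain and its acceptance rate."""
    rng = random.Random(seed)
    x = start
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = x + rng.uniform(-step, step)  # symmetric proposal
        log_ratio = log_target(proposal) - log_target(x)
        # Metropolis rule: accept with probability min(1, target ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
            accepted += 1
        chain.append(x)  # a rejected proposal repeats the current value
    return chain, accepted / n_iter

# Too small, moderate, and too large step sizes (illustrative values only).
rates = {step: random_walk_metropolis(20_000, step)[1]
         for step in (0.05, 2.5, 50.0)}

for step, rate in rates.items():
    print(f"step = {step:5.2f}  acceptance rate = {rate:.3f}")
```

A tiny step accepts almost everything but barely moves, a huge step is almost always rejected, and the moderate step sits in between. The acceptance rate is only a crude proxy for mixing, which is part of why tuning remains a partial mystery.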

While Bayes factors are supported for some hypothesis testing (with no point null), model comparison follows more exploratory methods like cross-validation and expected log-predictive comparison.

The examples and exercises are diverse (if mostly US centric), modern (including cultural references that completely escape me), and often reflect on the authors’ societal concerns. In particular, their concern about a fair use of the inferred models is preeminent, even though a quantitative assessment of the degree of fairness would require a much more advanced perspective than the book allows… (In that respect, Exercise 18.2 and the following ones are about book banning (in the US). Given the progressive tone of the book, and the recent ban of math textbooks in the US, I wonder if some conservative boards would consider banning it!) Concerning the Himalaya summiting running example (Chapters 18 & 19), where the probability of summiting is conditional on the age of the climber and the use of additional oxygen, I am somewhat surprised that the altitude of the targeted peak is not included as a covariate. For instance, Ama Dablam (6848 m) is compared with Annapurna I (8091 m), which has the highest fatality-to-summit ratio (38%) of all. This should matter more than age: the Aosta guide Abele Blanc climbed Annapurna without oxygen at age 57! More to the point, the (practical) detailed examples do not bring unexpected conclusions, as for instance the fact that runners [thrice alas!] tend to slow down with age.

A geographical comment: Uluru (page 267) is not a city(!) but an impressive sandstone monolith in the heart of Australia, a five-hour drive away from Alice Springs. And historical mentions: Alan Turing (page 10) and the team at Bletchley Park indeed used Bayes factors (and sequential analysis) in cracking the Enigma, but this remained classified information for quite a while. Arianna Rosenbluth (page 10, but missing on page 165) was indeed a major contributor to Metropolis et al. (1953, not cited), but would not qualify as a Bayesian statistician as the goal of their algorithm was a characterisation of the Boltzmann (or Gibbs) distribution, not statistical inference. And David Blackwell’s (page 10) Basic Statistics is possibly the earliest instance of an introductory Bayesian and decision-theory textbook, but it never mentions Bayes or Bayesianism.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

out-standing scientist

Posted in Books, Kids, Statistics, University life on November 12, 2021 by xi'an

I noticed quite recently that the [Nature] journal Heredity [managed by the Genetics Society] had published an historical / opinion piece on Ronald Fisher and his views on eugenics and race. The authors are all trustees of the Fisher Memorial Trust. The core of the paper was also contained in [one of the authors] Stephen Senn’s talk at the JSM round table (in which I also took part) and later at the RSS. This is mostly an attempt at resetting Fisher’s position within the era when he lived, in terms of prevalent racism, nationalism, and imperialism. At the core of these woes was a generalised belief in the superiority of some nations, creeds, human groups, even social classes, over others, which was used as a justification in the tragedies of large scale colonialism, the First World War, systemic racism, Nazism, and widespread forced sterilisations…

“More attention to the History of Science is needed, as much by scientists as by historians, and especially by biologists, and this should mean a deliberate attempt to understand the thoughts of the great masters of the past, to see in what circumstances or intellectual milieu their ideas were formed, where they took the wrong turning or stopped short on the right track.” R.A. Fisher (1959)

While I think the authors are somewhat stretching the arguments isolating Ronald from the worst manifestations of eugenism and racism, as the concept of “voluntary sterilisation” is more than debatable when applied to patients with limited intellectual abilities, and as Fisher considered (in 1943) that the Nazi racial laws “have been successful with the best type of German” (which stands as a fairly stupid statement on so many levels, starting with the fact that this racial selection had only started a few years before!) and “that the Party sincerely wished to benefit the German racial stock” (in 1948), my already made point is rather that the general tendency of turning genii into saints is bound to meet with disappointment. (Hence, if we have to stick with them, named lectures, prizes, memorials, &tc., should come with an expiration date!)

an hypothetical chain of transmissions

Posted in Books, Statistics, University life on August 6, 2021 by xi'an

baseless!

Posted in Books, Statistics on July 13, 2021 by xi'an

Bayesian sufficiency

Posted in Books, Kids, Statistics on February 12, 2021 by xi'an

“During the past seven decades, an astonishingly large amount of effort and ingenuity has gone into the search for reasonable answers to this question.” D. Basu

Induced by a vaguely related question on X validated, I re-read Basu’s great 1977 JASA paper on the elimination of nuisance parameters. Besides the limitations of competing definitions of conditional, partial, and marginal sufficiency for the parameter of interest, Basu discusses various notions of Bayesian (partial) sufficiency.

“After a long journey through a forest of confusing ideas and examples, we seem to have lost our way.” D. Basu

Starting with Kolmogorov’s idea (published during WW II) of requiring all marginal posteriors on the parameter of interest θ to depend only on a statistic S(x). But having this hold for all priors cancels the notion, as the statistic then needs to be jointly sufficient for θ and σ, as shown by Hájek in the early 1960s. Following this attempt, Raiffa and Schlaifer introduced a more restricted class of priors, namely those where the nuisance and interest parameters are a priori independent. In which case a conditional factorisation theorem is a sufficient (!) condition for this Q-sufficiency. But not a necessary one, as shown by the N(θ·σ, 1) counter-example (when σ=±1 and θ>0). [When the prior on σ is uniform, the absolute average is Q-sufficient, but is this a positive feature?] This choice of prior separation is somewhat perplexing in that it does not hold under reparameterisation.
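For the bracketed remark on the uniform prior, here is a short worked derivation, as I reconstruct it rather than as it appears in Basu's paper. It assumes n i.i.d. observations x_1,…,x_n from N(θσ, 1), with σ uniform on {-1, +1} and a priori independent of θ, and an arbitrary prior π(θ) on θ>0. All proportionality signs are in θ.

```latex
\begin{align*}
L(\theta,\sigma \mid x)
  &\propto \exp\Big\{-\tfrac12\sum_{i=1}^n (x_i-\theta\sigma)^2\Big\}
   \propto \exp\Big\{ n\theta\sigma\bar{x} - \tfrac{n\theta^2}{2} \Big\}
   \qquad (\text{since } \sigma^2 = 1),\\
m(x \mid \theta)
  &= \tfrac12\, L(\theta,+1 \mid x) + \tfrac12\, L(\theta,-1 \mid x)
   \propto e^{-n\theta^2/2}\cosh(n\theta\bar{x})
   = e^{-n\theta^2/2}\cosh(n\theta|\bar{x}|),\\
\pi(\theta \mid x)
  &\propto \pi(\theta)\, e^{-n\theta^2/2}\cosh(n\theta|\bar{x}|).
\end{align*}
```

Since cosh is even, the marginal posterior on θ depends on the sample only through |x̄|, the absolute average, while the sign of x̄ carries the information about σ.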

Basu ends up with three challenges, including the multinomial M(θ·σ, ½(1-θ)·(1+σ), ½(1+θ)·(1-σ)), with (n₁,n₂,n₃) as a minimal sufficient statistic. And the joint observation of an Exponential Exp(θ) translated by σ and of an Exponential Exp(σ) translated by -θ, where the prior on σ gets eliminated in the marginal on θ.