## Monte Carlo Markov chains

Posted in Books, Statistics, University life on May 12, 2020 by xi'an

Darren Wraith pointed out this (currently free access) Springer book by Massimiliano Bonamente [whose family name means good spirit in Italian] to me for its use of the unusual Monte Carlo Markov chain rendering of MCMC. (Google Trends seems to restrict its use to California!) This is a graduate text for physicists, but one could nonetheless expect more rigour in the processing of the topics, particularly of the Bayesian topics. Here is a pot-pourri of memorable quotes:

“Two major avenues are available for the assignment of probabilities. One is based on the repetition of the experiments a large number of times under the same conditions, and goes under the name of the frequentist or classical method. The other is based on a more theoretical knowledge of the experiment, but without the experimental requirement, and is referred to as the Bayesian approach.”

“The Bayesian probability is assigned based on a quantitative understanding of the nature of the experiment, and in accord with the Kolmogorov axioms. It is sometimes referred to as empirical probability, in recognition of the fact that sometimes the probability of an event is assigned based upon a practical knowledge of the experiment, although without the classical requirement of repeating the experiment for a large number of times. This method is named after the Rev. Thomas Bayes, who pioneered the development of the theory of probability.”

“The likelihood P(B/A) represents the probability of making the measurement B given that the model A is a correct description of the experiment.”

“…a uniform distribution is normally the logical assumption in the absence of other information.”

“The Gaussian distribution can be considered as a special case of the binomial, when the number of tries is sufficiently large.”

“This clearly does not mean that the Poisson distribution has no variance—in that case, it would not be a random variable!”

“The method of moments therefore returns unbiased estimates for the mean and variance of every distribution in the case of a large number of measurements.”

“The great advantage of the Gibbs sampler is the fact that the acceptance is 100 %, since there is no rejection of candidates for the Markov chain, unlike the case of the Metropolis–Hastings algorithm.”

Let me then point out (or just whine about!) the book using “statistical independence” for plain independence, the use of / rather than Jeffreys’ | for conditioning (and sometimes forgetting \ in some LaTeX formulas), the confusion between events and random variables, esp. when computing the posterior distribution, between models and parameter values, the reliance on discrete probability for continuous settings, as in the Markov chain chapter, confusing density and probability, using Mendel’s pea data without mentioning the unlikely fit to the expected values (or, as put more subtly by Fisher (1936), “the data of most, if not all, of the experiments have been falsified so as to agree closely with Mendel’s expectations”), presenting Fisher’s and Anderson’s Iris data [a motive for rejection when George was JASA editor!] as “a new classic experiment”, mentioning Pearson but not Lee for the data in the 1903 Biometrika paper “On the laws of inheritance in man” (and woman!), and not accounting for the discrete nature of this data in the linear regression chapter, the three-page derivation of the Gaussian distribution from a Taylor expansion of the Binomial pmf obtained by differentiating in the integer argument, spending endless pages on deriving standard properties of classical distributions, this appalling mess of adding over the conditioning atoms with no normalisation in a Poisson experiment

$P(X=4|\mu=0,1,2) = \sum_{\mu=0}^2 \frac{\mu^4}{4!}\exp\{-\mu\}$,

botching the proof of the CLT, which is treated before the Law of Large Numbers, restricting maximum likelihood estimation to the Gaussian and Poisson cases and muddling its meaning by discussing unbiasedness, confusing a drifted Poisson random variable with a drift on its parameter, as well as using the pmf of the Poisson to define an area under the curve (Fig. 5.2), sweeping the improperty of a constant prior under the carpet, defining a null hypothesis as a range of values for a summary statistic, no mention of Bayesian perspectives in the hypothesis testing, model comparison, and regression chapters, having one-dimensional case chapters followed by two-dimensional case chapters, reducing model comparison to the use of the Kolmogorov-Smirnov test, processing bootstrap and jackknife in the Monte Carlo chapter without a mention of importance sampling, stating recurrence results without assuming irreducibility, motivating MCMC by the intractability of the evidence, resorting to the term link to designate the current value of a Markov chain, incorporating the need for a prior distribution in a terrible description of the Metropolis-Hastings algorithm, including a discrete proof for its stationarity, spending many pages on early 1990’s MCMC convergence tests rather than discussing the adaptive scaling of proposal distributions, the inclusion of numerical tables [in a 2017 book] and turning Bayes (1763) into Bayes and Price (1763), or Student (1908) into Gosset (1908).
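To make the complaint about the Poisson sum displayed above concrete, here is a minimal sketch of what goes wrong when pmf values are simply added over μ = 0, 1, 2. The uniform weights of 1/3 on each value of μ are an assumption made for illustration only (they are not taken from the book), and the helper name `poisson_pmf` is likewise hypothetical.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Poisson probability mass P(X = x | mu)."""
    return mu ** x * exp(-mu) / factorial(x)

mus = [0, 1, 2]

# The sum as displayed above: pmf values added over mu, with no weights.
unweighted = sum(poisson_pmf(4, mu) for mu in mus)

# A properly normalised alternative: a mixture over mu.
# ASSUMPTION: uniform weights 1/3 on each value, chosen only for illustration.
weights = {mu: 1.0 / len(mus) for mu in mus}
mixture = sum(weights[mu] * poisson_pmf(4, mu) for mu in mus)

# Check whether each construction defines a distribution in x
# (x truncated at 50, far in the tail for mu <= 2).
total_unweighted = sum(poisson_pmf(x, mu) for x in range(50) for mu in mus)
total_mixture = sum(weights[mu] * poisson_pmf(x, mu)
                    for x in range(50) for mu in mus)

print(unweighted, mixture)
print(total_unweighted, total_mixture)
```

Summed over all x, the unweighted version adds up to the number of atoms (three here) rather than to one, which is why it cannot be read as a probability, while the weighted mixture does sum to one.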

[Usual disclaimer about potential self-plagiarism: this post or an edited version of it could possibly appear later in my Books Review section in CHANCE. Unlikely, though!]

## Computing Bayes: Bayesian Computation from 1763 to the 21st Century

Posted in Books, pictures, Statistics, Travel, University life on April 16, 2020 by xi'an

Last night, Gael Martin, David Frazier (from Monash U) and myself arXived a survey on the history of Bayesian computations. This project started when Gael presented a historical overview of Bayesian computation, then entitled ‘Computing Bayes: Bayesian Computation from 1763 to 2017!’, at ‘Bayes on the Beach’ (Queensland, November, 2017). She then decided to build a survey from the material she had gathered, with her usual dedication and stamina, asking David and me to join forces and bring additional perspectives on this history. While this is a short and hence necessarily incomplete history (of not everything!), it hopefully brings some different threads together in an original enough fashion (as I think there is little overlap with recent surveys I wrote). We welcome comments about aspects we missed, skipped or misrepresented, most obviously!

## Bayes plaque

Posted in Books, pictures, Statistics, Travel, University life on November 22, 2019 by xi'an

## at the centre of Bayes

Posted in Mountains, pictures, Statistics, Travel, University life on October 14, 2019 by xi'an

## let the evidence speak [book review]

Posted in Books, Kids, Statistics on December 17, 2018 by xi'an

This book by Alan Jessop, professor at the Durham University Business School, aims at presenting Bayesian ideas and methods towards decision making “without formula because they are not necessary; the ability to add and multiply is all that is needed.” The trick is in using a Bayes grid, in other words a two by two table. (There are a few formulas that survived the slaughter, see e.g. on p. 91 the formula for the entropy, contained in the chapter on information that I find definitely unclear.) When leaving the 2×2 world, things become more complicated and the construction of a prior belief as a probability density gets heroic without the availability of maths formulas. The first part of the book is about Likelihood, albeit not the likelihood function, despite having the general rule that (p.73)

belief is proportional to base rate x likelihood

which is the book’s version of Bayes’ (base?!) theorem. It then goes on to discuss the less structured nature of the prior (or prior beliefs) against the likelihood by describing Tony O’Hagan’s way of scaling experts’ beliefs in terms of a Beta distribution. It also mentions Jaynes’ maximum entropy prior without a single formula. What is hard to fathom from the text is how one can derive the likelihood outside surveys. (Using the illustration of Oswald’s 1963 murder by Ruby in the likelihood chapter does not particularly help!) A bit of nitpicking at this stage: the sentence

“The ancient Greeks, and before them the Chinese and the Aztecs…”

is historically incorrect since, while the Chinese empire dates back before the Greek dark ages, the Aztecs only ruled Mexico from the 14th century (AD) until the Spanish invasion. While most of the book sticks with unidimensional parameters, it also discusses more complex structures, for which it relies on Monte Carlo, although the description is rather cryptic (use your spreadsheet!, p.133). The book at this stage turns into a more story-telling mode, by considering for instance the Federalist papers analysis by Mosteller and Wallace. The reader can only follow the process of assessing a document’s authorship for a single word, as multidimensional cases (for either data or parameters) are out of reach. The same comment applies to the ecology, archeology, and psychology chapters that follow. The intermediary chapter on the “grossly misleading” [Court wording] use of the statistical evidence in the Sally Clark prosecution is more accessible in that (again) it relies on a single number. Returning to the ban of Bayes’ rule in British courts:

In the light of the strong criticism by this court in the 1990s of using Bayes theorem before the jury in cases where there was no reliable statistical evidence, the practice of using a Bayesian approach and likelihood ratios to formulate opinions placed before a jury without that process being disclosed and debated in court is contrary to principles of open justice.

the discussion found in the book is quite moderate and inclusive, in that a Bayesian analysis helps in gathering evidence about a case, but may be misunderstood or misused at the [non-Bayesian] decision level.

In conclusion, Let the Evidence Speak is an interesting introduction to Bayesian thinking, through a simplifying device, the Bayes grid, which seems to come from management, with a large number of examples, if not necessarily all realistic, and some side-stories. I doubt this exposure can produce expert practitioners, but it makes for a worthwhile awakening for someone “likely to have read this book because [one] had heard of Bayes but were uncertain what is was” (p.222). With commendable caution and warnings along the way.
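For readers who have not met the device, the rule quoted earlier (belief is proportional to base rate x likelihood) only requires multiplying and then rescaling, as the book claims. The following is a minimal sketch of such a grid, not the book's own procedure: the function name `bayes_grid`, the hypothesis labels, and all the numbers are hypothetical and chosen purely for illustration.

```python
def bayes_grid(base_rates, likelihoods):
    """Rescale base rate x likelihood so that the beliefs add up to one."""
    products = {h: base_rates[h] * likelihoods[h] for h in base_rates}
    total = sum(products.values())
    return {h: p / total for h, p in products.items()}

# HYPOTHETICAL numbers: two competing hypotheses and one observed outcome.
base_rates = {"H1": 0.01, "H2": 0.99}   # prior beliefs, summing to one
likelihoods = {"H1": 0.90, "H2": 0.10}  # chance of the observation under each

beliefs = bayes_grid(base_rates, likelihoods)
for hypothesis, belief in beliefs.items():
    print(hypothesis, round(belief, 4))
```

With these made-up inputs, the belief in H1 moves from 1% to about 8%, which shows how a small base rate can still dominate a fairly strong likelihood.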