dice and sticks

Posted in Books, Kids, R with tags , , , , , , on November 19, 2021 by xi'an

A quick weekend riddle from the Riddler about the probability of getting a sequence of increasing numbers from dice with an increasing number of faces, eg 4-, 6-, and 8-face dice. Which happens to be 1/4. By sheer calculation (à la Gauss) or simple enumération (à la R):

> for(i in 1:4)for(j in (i+1):6)F=F+(8-j)
> F/4/6/8
[1] 0.25


The less-express riddle is an optimisation problem related with stick breaking: given a stick of length one, propose a fraction a and win (1-a) if a Uniform x is less than one. Since the gain is a(1-a) the maximal average gain is associated with a=½. Now, if the remaining stick (1-a) can be divided when x>a, what is the sequence of fractions one should use when the gain is the length of the remaining stick? With two attempts only, the optimal gain is still ¼. And a simulation experiment with three attempts again returns ¼.

The [errors in the] error of truth [book review]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , , , , , , , , , , , , , , on August 10, 2021 by xi'an

OUP sent me this book, The error of truth by Steven Osterling, for review. It is a story about the “astonishing” development of quantitative thinking in the past two centuries. Unfortunately, I found it to be one of the worst books I have read on the history of sciences…

To start with the rather obvious part, I find the scholarship behind the book quite shoddy as the author continuously brings in items of historical tidbits to support his overall narrative and sometimes fills gaps on his own. It often feels like the material comes from Wikipedia, despite expressing a critical view of the on-line encyclopedia. The [long] quote below is presumably the most shocking historical blunder, as the terror era marks the climax of the French Revolution, rather than the last fight of the French monarchy. Robespierre was the head of the Jacobins, the most radical revolutionaries at the time, and one of the Assembly members who voted for the execution of Louis XIV, which took place before the Terror. And later started to eliminate his political opponents, until he found himself on the guillotine!

“The monarchy fought back with almost unimaginable savagery. They ordered French troops to carry out a bloody campaign in which many thousands of protesters were killed. Any peasant even remotely suspected of not supporting the government was brutally killed by the soldiers; many were shot at point-blank range. The crackdown’s most intense period was a horrific ten-month Reign of Terror (“la Terreur”) during which the government guillotined untold masses (some estimates are as high as 5,000) of its own citizens as a means to control them. One of the architects of the Reign of Terror was Maximilien Robespierre, a French nobleman and lifelong politician. He explained the government’s slaughter in unbelievable terms, as “justified terror . . . [and] an emanation of virtue” (quoted in Linton 2006). Slowly, however, over the next few years, the people gained control. In the end, many nobles, including King Louis XVI and his wife Marie-Antoinette, were themselves executed by guillotining”

Obviously, this absolute misinterpretation does not matter (very) much for the (hi)story of quantification (and uncertainty assessment), but it demonstrates a lack of expertise of the author. And sap whatever trust one could have in new details he brings to light (life?). As for instance when stating

“Bayes did a lot of his developmental work while tutoring students in local pubs. He was a respected teacher. Taking advantage of his immediate resources (in his circumstance, a billiard table), he taught his theorem to many.”

which does not sound very plausible. I never heard that Bayes had students  or went to pubs or exposed his result to many before its posthumous publication… Or when Voltaire (who died in 1778) is considered as seventeenth-century precursor of the Enlightenment. Or when John Graunt, true member of the Royal Society, is given as a member of the Académie des Sciences. Or when Quetelet is presented as French and as a student of Laplace.

The maths explanations are also puzzling, from the law of large numbers illustrated by six observations, and wrongly expressed (p.54) as

$\bar{X}_n+\mu\qquad\text{when}\qquad n\longrightarrow\infty$

to  the Saint-Petersbourg paradox being seen as inverse probability, to a botched description of the central limit theorem  (p.59), including the meaningless equation (p.60)

$\gamma_n=\frac{2^{2n}}{\pi}\int_0^\pi~\cos^{2n} t\,\text dt$

to de Moivre‘s theorem being given as Taylor’s expansion

$f(z)=\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(z-a)^2$

and as his derivation of the concept of variance, to another botched depiction of the difference between Bayesian and frequentist statistics, incl. the usual horror

$P(68.5<70<71.5)=95%$

to independence being presented as a non-linear relation (p.111), to the conspicuous absence of Pythagoras in the regression chapter, to attributing to Gauss the concept of a probability density (when Simpson, Bayes, Laplace used it as well), to another highly confusing verbal explanation of densities, including a potential confusion between different representations of a distribution (Fig. 9.6) and the existence of distributions other than the Gaussian distribution, to another error in writing the Gaussian pdf (p.157),

$f(x)=\dfrac{e^{-(z-\mu)^2}\big/2\sigma^2}{\sigma\sqrt{2\pi}}$

to yet another error in the item response probability (p.301), and.. to completely missing the distinction between the map and the territory, i.e., the probabilistic model and the real world (“Truth”), which may be the most important shortcoming of the book.

The style is somewhat heavy, with many repetitions about the greatness of the characters involved in the story, and some degree of license in bringing them within the narrative of the book. The historical determinism of this narrative is indeed strong, with a tendency to link characters more than they were, and to make them greater than life. Which is a usual drawback of such books, along with the profuse apologies for presenting a few mathematical formulas!

The overall presentation further has a Victorian and conservative flavour in its adoration of great names, an almost exclusive centering on Western Europe, a patriarchal tone (“It was common for them to assist their husbands in some way or another”, p.44; Marie Curie “agreed to the marriage, believing it would help her keep her laboratory position”, p.283), a defense of the empowerment allowed by the Industrial Revolution and of the positive sides of colonialism and of the Western expansion of the USA, including the invention of Coca Cola as a landmark in the march to Progress!, to the fall of the (communist) Eastern Block being attributed to Ronald Reagan, Karol Wojtyła, and Margaret Thatcher, to the Bell Curve being written by respected professors with solid scholarship, if controversial, to missing the Ottoman Enlightenment and being particularly disparaging about the Middle East, to dismissing Galton’s eugenism as a later year misguided enthusiasm (and side-stepping the issue of Pearson’s and Fisher’s eugenic views),

Another recurrent if minor problem is the poor recording of dates and years when introducing an event or a new character. And the quotes referring to the current edition or translation instead of the original year as, e.g., Bernoulli (1954). Or even better!, Bayes and Price (1963).

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

O’Bayes 19/4

Posted in Books, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , on July 4, 2019 by xi'an

Last talks of the conference! With Rui Paulo (along with Gonzalo Garcia-Donato) considering the special case of factors when doing variable selection. Which is an interesting question that I had never considered, as at best I would remove all leves or keeping them all. Except that there may be misspecification in the factors as for instance when several levels have the same impact.With Michael Evans discussing a paper that he wrote for the conference! Following his own approach to statistical evidence. And including his reluctance to cover infinity (calling on Gauß for backup!) or continuity, and his call to falsify a Bayesian model by checking it can be contradicted by the data. His assumption that checking for prior is separable from checking for [sampling] model is debatable. (With another mention made of the Savage-Dickey ratio.)

And with Dimitris Fouskakis giving a wide ranging assessment [which Mark Steel (Warwick) called a PEP talk!] of power-expected-posterior priors, used with reference (and usually improper) priors. Which in retrospect would have suited better the beginning of the conference as it provided a background to several of the talks. Raising a question (from my perspective) on using the maximum likelihood estimator as a pseudo-sufficient statistic when this MLE is computed for the base (simplest) model. Maybe an ABC induced bias in this question as it would not work for ABC model choice.

Overall, I think the scientific outcomes of the conference were quite positive: a wide range of topics and perspectives, a reasonable and diverse attendance, especially when considering the heavy load of related conferences in the surrounding weeks (the “June fatigue”!), animated poster sessions. I am obviously not the one to assess the organisation of the conference! Things I forgot to do in this regard: organise transportation from Oxford to Warwick University, provide an attached room for in-pair research, insist on sustainability despite the imposed catering solution, facilitate sharing joint transportation to and from the Warwick campus, mention that tap water was potable, and… wear long pants when running in nettles.

Max Ent at Max Plank

Posted in Statistics with tags , , , , , , , , , , on December 21, 2018 by xi'an

Gaussian hare and Laplacian tortoise

Posted in Books, Kids, pictures, Statistics, University life with tags , , , , , , , , , , , on October 19, 2018 by xi'an

A question on X validated on the comparative merits of L¹ versus L² estimation led me to the paper of Stephen Portnoy and Roger Koenker entitled “The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error versus Absolute-Error Estimators”, which I had missed at the time, despite enjoying a subscription to Statistical Science till the late 90’s.. The authors went as far as producing a parody of Granville’s Fables de La Fontaine by sticking Laplace’s and Gauss’ heads on the tortoise and the hare!

I remember rather vividly going through Steve Stigler’s account of the opposition between Laplace’s and Legendre’s approaches, when reading his History of Statistics in 1990 or 1991… Laplace defending the absolute error on the basis of the default double-exponential (or Laplace) distribution, when Legendre and then Gauss argued in favour of the squared error loss on the basis of a defaul Normal (or Gaussian) distribution. (Edgeworth later returned to the support of the L¹ criterion.) Portnoy and Koenker focus mostly on ways of accelerating the derivation of the L¹ regression estimators. (I also learned from the paper that Koenker was one of the originators of quantile regression.)