Archive for blogging

probably overthinking it [book review]

Posted in Books, Statistics, University life on December 13, 2023 by xi'an

Probably overthinking it, written by Allen B. Downey (who wrote a series of books starting with Think, like Think Python, Think Bayes, Think Stats), belongs to the large collection of introductory books that aim at making statistics more palatable and enticing to the general public by making the fundamental concepts more intuitive and building upon real life examples. I would thus stop short of calling it an “essential guide”, as in the first flap of the dust jacket, since there exist many published books with a similar goal, some of which were actually reviewed here. Now, there are ideas and examples therein I could borrow for my introductory stats course, except that I will cease teaching it next year! For instance, there are lots of examples related to COVID, which is great to engage (enrage?) the readers.

The book is quite pleasant to read, does not shy from mathematical formulae, and covers notions such as probability distributions, the Simpson, Preston, inspection, and Berkson paradoxes, and even some words on causality, sometimes at excessive length. (I have always been an adept of the concise church when it comes to textbook examples and fear that the multiplication of illustrations of a given concept may prove counterproductive.) The early chapters are heavily focussed on the Gaussian (or Normal) distribution, making it appear as essential for conducting statistical analysis. When it does not fit, as in the Elo example, the explanations of a correction are less convincing.

I appreciated the book's approach to model fit via the comparison of empirical cdfs with hypothetical ones. Also of primary interest is the systematic recourse to simulation, aka generative models, albeit without a systematic proper description. In the chapter (Chap 5) about durations, I think there are missed opportunities like the distributions of extremes (p 82) or the forgetfulness (memoryless) property of the Exponential distribution. Instead the focus drifts towards non-statistical issues of demography by the end of the chapter, with a potential for confusion between the Gompertz law and the Gompertz distribution. The Berkson paradox (Chap 6) is well-explained in terms of non-random populations (and reminded me of when, years ago, we tried to predict the first year success probability of undergrad applicants from their high school maths grade and the regression coefficient estimate ended up negative). Distributions of extremes do appear in Chap 8, although seeking once again an ideal generic distribution seems to me rather misguided and misguiding. I would also argue that the author is missing the point of Taleb’s black swans by arguing in favour of a better modelling, when the latter argues against the very predictability of extreme events in a non-stationary financial world… The chapter on fairness and fallacy (Chap 9) is actually about false positive/negative rates in different populations, hence the ensuing unfairness (or the base rate fallacy). In that chapter there is no mention of Bayes (reserved for Think Bayes?!), but it is hitting hard enough at anti-vaxxers (who will most likely not read the book). It does so again in the Simpson paradox chapter (Chap 10), a paradox whose proliferation is further stressed in the following chapter, on people becoming less racist or sexist or homophobic as they age, despite the proportion of racist/sexist/homophobic responses to a specific survey (GSS/Pew) increasing with age. This is prolonged into the rather minor final chapter.
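As a minimal sketch of the Berkson effect mentioned above, the following simulation draws two independent scores and then keeps only the "admitted" individuals. The admission rule (sum of both scores above a threshold), the score names, and all numerical values are assumptions made for illustration; they are taken neither from the book nor from the actual undergraduate data alluded to in the anecdote.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, threshold=1.0, seed=1):
    """Return (correlation among all applicants, correlation among admitted).

    Hypothetical setting: 'maths' and 'other' are independent N(0, 1)
    scores, and an applicant is admitted when maths + other > threshold.
    """
    rng = random.Random(seed)
    maths = [rng.gauss(0.0, 1.0) for _ in range(n)]
    other = [rng.gauss(0.0, 1.0) for _ in range(n)]
    admitted = [(m, o) for m, o in zip(maths, other) if m + o > threshold]
    r_all = pearson(maths, other)
    r_adm = pearson([m for m, _ in admitted], [o for _, o in admitted])
    return r_all, r_adm


r_all, r_adm = simulate()
print(f"all applicants: r = {r_all:+.3f}")
print(f"admitted only:  r = {r_adm:+.3f}")
```

Under these assumptions the two scores are uncorrelated in the full population but negatively correlated among the admitted, which is the kind of selection mechanism that can flip the sign of a regression coefficient estimated on a non-random subpopulation.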

Now that I have read the book, during a balmy afternoon in St Kilda (after an early start in the train to De Gaulle airport in freezing temperatures), I am a bit uncertain about what to make of it in terms of impact on the general public. For sure, the stories that accumulate chapter after chapter are nice and well argued, while introducing useful statistical concepts, but I do not see readers equipped enough to handle daily statistics with more than a healthy dose of scepticism, which obviously is a first step in the right direction!

Some nitpicking: the book is missing the historical connection to Quetelet’s “average man” when referring to the notion. It is also missing a potential explanation for the (approximate) log-Gaussianity of weights of individuals in a population through the fact that a weight is a volume, hence a third power of a sort, although birth weights are roughly Normal, which kills my argument. I remain puzzled by the title, possibly missing a cultural reference (as there are tee-shirts sold with this sentence). It is also the name of a blog run by the author since 2011, which served as fodder for the book. And the cover is terrible, as breaking the words to fit the width makes no sense, if I am not overthinking it! As often, the book is rather US centric, although it makes no mention of the US having much higher infant death rates than countries with similar GDPs when this data is discussed.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

so long, and thanks for all the quests

Posted in Books, Kids, R on October 25, 2023 by xi'an

The Riddler, which I have followed for many years, has been discontinued by FiveThirtyEight, but its producer, Zach Wissner-Gross, has launched a personal website to keep considering a weekly mathematical puzzle. The Fiddler on the Proof! Expect thus more ‘Og entries in this category!

short trip to a theocracy

Posted in pictures, Travel on June 30, 2023 by xi'an

Besides attending a workshop on my research themes and meeting researchers from that field, a supplementary incentive for attending Stochastic Numerics and Statistical Learning was to catch a glimpse of Saudi Arabia and its idiosyncrasies. While this is far from my first time visiting a Muslim theocracy, remembering a most enlightening (if penniless) backpacking trip to Morocco in the summer of 1982, Saudi Arabia is in a class of its own, as I experienced from filling in a visa application where I had to enter my religion (non-Muslim was enough, though, while atheist would not have worked!), to flying on Saudia Airlines, with the travelling Du’a broadcast before take-off, plus an extra religious announcement when flying over Mecca, to (early) Hadj pilgrims disembarking from planes wearing only the ihram, to the absolute prohibition of alcohol, even on the KAUST campus. There are signs of recent liberalisation, in the sense of fewer restrictions imposed on women, probably helped by the timid opening to (non-religious) tourism, but none at all of democratisation, as shown by the repression of dissenters, cf. Amnesty annual report, or the current plight of Raif Badawi.

Obviously, these are just quick impressions based on spending no more than a few hours in the (real) country, between the airport visits and the taxi trip back from KAUST, since the city campus has had its own regulations from its creation ex nihilo (or de deserto) in 2009. It accommodates more foreign students and faculty than local ones and runs quite efficiently, thanks to the enormous budget provided by the Saudi monarchy, which also proves a sword of Damocles, were the kingdom to choose to allocate this money differently, to one of the many Herculean projects pushed by Mohammed bin Salman, also chairman of the university.

prior sensitivity of the marginal likelihood

Posted in Books, pictures, Statistics, University life on June 27, 2022 by xi'an

Fernando Llorente and (Madrilene) coauthors have just arXived a paper on the safe use of prior densities for Bayesian model selection. Rather than blaming the Bayes factor, or excommunicating some improper priors, they consider in this survey solutions to design “objective” priors in model selection. (Writing this post made me realise I had forgotten to arXive a recent piece I wrote on the topic, based on short courses and blog pieces, for a forthcoming handbook on Bayesian advance(ment)s! Soon to be corrected.)

While intrinsically interested in the topic and hence in the study, I somewhat disagree with the perspective adopted by the authors. They for instance stick to the notion that a flat prior over the parameter space is appropriate as “the maximal expression of a non-informative prior” (despite depending on the parameterisation), over bounded sets at least, while advocating priors “with great scale parameter” otherwise. They also refer to Jeffreys (1939) priors, by which they mean estimation priors rather than testing priors, a distinction uncovered by Susie Bayarri and Gonzalo Garcia-Donato. Considering asymptotic consistency, they state that “in the asymptotic regime, Bayesian model selection is more sensitive to the sample size D than to the prior specifications”, which I find both imprecise and confusing, as my feeling is that the prior specification remains overly influential as the sample size increases. (In my view, consistency is a minimalist requirement, rather than “comforting”.) The argument therein that a flat prior is informative for model choice stems from the fact that the marginal likelihood goes to zero as the support of the prior goes to infinity, which may have been an earlier argument of Jeffreys’ (1939), but does not carry much weight as the property is shared by many other priors (as remarked later). Somehow, the penalisation aspect of the marginal is not exploited more deeply in the paper. In the “objective” Bayes section, they adhere to the (convenient but weakly supported) choice of a common prior on the nuisance parameters (shared by different models). Their main argument is to develop (heretic!) “data-based priors”, from Aitkin’s (1991, not cited) double use of the data (or setting the likelihood to the power two), all the way to the intrinsic and fractional Bayes factors of Tony O’Hagan (1995), Jim Berger and Luis Pericchi (1996), and to the expected posterior priors of Pérez and Berger (2002) on which I worked with Juan Cano and Diego Salmerón. (While the presentation is made against a flat prior, nothing prevents the use of another reference, improper, prior.) A short section also mentions the X-validation approach(es) of Aki Vehtari and co-authors.
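The vanishing of the marginal likelihood under a widening flat prior can be checked numerically on a toy case. The sketch below is an assumption chosen for illustration, not an example from the paper: a single observation x ~ N(θ, 1), a flat prior θ ~ U(-c, c) under the alternative model, and a point null θ = 0 for comparison. In that setting the marginal has the closed form [Φ(c - x) - Φ(-c - x)] / (2c).

```python
import math


def normal_cdf(z):
    """Standard Normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def marginal_flat(x, c):
    """Marginal likelihood of x ~ N(theta, 1) when theta ~ U(-c, c).

    Integrating the N(theta, 1) density over theta in (-c, c) gives
    Phi(c - x) - Phi(-c - x), which is then divided by the width 2c.
    """
    return (normal_cdf(c - x) - normal_cdf(-c - x)) / (2.0 * c)


x = 2.0                  # hypothetical observation
m0 = normal_pdf(x)       # marginal under the point null theta = 0
for c in (5, 50, 500, 5000):
    m1 = marginal_flat(x, c)
    # The Bayes factor in favour of the null grows linearly with c.
    print(f"c = {c:5d}   m1 = {m1:.3e}   B01 = {m0 / m1:.2f}")
```

For c much larger than |x| the marginal behaves like 1/(2c), so the Bayes factor in favour of the null can be made arbitrarily large by widening the support, whatever the observation. The same dilution occurs for any proper prior whose scale grows without bound, in line with the remark above that the property is not specific to flat priors.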

[de]quarantined by slideshare

Posted in Books, pictures, Statistics, University life on January 11, 2021 by xi'an

A follow-up episode to the SlideShare m’a tuer [sic] saga: After the 20 November closure of my xianblog account and my request for an explanation, I was told by LinkedIn that a complaint had been made about one of my talks for violation of copyright. Most surprisingly, at least at first, it was about the slides for the graduate lectures I gave ten years ago at CREST on (re)reading Jaynes’ Probability Theory. While the slides contain a lot of short quotes from the Logic of Science, somewhat necessarily since I discuss the said book, there are also many quotes from Jeffreys’ Theory of Probability and “t’is but a scratch” on the contents of this lengthy book… Plus, the pdf file appears to be accessible on several sites, including one with an INRIA domain. Since I had to fill in a “Counter-Notice of Copyright Infringement” to unlock the rest of the depository, I just hope no legal action is going to be taken about this lecture. But I remain puzzled at the reasoning behind the complaint, unwilling to blame radical Jaynesians for it! As an aside, here are the registered 736 views of the slides for the past year: