Archive for CRC Press

an elegant book [review]

Posted in Books, Statistics, University life with tags , , , , , , , on December 28, 2020 by xi'an

“Handbook of Mixture Analysis is an elegant book on the mixture models. It covers not only statistical foundations but also extensions and applications of mixture models. The book consists of 19 chapters (each chapter is an independent paper), and collectively, these chapters weave into an elegant web of mixture models” Yen-Chi Chen (U. Washington)

Data Science & Machine Learning book free for download

Posted in Statistics with tags , , , , , , , on November 30, 2020 by xi'an

understanding elections through statistics [book review]

Posted in Books, Kids, R, Statistics, Travel with tags , , , , , , , , , , , , , , , , , , , , , , , , on October 12, 2020 by xi'an

A book to read most urgently if hoping to take an informed decision by 03 November! Written by a political scientist cum statistician, Ole Forsberg. (If you were thinking of another political scientist cum statistician, he wrote red state blue state a while ago! And is currently forecasting the outcome of the November election for The Economist.)

“I believe [omitting educational level] was the main reason the [Brexit] polls were wrong.”

The first part of the book is about the statistical analysis of opinion polls (assuming their outcome is given, rather than designing them in the first place). And starting with the Scottish independence referendum of 2014. The first chapter covering the cartoon case of simple sampling from a population, with or without replacement, Bayes and non-Bayes. In somewhat too much detail imho given that this is an unrealistic description of poll outcomes. The second chapter expands to stratified sampling (with confusing title [Polling 399] and entry, since it discusses repeated polls that are not processed in said chapter). Mentioning the famous New York Times experiment where five groups of pollsters analysed the same data, making different decisions in adjusting the sample and identifying likely voters, and coming out with a range of five points in the percentage. Starting to get a wee bit more advanced when designing priors for the population proportions. But still studying a weighted average of the voting intentions for each category. Chapter three reaches the challenging task of combining polls, with a 2017 (South) Korea presidential election as an illustration, involving five polls. It includes a solution to handling older polls by proposing a simple linear regression against time. Chapter 4 sums up the challenges of real-life polling by examining the disastrous 2016 Brexit referendum in the UK. Exposing for instance the complicated biases resulting from polling by phone or on-line. The part that weights polling institutes according to quality does not provide any quantitative detail. (And also a weird averaging between the levels of “support for Brexit” and “maybe-support for Brexit”, see Fig. 4.5!) Concluding as quoted above that missing the educational stratification was the cause for missing the shock wave of referendum day is a possible explanation, but the massive difference in turnover between the age groups, itself possibly induced by the reassuring figures of the published polls and predictions, certainly played a role in missing the (terrible) outcome.

“The fabricated results conformed to Benford’s law on first digits, but failed to obey Benford’s law on second digits.” Wikipedia

The second part of this 200 page book is about election analysis, towards testing for fraud. Hence involving the ubiquitous Benford law. Although applied to the leading digit which I do not think should necessarily follow Benford law due to both the varying sizes and the non-uniform political inclinations of the voting districts (of which there are 39 for the 2009 presidential Afghan election illustration, although the book sticks at 34 (p.106)). My impression was that instead lesser digits should be tested. Chapter 4 actually supports the use of the generalised Benford distribution that accounts for differences in turnouts between the electoral districts. But it cannot come up with a real-life election where the B test points out a discrepancy (and hence a potential fraud). Concluding with the author’s doubt [repeated from his PhD thesis] that these Benford tests “are specious at best”, which makes me wonder why spending 20 pages on the topic. The following chapter thus considers other methods, checking for differential [i.e., not-at-random] invalidation by linear and generalised linear regression on the supporting rate in the district. Once again concluding at no evidence of such fraud when analysing the 2010 Côte d’Ivoire elections (that led to civil war). With an extension in Chapter 7 to an account for spatial correlation. The book concludes with an analysis of the Sri Lankan presidential elections between 1994 and 2019, with conclusions of significant differential invalidation in almost every election (even those not including Tamil provinces from the North).

R code is provided and discussed within the text. Some simple mathematical derivations are found, albeit with a huge dose of warnings (“math-heavy”, “harsh beauty”) and excuses (“feel free to skim”, “the math is entirely optional”). Often, one wonders at the relevance of said derivations for the intended audience and the overall purpose of the book. Nonetheless, it provides an interesting entry on (relatively simple) models applied to election data and could certainly be used as an original textbook on modelling aggregated count data, in particular as it should spark the interest of (some) students.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

principles of uncertainty (second edition)

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , on July 21, 2020 by xi'an

A new edition of Principles of Uncertainty is about to appear. I was asked by CRC Press to review the new book and here are some (raw) extracts from my review. (Some comments may not apply to the final and published version, mind.)

In Chapter 6, the proof of the Central Limit Theorem utilises the “smudge” technique, which is to add an independent noise to both the sequence of rvs and its limit. This is most effective and reminds me of quite a similar proof Jacques Neveu used in its probability notes in Polytechnique. Which went under the more formal denomination of convolution, with the same (commendable) purpose of avoiding Fourier transforms. If anything, I would have favoured a slightly more condensed presentation in less than 8 pages. Is Corollary 6.5.8 useful or even correct??? I do not think so because the non-centred average rescaled by √n diverges almost surely. For the same reason, I object to the very first sentence of Section 6.5 (p.246)

In Chapter 7, I found a nice mention of (Hermann) Rubin’s insistence on not separating probability and utility as only the product matters. And another fascinating quote from Keynes, not from his early statistician’s years, but in 1937 as an established economist

“The sense in which I am using the term uncertain is that in which the prospect of a European war is uncertain, or the price of copper and the rate of interest twenty years hence, or the obsolescence of a new invention, or the position of private wealth-owners in the social system in 1970. About these matters there is no scientific basis on which to form any calculable probability whatever. We simply do not know. Nevertheless, the necessity for action and for decision compels us as practical men to do our best to overlook this awkward fact and to behave exactly as we should if we had behind us a good Benthamite calculation of a series of prospective advantages and disadvantages, each multiplied by its appropriate probability, waiting to the summed.”

(is the last sentence correct? I would have expected, pardon my French!, “to be summed”). Further interesting trivia on the criticisms of utility theory, including de Finetti’s role and his own lack of connection with subjective probability principles.

In Chapter 8, a major remark (iii) is found p.293 about the fact that a conjugate family requires a dominating measure (although this is expressed differently since the book shies away from introducing measure theory, ) reminds me of a conversation I had with Jay when I visited Carnegie Mellon in 2013 (?). Which exposes the futility of seeing conjugate priors as default priors. It is somewhat surprising that a notion like admissibility appears as a side quantity when discussing Stein’s paradox in 8.2.1 [and then later in Section 9.1.3] while it seems to me to be central to Bayesian decision theory, much more than the epiphenomenon that Stein’s paradox represents in the big picture. But the book dismisses minimaxity even faster in Section 9.1.4:

As many who suffer from paranoia have discovered, one can always dream-up an even worse possibility to guard against. Thus, the minimax framework is unstable. (p.336)

Interesting introduction of the Wishart distribution to kindly handle random matrices and matrix Jacobians, with the original space being the p(p+1)/2 real space (implicitly endowed with the Lebesgue measure). Rather than a more structured matricial space. A font error makes Corollary 8.7.2 abort abruptly. The space of positive definite matrices is mentioned in Section8.7.5 but still (implicitly) corresponds to the common p(p+1)/2 real Euclidean space. Another typo in Theorem 8.9.2 with a Frenchised version of Dirichlet, Dirichelet. Followed by a Dirchlet at the end of the proof (p.322). Again and again on p.324 and on following pages. I would object to the singular in the title of Section 8.10 as there are exponential families rather than a single one. With no mention made of Pitman-Koopman lemma and its consequences, namely that the existence of conjugacy remains an epiphenomenon. Hence making the amount of pages dedicated to gamma, Dirichlet and Wishart distributions somewhat excessive.

In Chapter 9, I noticed (p.334) a Scheffe that should be Scheffé (and again at least on p.444). (I love it that Jay also uses my favorite admissible (non-)estimator, namely the constant value estimator with value 3.) I wonder at the worth of a ten line section like 9.3, when there are delicate issues in handling meta-analysis, even in a Bayesian mood (or mode). In the model uncertainty section, Jay discuss the (im)pertinence of both selecting one of the models and setting independent priors on their respective parameters, with which I disagree on both levels. Although this is followed by a more reasonable (!) perspective on utility. Nice to see a section on causation, although I would have welcomed an insert on the recent and somewhat outrageous stand of Pearl (and MacKenzie) on statisticians missing the point on causation and counterfactuals by miles. Nonparametric Bayes is a new section, inspired from Ghahramani (2005). But while it mentions Gaussian and Dirichlet [invariably misspelled!] processes, I fear it comes short from enticing the reader to truly grasp the meaning of a prior on functions. Besides mentioning it exists, I am unsure of the utility of this section. This is one of the rare instances where measure theory is discussed, only to state this is beyond the scope of the book (p.349).

a computational approach to statistical learning [book review]

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , , , , , , on April 15, 2020 by xi'an

This book was sent to me by CRC Press for review for CHANCE. I read it over a few mornings while [confined] at home and found it much more computational than statistical. In the sense that the authors go quite thoroughly into the construction of standard learning procedures, including home-made R codes that obviously help in understanding the nitty-gritty of these procedures, what they call try and tell, but that the statistical meaning and uncertainty of these procedures remain barely touched by the book. This is not uncommon to the machine-learning literature where prediction error on the testing data often appears to be the final goal but this is not so traditionally statistical. The authors introduce their work as (a computational?) supplementary to Elements of Statistical Learning, although I would find it hard to either squeeze both books into one semester or dedicate two semesters on the topic, especially at the undergraduate level.

Each chapter includes an extended analysis of a specific dataset and this is an asset of the book. If sometimes over-reaching in selling the predictive power of the procedures. Printed extensive R scripts may prove tiresome in the long run, at least to me, but this may simply be a generational gap! And the learning models are mostly unidimensional, see eg the chapter on linear smoothers with imho a profusion of methods. (Could someone please explain the point of Figure 4.9 to me?) The chapter on neural networks has a fairly intuitive introduction that should reach fresh readers. Although meeting the handwritten digit data made me shift back to the late 1980’s, when my wife was working on automatic character recognition. But I found the visualisation of the learning weights for character classification hinting at their shape (p.254) most alluring!

Among the things I am missing when reading through this book, a life-line on the meaning of a statistical model beyond prediction, attention to misspecification, uncertainty and variability, especially when reaching outside the range of the learning data, and further especially when returning regression outputs with significance stars, discussions on the assessment tools like the distance used in the objective function (for instance lacking in scale invariance when adding errors on the regression coefficients) or the unprincipled multiplication of calibration parameters, some asymptotics, at least one remark on the information loss due to splitting the data into chunks, giving some (asymptotic) substance when using “consistent”, waiting for a single page 319 to see the “data quality issues” being mentioned. While the methodology is defended by algebraic and calculus arguments, there is very little on the probability side, which explains why the authors consider that the students need “be familiar  with the concepts of expectation, bias and variance”. And only that. A few paragraphs on the Bayesian approach are doing more harm than well, especially with so little background in probability and statistics.

The book possibly contains the most unusual introduction to the linear model I can remember reading: Coefficients as derivatives… Followed by a very detailed coverage of matrix inversion and singular value decomposition. (Would not sound like the #1 priority were I to give such a course.)

The inevitable typo “the the” was found on page 37! A less common typo was Jensen’s inequality spelled as “Jenson’s inequality”. Both in the text (p.157) and in the index, followed by a repetition of the same formula in (6.8) and (6.9). A “stwart” (p.179) that made me search a while for this unknown verb. Another typo in the Nadaraya-Watson kernel regression, when the bandwidth h suddenly turns into n (and I had to check twice because of my poor eyesight!). An unusual use of partition where the sets in the partition are called partitions themselves. Similarly, fluctuating use of dots for products in dimension one, including a form of ⊗ for matricial product (in equation (8.25)) followed next page by the notation for the Hadamard product. I also suspect the matrix K in (8.68) is missing 1’s or am missing the point, since K is the number of kernels on the next page, just after a picture of the Eiffel Tower…) A surprising number of references for an undergraduate textbook, with authors sometimes cited with full name and sometimes cited with last name. And technical reports that do not belong to this level of books. Let me add the pedant remark that Conan Doyle wrote more novels “that do not include his character Sherlock Holmes” than novels which do include Sherlock.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]