Archive for book review

false value

Posted in Statistics on November 23, 2020 by xi'an

A very pleasant eighth volume in the Rivers of London series, after a few so-so episodes! The relentless deadpan of Peter Grant is back in full shape, the plot is substantial and gripping, new and well-drawn characters abound, and the story offers an original retelling of the Difference Engine. (Not that I have reservations about Gibson's and Sterling's 1990 version!) Including mentions of Jacquard's loom, card-fed organ automata, Ada Lovelace, and Mary Somerville. Plus providing great satire on AI companies with a barely modified “Deep Thought” pastiche. Enjoyable all along and definitely a page-turner that I read within three days! And strongly immersed in the current era, from the passing of David Bowie to the dreadful impact of Theresa May as home secretary. Presumably missing a heap of references to geek culture and subcultures, apart from The Hitchhiker's Guide to the Galaxy. And too many quotes to report, but some mentions of stats: “the Red Army had done a statistical analysis with demon traps just as they had with conventional minefields. The conclusions had been the same in both cases.” (p.50) and “Beverley climbed into the bath with a second-hand copy of Statistics for Environmental Science and Management” (p.69), which is a genuine book. As often, the end is a bit murky and a bit precipitated, but not enough to whine about. Recommended (conditional on having read the earlier ones in the series)!

“a rare blend of monster raving egomania and utter batshit insanity”

Posted in Books, pictures, University life on November 12, 2020 by xi'an

“I don’t object to speculation or radical proposals, even to radical, grandiose speculative proposals; I just want there to be arguments to back them up, reasons to take them seriously. I don’t object to scientists displaying personality in their work, or staking out positions in vigorous opposition to much of the opinion in their field, and engaging in heated debate; I do object to ignoring criticism and claiming credit for commonplaces, especially before popular audiences who won’t pick up on it.”

A recent post by Andrew on Stephen Wolfram's (mega) egomania led me to a much older post by Cosma Shalizi reviewing the perfectly insane 5.57 pounds of A New Kind of Science. An exhilarating review, trashing the pretentious self-celebration of a void paradigm shift advanced by Wolfram and his abysmal lack of academic rigour, and showing anew that a book recommended by Bill Gates is not necessarily a great book. (Note that A New Kind of Science is available for free on-line.)

“Let me try to sum up. On the one hand, we have a large number of true but commonplace ideas, especially about how simple rules can lead to complex outcomes, and about the virtues of toy models. On the other hand, we have a large mass of dubious speculations (many of them also unoriginal). We have, finally, a single new result of mathematical importance, which is not actually the author’s. Everything is presented as the inspired fruit of a lonely genius, delivering startling insights in isolation from a blinkered and philistine scientific community.”

When I bought this monstrous book (eons before I started the 'Og!), I did not get much further into it than the first series of cellular-automata screen copies that fill page after page. And quickly if carefully dropped it by my office door in the corridor, where it stayed for a few days until one of my colleagues most politely asked me if he could borrow it. (This happens all the time: once I have read or given up on a book I do not imagine reopening, I put it in the coffee room or, for the least recommended books, on the floor by my door, and almost invariably whoever is interested will first ask me for permission. Which is very considerate and leads to pleasant discussions on the said books. Only recently did the library set shelves outside its doors for dropping books free for the taking, but even there I sometimes get colleagues wondering [rightly] if I was the one abandoning a particular book.)

“I am going to keep my copy of A New Kind of Science, sitting on the same shelf as Atlantis in Wisconsin, The Cosmic Forces of Mu, Of Grammatology, and the people who think the golden ratio explains the universe.”

In case the review is not enough to lighten up your day, in these gloomy times, there is a wide collection of them from the 2000s, although most of the links have turned obsolete. (The Maths Reviews review has not.) As will, presumably, this very post about an eighteen-year-old non-event…

if then [reading a book self-review]

Posted in Statistics on October 26, 2020 by xi'an

Nature of 17 September 2020 has a somewhat surprising comment section where an author, Jill Lepore from Harvard University, actually summarises her own book, If Then: How the Simulmatics Corporation Invented the Future. This book is the (hi)story of a precursor of Big Data analytics, Simulmatics, which used clustering and simulation as early as 1959 to predict election results and, if possible, figure out discriminant variables. Which apparently contributed to John F. Kennedy's victory over Richard Nixon in 1960. Rather than admiring the analytic abilities of such precursors (!), the author blames them for election interference. A criticism that could apply to any kind of polling, properly or improperly conducted. The article also describes how Simulmatics went into advertising, econometrics, and counter-insurgency, vainly trying to predict the occurrence and location of riots (at home) and revolutions (abroad). And argues in an all-encompassing critique against any form of data analytics applied to human behaviour. And praises the wisdom of 1968 protesters over current Silicon Valley researchers (whose bosses may have been among those 1968 protesters!)… (Stressing again that my comments come from reading and reacting to the above Nature article, not the book itself!)

democracy suffers when government statistics fail [review of a book review]

Posted in Books, Statistics, Travel on October 13, 2020 by xi'an

This week, rather extraordinarily!, the Nature book review was about official statistics, with a review of Julia Lane's Democratizing our Data. (The democratizing in the title is painful to watch, though!) The reviewer is Beth Simone Noveck, who was deputy chief technology officer under Barack Obama and is a major researcher in digital democracy, excusez du peu! (By comparison, Trump's deputy chief technology officer had a B.A. in politics and no other qualification for the job, but nonetheless got promoted to chief…)

“Lane asserts that the United States is failing to adequately track its population, economy and society. Agencies are stagnating. The census dramatically undercounts people from minority racial groups. There is no complete national list of households. The data are made available two years after the count, making them out of date as the basis for effective policy making.” B.S. Noveck

The debate raised by the book on the ability of official statistics to keep track of people in a timely manner is most interesting. And not limited to the USA, even though it seems to fit in a Hell of its own:

“In the United States, there is no single national statistical agency. The process of gathering and publishing public data is fragmented across multiple departments and agencies, making it difficult to introduce new ideas across the whole enterprise. Each agency is funded by, and accountable to, a different congressional committee. Congress once sued the commerce department for attempting to introduce modern techniques of statistical sampling to shore up a flawed census process that involves counting every person by hand.” B.S. Noveck

This remark brings back to (my) mind the titanic debates of the 1990s, when Republicans attacked sampling techniques and statisticians like Steve Fienberg rose to their defence. (Although others like David Freedman opposed the move, paradoxically mistrusting statistics!) The French official statistics institute, INSEE, has been running sampled censuses for decades now, without the national representation going up in arms. I am certainly partial, having been associated with INSEE, its statistics school ENSAE, and its research branch CREST since 1982, but it seems to me that the hiring of highly skilled and thoroughly trained civil servants by this institute helps in making the statistics it produces more trustworthy and efficient, including when measuring the impact of public policies. (Even though accusations of delay and bias show up regularly.) And in making the institute more prone to adopt new methods, thanks to the rotation of its agents. (B.S. Noveck notices and deplores the absence of reference to foreign agencies in the book.)

“By contrast, the best private-sector companies produce data that are in real time, comprehensive, relevant, accessible and meaningful.”  B.S. Noveck

However, the notion in the review (and the book?) that private companies are necessarily doing better is harder to buy, if an easy jab at a public institution. Indeed, public official statistics institutes are the only ones to have access to data covering the entire population, either directly or through other public institutes, like the IRS or social security claims. And trusting the few companies with a similar reach is beyond naïve (even though a company like Amazon enjoys an almost instantaneous and highly local sensor of economic and social conditions!). And at odds with the call for democratizing, as shown by the impact of some of these companies on US elections.

understanding elections through statistics [book review]

Posted in Books, Kids, R, Statistics, Travel on October 12, 2020 by xi'an

A book to read most urgently if hoping to make an informed decision by 3 November! Written by a political scientist cum statistician, Ole Forsberg. (If you were thinking of another political scientist cum statistician: that one wrote Red State, Blue State a while ago! And is currently forecasting the outcome of the November election for The Economist.)

“I believe [omitting educational level] was the main reason the [Brexit] polls were wrong.”

The first part of the book is about the statistical analysis of opinion polls (taking their outcomes as given, rather than designing them in the first place), starting with the Scottish independence referendum of 2014. The first chapter covers the cartoon case of simple sampling from a population, with or without replacement, Bayes and non-Bayes, in somewhat too much detail imho, given that this is an unrealistic description of poll outcomes. The second chapter expands to stratified sampling (with a confusing title [Polling 399] and entry, since it discusses repeated polls that are not processed in said chapter), mentioning the famous New York Times experiment where five groups of pollsters analysed the same data, made different decisions in adjusting the sample and identifying likely voters, and came out with a five-point range in the percentages. It starts to get a wee bit more advanced when designing priors for the population proportions, but still studies a weighted average of the voting intentions for each category. Chapter three reaches the challenging task of combining polls, with the 2017 South Korean presidential election as an illustration, involving five polls. It includes a solution for handling older polls, proposing a simple linear regression against time. Chapter 4 sums up the challenges of real-life polling by examining the disastrous 2016 Brexit referendum in the UK, exposing for instance the complicated biases resulting from polling by phone or on-line. The part that weights polling institutes according to quality does not provide any quantitative detail. (And it includes a weird averaging between the levels of “support for Brexit” and “maybe-support for Brexit”, see Fig. 4.5!)
Concluding, as quoted above, that missing the educational stratification was the cause for missing the shock wave of referendum day is a possible explanation, but the massive difference in turnout between the age groups, itself possibly induced by the reassuring figures of the published polls and predictions, certainly played a role in missing the (terrible) outcome.
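The regression-against-time fix for stale polls can be sketched in a few lines. This is my own toy illustration with made-up poll numbers, not the book's data or code: fit a weighted line of support against days before the election and read off the intercept as the election-day forecast.

```python
import numpy as np

# Hypothetical polls: (days before the election, support in %, sample size)
polls = [(30, 46.0, 1000), (21, 47.5, 800), (14, 48.0, 1200),
         (7, 49.0, 900), (2, 50.5, 1500)]

days = np.array([p[0] for p in polls], dtype=float)
support = np.array([p[1] for p in polls])
size = np.array([p[2] for p in polls], dtype=float)

# Weighted least squares of support on time, larger polls weighing more;
# the intercept is the extrapolation to election day (days = 0).
slope, intercept = np.polyfit(days, support, deg=1, w=np.sqrt(size))
forecast = intercept
```

With support rising as the election nears, the slope against days-before is negative, and older polls are discounted automatically rather than dropped.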

“The fabricated results conformed to Benford's law on first digits, but failed to obey Benford's law on second digits.” Wikipedia

The second part of this 200-page book is about election analysis, towards testing for fraud, hence involving the ubiquitous Benford law. Although it is applied to the leading digit, which I do not think should necessarily follow Benford's law, due to both the varying sizes and the non-uniform political inclinations of the voting districts (of which there are 39 for the 2009 Afghan presidential election illustration, although the book sticks to 34 (p.106)). My impression was that lower-order digits should be tested instead. Chapter 4 actually supports the use of the generalised Benford distribution that accounts for differences in turnouts between the electoral districts. But it cannot come up with a real-life election where the Benford test points out a discrepancy (and hence a potential fraud). It concludes with the author's doubt [repeated from his PhD thesis] that these Benford tests “are specious at best”, which makes me wonder why he spends 20 pages on the topic. The following chapter thus considers other methods, checking for differential [i.e., not-at-random] invalidation by linear and generalised linear regression on the supporting rate in the district. Once again it concludes there is no evidence of such fraud when analysing the 2010 Côte d'Ivoire elections (that led to civil war), with an extension in Chapter 7 accounting for spatial correlation. The book concludes with an analysis of the Sri Lankan presidential elections between 1994 and 2019, finding significant differential invalidation in almost every election (even those not including the Tamil provinces from the North).
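For the record, the first-digit test itself is straightforward to implement. The sketch below is mine (not the book's R code) and uses constructed district totals whose leading digits match Benford's frequencies almost exactly; a real analysis would feed in actual vote counts and compare the statistic to a chi-square cutoff with 8 degrees of freedom (about 15.5 at the 95% level).

```python
import math
from collections import Counter

# Expected first-digit probabilities under Benford's law: P(d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(n):
    """Leading decimal digit of a positive count."""
    return int(str(abs(n))[0])

def chi2_benford(vote_counts):
    """Chi-square distance between observed first digits and Benford's law."""
    digits = [first_digit(v) for v in vote_counts if v > 0]
    n = len(digits)
    obs = Counter(digits)
    return sum((obs.get(d, 0) - n * p) ** 2 / (n * p)
               for d, p in BENFORD.items())

# Toy district totals: 1000 counts whose first digits follow Benford closely
counts = {1: 301, 2: 176, 3: 125, 4: 97, 5: 79, 6: 67, 7: 58, 8: 51, 9: 46}
votes = [d * 1000 + 123 for d, c in counts.items() for _ in range(c)]
stat = chi2_benford(votes)
# Here the fit is near perfect, so stat is far below the 15.5 cutoff;
# a large value would flag a discrepancy (and a potential fraud signal).
```

The objection in the paragraph above applies directly: if district sizes span too narrow a range, the leading digits need not follow Benford's law even for perfectly honest counts, which is why testing lower-order digits is often preferred.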

R code is provided and discussed within the text. Some simple mathematical derivations are found, albeit with a huge dose of warnings (“math-heavy”, “harsh beauty”) and excuses (“feel free to skim”, “the math is entirely optional”). Often, one wonders at the relevance of said derivations for the intended audience and the overall purpose of the book. Nonetheless, it provides an interesting entry on (relatively simple) models applied to election data and could certainly be used as an original textbook on modelling aggregated count data, in particular as it should spark the interest of (some) students.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]