Archive for Scotland

understanding elections through statistics [book review]

Posted in Books, Kids, R, Statistics, Travel on October 12, 2020 by xi'an

A book to read most urgently if hoping to make an informed decision by 03 November! Written by a political scientist cum statistician, Ole Forsberg. (If you were thinking of another political scientist cum statistician, that one wrote Red State, Blue State a while ago! And is currently forecasting the outcome of the November election for The Economist.)

“I believe [omitting educational level] was the main reason the [Brexit] polls were wrong.”

The first part of the book is about the statistical analysis of opinion polls (taking their outcome as given, rather than designing them in the first place). And starting with the Scottish independence referendum of 2014. The first chapter covers the cartoon case of simple sampling from a population, with or without replacement, Bayes and non-Bayes. In somewhat too much detail imho, given that this is an unrealistic description of poll outcomes. The second chapter expands to stratified sampling (with a confusing title [Polling 399] and entry, since it discusses repeated polls that are not processed in said chapter). Mentioning the famous New York Times experiment where five groups of pollsters analysed the same data, making different decisions in adjusting the sample and identifying likely voters, and coming out with estimates spanning a range of five percentage points. Starting to get a wee bit more advanced when designing priors for the population proportions. But still studying a weighted average of the voting intentions for each category. Chapter three reaches the challenging task of combining polls, with the 2017 (South) Korean presidential election as an illustration, involving five polls. It includes a solution for handling older polls, namely a simple linear regression against time (sketched below). Chapter 4 sums up the challenges of real-life polling by examining the disastrous 2016 Brexit referendum in the UK. Exposing for instance the complicated biases resulting from polling by phone or on-line. The part that weights polling institutes according to quality does not provide any quantitative detail. (And there is also a weird averaging between the levels of "support for Brexit" and "maybe-support for Brexit", see Fig. 4.5!) Concluding as quoted above that missing the educational stratification was the cause for missing the shock wave of referendum day is a possible explanation, but the massive difference in turnout between the age groups, itself possibly induced by the reassuring figures of the published polls and predictions, certainly played a role in missing the (terrible) outcome.
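For concreteness, here is a minimal sketch, in the book's language of R but not the book's own code, of the Chapter 3 pooling idea: regress the candidate's support on time, weight polls by sample size as a crude proxy for precision, and extrapolate to election day. The five polls below are made up for illustration.

## hypothetical polls: field date (days before election), candidate share,
## and sample size used as a crude precision weight
polls <- data.frame(
  days_to_election = c(30, 24, 17, 10, 3),
  support          = c(0.44, 0.45, 0.47, 0.46, 0.48),
  sample_size      = c(800, 1000, 1200, 900, 1500))
## weighted linear regression of support against time
fit <- lm(support ~ days_to_election, data = polls, weights = sample_size)
## extrapolation to election day (days_to_election = 0), with standard error
predict(fit, newdata = data.frame(days_to_election = 0), se.fit = TRUE)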

“The fabricated results conformed to Benford’s law on first digits, but failed to obey Benford’s law on second digits.” Wikipedia

The second part of this 200-page book is about election analysis, towards testing for fraud. Hence involving the ubiquitous Benford law. Although applied to the leading digit, which I do not think should necessarily follow Benford's law, due to both the varying sizes and the non-uniform political inclinations of the voting districts (of which there are 39 for the 2009 Afghan presidential election illustration, although the book sticks with 34 (p.106)). My impression was that the lesser digits should instead be tested (a first-digit version of the test is sketched below). The chapter actually supports the use of the generalised Benford distribution, which accounts for differences in turnout between the electoral districts. But it cannot come up with a real-life election where the Benford test points out a discrepancy (and hence a potential fraud). Concluding with the author's doubt [repeated from his PhD thesis] that these Benford tests "are specious at best", which makes me wonder why the book spends 20 pages on the topic. The following chapter thus considers other methods, checking for differential [i.e., not-at-random] invalidation by linear and generalised linear regression on the supporting rate in the district. Once again finding no evidence of such fraud when analysing the 2010 Côte d'Ivoire elections (that led to civil war). With an extension in Chapter 7 accounting for spatial correlation. The book concludes with an analysis of the Sri Lankan presidential elections between 1994 and 2019, finding significant differential invalidation in almost every election (even those not including the Tamil provinces of the North).
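To illustrate the first-digit version of the test, and stressing this is a hedged sketch on simulated rather than actual district counts, a χ² goodness-of-fit test against the Benford frequencies runs as follows:

## simulate 39 district vote totals (log-normal, purely illustrative)
set.seed(1)
counts <- round(rlnorm(39, meanlog = 7, sdlog = 1.5))
## extract leading digits and compare with Benford's first-digit frequencies
first_digit <- as.integer(substr(as.character(counts), 1, 1))
benford <- log10(1 + 1 / (1:9))
chisq.test(tabulate(first_digit, nbins = 9), p = benford,
           simulate.p.value = TRUE)   # simulated p-value, given the small sample

The same recipe applies to second digits, as in the Wikipedia quote above, by swapping the digit extraction and the reference distribution.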

R code is provided and discussed within the text. Some simple mathematical derivations are found, albeit with a huge dose of warnings (“math-heavy”, “harsh beauty”) and excuses (“feel free to skim”, “the math is entirely optional”). Often, one wonders at the relevance of said derivations for the intended audience and the overall purpose of the book. Nonetheless, it provides an interesting entry on (relatively simple) models applied to election data and could certainly be used as an original textbook on modelling aggregated count data, in particular as it should spark the interest of (some) students.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

CANSSI on HMMs

Posted in Statistics, University life on September 21, 2020 by xi'an

The Canadian Statistical Sciences Institute/Institut canadien des sciences statistiques is launching a series of on-line seminars, held once a month, with journal clubs to prepare each seminar and student-only meetings with the speakers afterwards.

Seminars will be broadcast live on the fourth Thursday of the month from 1-2:15 pm Eastern time (19:00 GMT+2). Students will meet virtually with the speaker from 2:30-3:30 pm Eastern time. Talks in the fall will focus on Hidden Markov Models, starting on Thursday, September 24, 2020 with Ruth King of the University of Edinburgh.

a journal of the plague year [lazy August reviews]

Posted in Books, pictures, Running, Travel on August 22, 2020 by xi'an

Read Blood of Empire, the final volume in the Gods of Blood and Powder trilogy. By Brian McClellan. Which I enjoyed reasonably well as bedside literature, although its weight meant it would fall at the slightest hint of sleep… It took me longer than expected to connect with the story, given I had read the previous volume a few months ago. This series is classified as "flintlock fantasy", a category I had never heard of previously, meaning a limited amount of gunpowder is used in weapons, along with the aid of magical abilities (for the happy few). The style is a wee bit heavy and repetitive, but the characters are definitely engaging, if over-prone to inner dialogues… The only annoying part of the plot is the presence of a super-evil character about to become a god, which ruins most of the balance in the story.

Had a long-pending watch of Trainspotting T2. (Loved the NYT label as "Rated R for a bagful of vomit, mouthfuls of bigotry and nosefuls of cocaine", obviously in the same regressive spirit as the film.) This is definitely a sequel to the first film. And hence hardly comprehensible on its own. Except for a few locations like a run-down pub on the edge of nowhere, a flat overlooking a car-part dump, and Spud's high-rise welfare housing, T2 lacks the gritty vision of Edinburgh found in its forebear. And the characters have lost their toxic edge, except maybe, very much maybe, for the psychopath Franco. Even the de rigueur final swindle has a rosy and predictable justification. Fun nonetheless! On the (tourist) side, I enjoyed a mostly superfluous scene where Renton takes Spud running up Arthur's Seat along its most scenic route, with an iconic end image of Edinburgh gradually fading into fog. There is also a surreal (short) scene on Rannoch Moor, with the Oban train stopping at the hikers' stop. (I never managed to start Welsh's books, due to their phonetic rendering of Edinburghian Scots that makes reading unbearable..! By comparison, most of the film's dialogue is understandable. A funny line has the hostess welcoming tourists at Edinburgh Airport with a mock Scottish accent acknowledge she is from Slovakia.) Camera tricks like fast backward and colour filters feel a wee bit old-fashioned and heavy-handed, in the spirit of the first movie, as if nothing had ever happened since. Maybe the moral of the story. Not looking for a potential T3, though.

Read a forgotten volume in the Bernhard Gunther series of Philip Kerr, A Man Without Breath. As usual building on historical events from Nazi Germany to set this ambivalent character at the centre of the action, which is this time the discovery and exploitation of the Katyń massacres by Nazi propaganda, to drive a wedge between the Soviet Union and the other Allies and to draw attention away from Germany's own massacres, like Babi Yar (commemorated in Dmitri Shostakovich's Symphony No. 13). The book is rather uneven, with too many plots, subplots, and characters, and open criticisms of the Nazi regime between complete strangers do not ring particularly realistic. Interestingly, given that I read the book at the time of the JSM round-table, a thread in the story links to the Spanish Civil War and the attempt by fascist doctors like Vallejo Nágera to picture left-wing Spaniards as psychiatric degenerates, fantasising the existence of a "red" gene… (It took me a while to trace the reference in the title to Goebbels' quote "A nation with no religion is like a man without breath.")

sequential neural likelihood estimation as ABC substitute

Posted in Books, Kids, Statistics, University life on May 14, 2020 by xi'an

A JMLR paper by Papamakarios, Sterratt, and Murray (Edinburgh), first presented at the AISTATS 2019 meeting, on a new form of likelihood-free inference, away from non-zero tolerance and from the distance-based versions of ABC, following earlier papers by Iain Murray and co-authors in the same spirit. Which I got pointed to during the ABC workshop in Vancouver. At the time I had no idea as to what autoregressive flows meant. We were supposed to hold a reading group on this paper in Paris-Dauphine last week, unfortunately cancelled as a coronaviral precaution… Here are some notes I had prepared for the meeting that did not take place.

"A simulator model is a computer program, which takes a vector of parameters θ, makes internal calls to a random number generator, and outputs a data vector x."

Just the usual generative model then.

“A conditional neural density estimator is a parametric model q(.|φ) (such as a neural network) controlled by a set of parameters φ, which takes a pair of datapoints (u,v) and outputs a conditional probability density q(u|v,φ).”

Less usual, in that the outcome is guaranteed to be a probability density.

“For its neural density estimator, SNPE uses a Mixture Density Network, which is a feed-forward neural network that takes x as input and outputs the parameters of a Gaussian mixture over θ.”

In which theoretical sense would it improve upon classical or Bayesian density estimators? Where are the error evaluations, the optimal rates, the sensitivity to the dimension of the data? Of the parameter?
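To fix ideas about what the estimator returns, here is a toy R rendering of an MDN output: for a given x, the network produces mixture weights, means, and standard deviations, and the density estimate at θ is a plain Gaussian mixture. The numerical values below are hypothetical stand-ins for an actual network output, not anything from the paper.

## density of a Gaussian mixture with parameters returned by the network
mdn_density <- function(theta, w, mu, sigma)
  sum(w * dnorm(theta, mean = mu, sd = sigma))
## hypothetical network output for a given observed x
w <- c(0.3, 0.7); mu <- c(-1, 2); sigma <- c(0.5, 1)
mdn_density(0, w, mu, sigma)   # estimate of q(θ = 0 | x, φ)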

“Our new method, Sequential Neural Likelihood (SNL), avoids the bias introduced by the proposal, by opting to learn a model of the likelihood instead of the posterior.”

I do not get the argument, in that the final outcome (of using the approximation within an MCMC scheme) remains biased since the likelihood is not the exact likelihood. Where is the error evaluation? Note that in the associated Algorithm 1, the learning set is enlarged at each round, as in AMIS, rather than reset to the empty set ∅ (see the sketch below).
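As a heavily hedged caricature of the Algorithm 1 loop, the R skeleton below replaces the neural conditional density estimator by a crude Gaussian regression surrogate, on the toy model x|θ~N(θ,1) with prior θ~N(0,10²) and observation x⁰=2. Everything but the round structure, the growing training set, and the MCMC step on the surrogate likelihood is a stand-in, not the paper's MADE/MAF machinery.

set.seed(42)
simulator <- function(theta) rnorm(1, theta, 1)   # toy generative model
x0 <- 2
train <- data.frame(theta = numeric(0), x = numeric(0))
proposal <- function(n) rnorm(n, 0, 10)           # round-one proposal = prior
for (r in 1:3) {
  theta_new <- proposal(200)
  train <- rbind(train,                           # training set grows over rounds
                 data.frame(theta = theta_new, x = sapply(theta_new, simulator)))
  fit <- lm(x ~ theta, data = train)              # surrogate for q(x|θ)
  sdev <- summary(fit)$sigma
  loglik <- function(theta)
    dnorm(x0, predict(fit, data.frame(theta = theta)), sdev, log = TRUE)
  chain <- numeric(1e3); chain[1] <- 0            # MH on prior × surrogate likelihood
  for (t in 2:1e3) {
    cand <- chain[t - 1] + rnorm(1)
    logr <- loglik(cand) - loglik(chain[t - 1]) +
      dnorm(cand, 0, 10, log = TRUE) - dnorm(chain[t - 1], 0, 10, log = TRUE)
    chain[t] <- if (log(runif(1)) < logr) cand else chain[t - 1]
  }
  proposal <- function(n) sample(chain[-(1:200)], n, replace = TRUE)
}
c(mean(chain), sd(chain))   # exact posterior here is N(1.98, 0.995²)

The only point of the sketch is the structure: the training set accumulates across rounds (hence the AMIS analogy) and the next round's proposal is read off the current MCMC output.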

"…given enough simulations, a sufficiently flexible conditional neural density estimator will eventually approximate the likelihood in the support of the proposal, regardless of the shape of the proposal. In other words, as long as we do not exclude parts of the parameter space, the way we propose parameters does not bias learning the likelihood asymptotically. Unlike when learning the posterior, no adjustment is necessary to account for our proposing strategy."

This is a rather vague statement, with the only support being that the Monte Carlo approximation to the Kullback-Leibler divergence does converge to its actual value, i.e., a direct application of the Law of Large Numbers! But it connects with an interesting point I informally made a (long) while ago, that all that matters is the estimate of the density at x⁰. Or at the value of the statistic at x⁰. The masked auto-encoder density estimator is based on a sequence of bijections with a lower-triangular Jacobian matrix, meaning the conditional density estimate is available in closed form. Which makes it sound like a form of neurotic variational Bayes solution.
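For a flavour of that closed form, here is a two-dimensional R caricature of a single affine autoregressive layer: the inversion is sequential, the Jacobian is (lower-)triangular, and the density follows from the change-of-variables formula. The conditioners mu2 and sigma2 are arbitrary stand-ins for the masked network.

## affine autoregressive bijection: u1 = mu1 + sigma1 z1, u2 = mu2(u1) + sigma2(u1) z2
mu2    <- function(u1) 0.5 * u1      # hypothetical conditioners
sigma2 <- function(u1) exp(0.3 * u1)
flow_density <- function(u, mu1 = 0, sigma1 = 1) {
  z1 <- (u[1] - mu1) / sigma1                    # sequential inversion
  z2 <- (u[2] - mu2(u[1])) / sigma2(u[1])
  ## change of variables: N(0,I) base density over the diagonal Jacobian terms
  dnorm(z1) * dnorm(z2) / (sigma1 * sigma2(u[1]))
}
flow_density(c(0.2, -0.4))   # closed-form density at an arbitrary point

Stacking several such layers, as in the MAF paper discussed below, multiplies the diagonal Jacobian terms while keeping the evaluation in closed form.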

The paper also links with ABC (too costly?), other parametric approximations to the posterior (like Gaussian copulas and variational likelihood-free inference), synthetic likelihood, Gaussian processes, noise contrastive estimation… With experiments involving some of the above. But the experiments involve rather smooth models with relatively few parameters.

“A general question is whether it is preferable to learn the posterior or the likelihood (…) Learning the likelihood can often be easier than learning the posterior, and it does not depend on the choice of proposal, which makes learning easier and more robust (…) On the other hand, methods such as SNPE return a parametric model of the posterior directly, whereas a further inference step (e.g. variational inference or MCMC) is needed on top of SNL to obtain a posterior estimate”

A fair point in the conclusion. Which also mentions the curse of dimensionality (both for parameters and observations) and the possibility to work directly with summaries.

Getting back to the earlier and connected Masked autoregressive flow for density estimation paper, by Papamakarios, Pavlakou and Murray:

“Viewing an autoregressive model as a normalizing flow opens the possibility of increasing its flexibility by stacking multiple models of the same type, by having each model provide the source of randomness for the next model in the stack. The resulting stack of models is a normalizing flow that is more flexible than the original model, and that remains tractable.”

Which makes it sound like a sort of neural network in density space. Optimised by Kullback-Leibler minimisation to get asymptotically close to the likelihood. But a form of Bayesian indirect inference in the end, namely an MLE on a pseudo-model, using the estimated model as a proxy in Bayesian inference…

strange loyalties [book review]

Posted in Statistics on April 26, 2020 by xi'an

This book by William McIlvanney is the third and last in the Laidlaw investigation series, and the most original of the three as far as I am concerned… For it is more an inner quest than a crime investigation, as the detective is seeking an explanation for the accidental death of his brother, as well as for the progressive deterioration of their relationship, while trying to make sense of his own life and of his relationships with women. It is thus as far from a crime novel as possible, although there are criminals involved. And Laidlaw cannot separate his "job" from his personal life, meaning he investigates the death of his brother in his free time. It is entirely written from a first-person perspective, which made the reading harder and slower in my case. But an apt conclusion to the trilogy, rather than being pulled into finer and finer threads as in other detective series. Brilliant (like the light on Skye during the rain).

“Life was only in the living of it. How you act and what you are and what you do and how you be were the only substance. They didn’t last either. But while you were here, they made what light there was – the wick that threads the candle-grease of time. His light was out but here I felt I could almost smell the smoke still drifting from its snuffing.”