Archive for computer simulation

The Effect [book review]

Posted in Books, R, Running, Statistics, University life on March 10, 2023 by xi'an

While it sounds like the title of a science-fiction catastrophe novel or of a (of course) convoluted nouveau roman, this book by Nick Huntington-Klein is a massive initiation to econometrics and causality, as explained by its subtitle, An Introduction to Research Design and Causality.

This is a hüûüge book, actually made of two parts that could have been books (volumes?) of their own. And it covers three languages, R, Stata, and Python, which should have led to three independent books. (Seriously, why print three versions when you need at best one?!) I carried it with me during my vacations in Central Québec, but managed to lose my notes on the first part, which means missing the opportunity for biased quotes! It was mostly written during the COVID lockdown(s), which may explain a certain amount of verbosity and rambling around.

“My mom loved the first part of the book and she is allergic to statistics.”

The first half (which is in fact a third!) is conceptual (and chatty) and almost formula-free, based on the postulate that “it’s a pretty slim portion of students who understand a method because of an equation” (p.xxii). For this reader (or rather reviewer), relying on explanations through examples makes the reading much harder, as spotting the main point gets harder (and requires reading most sentences!). It also makes for a very slow start, since notations and mathematical notions have to be introduced with an excess of caution (as in the distinction between Latin and Greek symbols, p.36). The part then moves through single-variable models, conditional distributions (with a lengthy explanation of how OLS estimates are derived), data generating processes and identification (of causes), causal diagrams, back and front doors (a recurrent notion within the book), and treatment effects, before a conclusion chapter.
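For readers who, unlike the targeted students, do prefer a line of code to a page of prose, here is a minimal R sketch (mine, not taken from the book) of a data generating process with a single back-door confounder, where adjusting for the confounder recovers the causal effect that the naive regression overstates:

    set.seed(1)
    n <- 1e4
    w <- rnorm(n)                    # confounder opening a back door
    x <- 0.8 * w + rnorm(n)          # treatment affected by w
    y <- 2 * x + 1.5 * w + rnorm(n)  # outcome: the true effect of x is 2
    coef(lm(y ~ x))                  # naive OLS, biased by the open back door
    coef(lm(y ~ x + w))              # adjusting for w closes the back door, recovers ~2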

“Unlike statistical research, which is completely made of things that are at least slightly false, statistics itself is almost entirely true.” (p.327)

The second part, called the Toolbox, is closer to a classical introduction to econometrics, albeit with a shortage of mathematics (and no proof whatsoever), although [warning!] logarithms, polynomials, partial derivatives and matrices are used. Along with a substantial (3x) chunk allocated to printed code, the density of footnotes increases significantly in this section. It covers an extensive chapter on regression (including testing practice, non-linear and generalised linear models, a basic bootstrap without much warning about its use in… regression settings, and the LASSO), one on matching (with propensity scores, kernel weighting, and Mahalanobis weighting), one on simulation, yes simulation!, in the sense of producing pseudo-data from known generating processes to check methods, as well as the bootstrap (with resampled residuals making at last an appearance!), and one on fixed and random effects (where the author “feels the presence of Andrew Gelman reaching through time and space to disagree”, p.405). The chapter on event studies is about time-dependent data, with a bit of ARIMA prediction (but nothing on non-stationary series and unit-root issues). The more exotic chapters cover (18) difference-in-differences models (control vs. treated groups, with John Snow pumping his way in), (19) instrumental variables (aka the minor bane of my 1980s econometrics courses), with two-stage least squares and the generalised method of moments (if not its simulated version), (20) regression discontinuity (i.e., changepoints), with the limitation of having a single variate explaining the change, rather than an unknown combination of them, and a rather pedestrian approach to the issue, and (21) other methods (including the first mention of machine learning regression/prediction and some causal forests), concluding with an “Under the rug” portmanteau.
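In the same simulation-to-check-methods spirit (and again as a sketch of mine rather than code from the book), one can produce pseudo-data with an endogenous regressor and compare plain OLS with a hand-rolled two-stage least squares based on a valid instrument:

    set.seed(2)
    n <- 1e4
    z <- rnorm(n)                  # instrument: affects x but not y directly
    u <- rnorm(n)                  # unobserved confounder
    x <- z + u + rnorm(n)          # endogenous regressor
    y <- 2 * x + 2 * u + rnorm(n)  # true causal effect of x is 2
    coef(lm(y ~ x))                # OLS, biased upwards by u
    xhat <- fitted(lm(x ~ z))      # first stage
    coef(lm(y ~ xhat))             # second stage recovers ~2 (standard errors not valid this way)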

Nothing (afaict) on multivariate response regression or simultaneous equations. Hardly an occurrence of Bayesian modelling (p.581), vague enough to remind me of my first statistics course and its one-line annihilation of the notion.

Duh cover, but a nice edition, except for the huge margins that could have been cut to reduce the 622 pages by a third (and to rein in the author’s tendency towards excessive footnotes!). And an unintentional white line on p.238! Cute and vaguely connected little drawings sit at the head of every chapter (like the head above). A rather terse subject index (except for the entry “The first reader to spot this wins ten bucks“!), which should have been complemented with an acronym index.

“Calculus-heads will recognize all of this as taking integrals of the density curve. Did you know there’s calculus hidden inside statistics? The things your professor won’t tell you until it’s too late to drop the class.”

Obviously I am biased in that I cannot negatively comment on an author running 5:37 a mile as, by now, I am far from the 5:15 of yester decades! I am just a wee bit suspicious of the reported time, however, given that it appears exactly on page 537… (And I could have clearly taken issue with his 2014 paper, Is Robert anti-teacher? Or with the populist catering to anti-math attitudes such as the one above, found in a footnote!) But I enjoyed reading the conceptual chapter on causality as well as the (more) technical chapter on instrumental variables (a notion I have consistently found confusing all the [long] way from graduate school). And while repeated references are made to Scott Cunningham’s Causal Inference: The Mixtape, I think I will stop there with 500⁺-page introductory econometrics books!

[Disclaimer about potential self-plagiarism: this post or an edited version will potentially appear in my Books Review section in CHANCE.]

One World ABC seminar [31.3.22]

Posted in Statistics, University life on March 16, 2022 by xi'an

The next One World ABC seminar is on Thursday 31 March, with David Warne (QUT) talking on Multifidelity multilevel Monte Carlo for approximate Bayesian computation. It will take place at 10:30 CET (GMT+1).

Models of stochastic processes are widely used in almost all fields of science. However, data are almost always incomplete observations of reality. This leads to a great challenge for statistical inference because the likelihood function will be intractable for almost all partially observed stochastic processes. As a result, it is common to apply likelihood-free approaches that replace likelihood evaluations with realisations of the model and observation process. However, likelihood-free techniques are computationally expensive for accurate inference as they may require millions of high-fidelity, expensive stochastic simulations. To address this challenge, we develop a novel approach that combines the multilevel Monte Carlo telescoping summation, applied to a sequence of approximate Bayesian posterior targets, with a multifidelity rejection sampler that learns from low-fidelity, computationally inexpensive, model approximations to minimise the number of high-fidelity, computationally expensive, simulations required for accurate inference. Using examples from systems biology, we demonstrate improvements of more than two orders of magnitude over standard rejection sampling techniques.
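For readers new to ABC, the baseline that such multifidelity and multilevel accelerations aim at improving is the plain rejection sampler; a toy R illustration of mine (not code from the talk), for a normal mean with a uniform prior, goes as follows:

    set.seed(3)
    yobs  <- rnorm(20, mean = 1.7)          # pretend-observed data
    eps   <- 0.1                            # tolerance
    N     <- 1e5
    theta <- runif(N, -5, 5)                # draws from the prior
    # one pseudo-data set per prior draw, compared through the sample mean
    dist  <- abs(sapply(theta, function(t) mean(rnorm(20, mean = t))) - mean(yobs))
    post  <- theta[dist < eps]              # accepted draws approximate the posterior
    mean(post); length(post) / N            # posterior mean and (low) acceptance rate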

One World ABC seminar [24.2.22]

Posted in Statistics, University life on February 22, 2022 by xi'an

The next One World ABC seminar is on Thursday 24 Feb, with Rafael Izbicki talking on Likelihood-Free Frequentist Inference – Constructing Confidence Sets with Correct Conditional Coverage. It will take place at 14:30 CET (GMT+1).

Many areas of science make extensive use of computer simulators that implicitly encode likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, outside the asymptotic and low-dimensional regimes. Although new machine learning methods, such as normalizing flows, have revolutionized the sample efficiency and capacity of LFI methods, it remains an open question whether they produce reliable measures of uncertainty. We present a statistical framework for LFI that unifies classical statistics with modern machine learning to: (1) efficiently construct frequentist confidence sets and hypothesis tests with finite-sample guarantees of nominal coverage (type I error control) and power; (2) provide practical diagnostics for assessing empirical coverage over the entire parameter space. We refer to our framework as likelihood-free frequentist inference (LF2I). Any method that estimates a test statistic, like the likelihood ratio, can be plugged into our framework to create valid confidence sets and compute diagnostics, without costly Monte Carlo samples at fixed parameter settings. In this work, we specifically study the power of two test statistics (ACORE and BFF), which, respectively, maximize versus integrate an odds function over the parameter space. Our study offers multifaceted perspectives on the challenges in LF2I. This is joint work with Niccolo Dalmasso, David Zhao and Ann B. Lee.
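As a crude R illustration of the Neyman inversion at the heart of such confidence sets (a toy sketch of mine that calibrates the test statistic by brute-force Monte Carlo at every parameter value, i.e., precisely the costly step that LF2I replaces with learned quantiles), for a normal mean:

    set.seed(4)
    xobs <- rnorm(10, mean = 0.6)                  # observed sample
    grid <- seq(-1, 2, by = 0.05)                  # candidate values of theta
    inCS <- sapply(grid, function(th) {
      tobs <- abs(mean(xobs) - th)                 # simple test statistic
      tsim <- replicate(2e3, abs(mean(rnorm(10, mean = th)) - th))
      tobs <= quantile(tsim, 0.95)                 # keep theta if not rejected at the 5% level
    })
    range(grid[inCS])                              # 95% confidence set by inversion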

where will stars be in 10⁶ years?

Posted in Kids, pictures, Travel on December 4, 2020 by xi'an

Berni Alder obituary in Nature [and the Metropolis algorithm]

Posted in Books, Statistics, University life on December 4, 2020 by xi'an

When reading through the 15 October issue of Nature, I came across an obituary by David Ceperley for Berni Alder (1925-2020). With Thomas Wainwright, Alder invented the technique of molecular dynamics, “silencing criticism that the results were the product of inaccurate computer arithmetic.” 

“Berni Alder pioneered computer simulation, in particular of the dynamics of atoms and molecules in condensed matter. To answer fundamental questions, he encouraged the view that computer simulation was a new way of doing science, one that could connect theory with experiment. Alder’s vision transformed the field of statistical mechanics and many other areas of applied science.”

As I was completely unaware of Alder’s contributions to the field, I was most surprised to read the following

“During his PhD, he and the computer scientist Stan Frankel developed an early Monte Carlo algorithm — one in which the spheres are given random displacements — to calculate the properties of the hard-sphere fluid. The advance was scooped by Nicholas Metropolis and his group at the Los Alamos National Laboratory in New Mexico.”

which would imply missing credit is due! But I could only find the following information on Stan Frankel’s Wikipedia page: Frankel “worked with PhD candidate Berni Alder in 1949–1950 to develop what is now known as Monte Carlo analysis. They used techniques that Enrico Fermi had pioneered in the 1930s. Due to a lack of local computing resources, Frankel travelled to England in 1950 to run Alder’s project on the Manchester Mark 1 computer. Unfortunately, Alder’s thesis advisor [John Kirkwood] was unimpressed, so Alder and Frankel delayed publication of their results until 1955, in the Journal of Chemical Physics. This left the major credit for the technique to a parallel project by a team including Teller and Metropolis who published similar work in the same journal in 1953.” The (short) paper by Alder, Frankel and Lewinson is however totally silent on a potential precursor to the Metropolis et al. algorithm (which is included in its references)… It also contains a proposal for a completely uniform filling of a box with particles, provided they do not overlap, but the authors had to stop at 98 particles due to the inefficiency of the scheme.
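To get a feel for why such a uniform, non-overlapping filling scheme grinds to a halt, here is a small R sketch of mine (in two dimensions, with disks rather than hard spheres) that keeps proposing uniform positions and rejects any particle overlapping a previously placed one; acceptances become vanishingly rare long before the box is full:

    set.seed(5)
    r <- 0.03                             # particle (disk) radius in the unit square
    centres <- matrix(numeric(0), ncol = 2)
    for (t in 1:5e4) {                    # fixed budget of uniform proposals
      prop <- runif(2, r, 1 - r)
      d2 <- (centres[, 1] - prop[1])^2 + (centres[, 2] - prop[2])^2
      if (all(d2 > (2 * r)^2))            # accept only when no overlap with earlier particles
        centres <- rbind(centres, prop)
    }
    nrow(centres)                         # particles placed; most late proposals are rejected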
