Archive for econometrics

The Effect [book review]

Posted in Books, R, Running, Statistics, University life on March 10, 2023 by xi'an

While it sounds like the title of a science-fiction catastrophe novel or of a (of course) convoluted nouveau roman, this book by Nick Huntington-Klein is a massive initiation to econometrics and causality. As explained by the subtitle, An Introduction to Research Design and Causality.

This is a hüûüge book, actually made of two parts that could have been books (volumes?). And covering three languages, R, Stata, and Python, which should have led to three independent books. (Seriously, why print three versions when you need at best one?!) I carried it with me during my vacations in Central Québec, but managed to lose my notes on the first part, which means missing the opportunity for biased quotes! It was mostly written during the COVID lockdown(s), which may account for a certain amount of verbosity and rambling around.

“My mom loved the first part of the book and she is allergic to statistics.”

The first half (which is in fact a third!) is conceptual (and chatty) and almost formula free, based on the postulate that “it’s a pretty slim portion of students who understand a method because of an equation” (p.xxii). For this reader (or rather reviewer), the reliance on explanations through examples makes the reading much harder, as spotting the main point gets harder (and requires reading most sentences!). It also makes for a very slow start, since notations and mathematical notions have to be introduced with an excess of caution (as in the distinction between Latin and Greek symbols, p.36). This part moves through single variable models, conditional distributions (with a lengthy explanation of how OLS estimates are derived), data generating processes and identification (of causes), causal diagrams, back and front doors (a recurrent notion within the book), treatment effects, and a conclusion chapter.

“Unlike statistical research, which is completely made of things that are at least slightly false, statistics itself is almost entirely true.” (p.327)

The second part, called the Toolbox, is closer to a classical introduction to econometrics, albeit with a shortage of mathematics (and no proof whatsoever), although [warning!] logarithms, polynomials, partial derivatives and matrices are used. Along with a sizeable (3x) chunk allocated to printed code, the density of the footnotes significantly increases in this section. It covers an extensive chapter on regression (including testing practice, non-linear and generalised linear models, as well as basic bootstrap without much warning about its use in… regression settings, and LASSO), one on matching (with propensity scores, kernel weighting, Mahalanobis weighting), one on simulation, yes simulation! in the sense of producing pseudo-data from known generating processes to check methods, as well as bootstrap (with resampling residuals making at last an appearance!), and one on fixed and random effects (where the author “feels the presence of Andrew Gelman reaching through time and space to disagree”, p.405). The chapter on event studies is about time dependent data with a bit of ARIMA prediction (but nothing on non-stationary series and unit root issues). The more exotic chapters cover (18) difference-in-differences models (control vs treated groups, with John Snow pumping his way in), (19) instrumental variables (aka the minor bane of my 1980’s econometrics courses), with two-stage least squares and the generalised method of moments (if not the simulated version), (20) discontinuity (i.e., changepoints), with the limitation of having a single variate explaining the change, rather than an unknown combination of them, and a rather pedestrian approach to the issue, (21) other methods (including the first mention of machine learning regression/prediction and some causal forests), concluding with an “Under the rug” portmanteau.
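To make the simulation idea concrete, here is a minimal sketch (mine, not the book's code) of producing pseudo-data from a known linear generating process, checking that OLS recovers the slope on average, and computing a residual bootstrap standard error. The true parameter values, sample sizes and function names are all assumptions picked for illustration.

```python
import random
import statistics

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate(n, intercept, slope, sigma, rng):
    """Draw pseudo-data from a known (assumed) linear generating process."""
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    y = [intercept + slope * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y

def residual_bootstrap_se(x, y, n_boot, rng):
    """Bootstrap standard error of the slope, resampling residuals with x kept fixed."""
    a, b = ols_fit(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(n_boot):
        y_star = [fi + rng.choice(resid) for fi in fitted]
        slopes.append(ols_fit(x, y_star)[1])
    return statistics.stdev(slopes)

rng = random.Random(2023)
TRUE_SLOPE = 2.0  # assumed value, chosen for the illustration

# Monte Carlo check: slope estimates over repeated pseudo-datasets
estimates = [ols_fit(*simulate(200, 1.0, TRUE_SLOPE, 3.0, rng))[1] for _ in range(500)]
mean_slope = statistics.fmean(estimates)
mc_sd = statistics.stdev(estimates)

# Residual bootstrap on a single pseudo-dataset, to compare with the Monte Carlo spread
x, y = simulate(200, 1.0, TRUE_SLOPE, 3.0, rng)
boot_se = residual_bootstrap_se(x, y, 300, rng)
print(round(mean_slope, 3), round(mc_sd, 3), round(boot_se, 3))
```

Since the truth is known here, the average estimate can be compared with it directly, and the bootstrap standard error from one dataset should be of the same order as the spread across datasets.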

Nothing (afaict) on multivariate regressed variates and simultaneous equations. Hardly an occurrence of Bayesian modelling (p.581), vague enough to remind me of my first course of statistics and the one-line annihilation of the notion.

Duh cover, but nice edition, except for the huge margins that could have been cut to reduce the 622 pages by a third (and reined in the tendency of the author towards excessive footnotes!). And an unintentional white line on p.238! Cute and vaguely connected little drawings at the head of every chapter (like the head above). A rather terse matter index (except for the entry “The first reader to spot this wins ten bucks”!), which should have been completed with an acronym index.

“Calculus-heads will recognize all of this as taking integrals of the density curve. Did you know there’s calculus hidden inside statistics? The things your professor won’t tell you until it’s too late to drop the class.”

Obviously I am biased in that I cannot negatively comment on an author running a 5:37 mile as, by now, I could only compete far from my 5:15 of decades past! I am just a wee bit suspicious at the reported time, however, given that it happens exactly on page 537… (And I could have clearly taken issue with his 2014 paper, Is Robert anti-teacher? Or with the populist catering to anti-math attitudes, as in the above quote, found in a footnote!) But I enjoyed reading the conceptual chapter on causality as well as the (more) technical chapter on instrumental variables (a notion I have consistently found confusing all the [long] way from graduate school). And while repeated references are made to Scott Cunningham’s Causal Inference: The Mixtape, I think I will stop there with 500⁺ page introductory econometrics books!

[Disclaimer about potential self-plagiarism: this post or an edited version will potentially appear in my Books Review section in CHANCE.]

efficient measures?

Posted in Books, Statistics, University life on July 24, 2022 by xi'an


When checking the infographics of the week highlighted by Nature, I came across this comparison of France and Germany for the impact of their respective vaccination mandates on health and economics. And then realised this was from a preprint from a Paris Dauphine colleague, Miquel Oliu-Barton (and co-authors). The above graphs compare the impact of governmental measures towards vaccination, short of compulsory vaccination (unfortunately). Between Germany and France, it appears as if the measures were more effective in the latter. Which may be interpreted as either a consequence of the measures being more coercive in [unruly] France or an illustration of the higher discipline of the German society [despite the government contemplating compulsory vaccination for a while]. As an aside, I am very surprised at the higher death rate in Germany but, besides a larger percentage of people over 65 there and a lower life expectancy, the French curve is interrupted in December 2021. Looking at 2022, the peak was reached at 3.3 cases per day per million people.

Concerning the red counterfactual curves, I did not find much explanation in the preprint, apart from

“Our results are supported by the well-established econometric method of synthetic control.³⁰ We construct counterfactuals for each treated country based on a weighted average of countries that did not implement the COVID certificate and find consistent trajectories for the time period where this method is feasible, i.e., until the end of September 2021.”

and

“constructing counterfactuals (i.e., by modelling vaccine uptake without this intervention), using innovation diffusion theory.⁶ Innovation diffusion theory was introduced to model how new ideas and technologies spread”

which is not particularly helpful without further reading.
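For what it is worth, the synthetic control idea in the first quote can be sketched in a few lines: find non-negative weights summing to one so that a weighted average of untreated (donor) units tracks the treated unit before the intervention, then read the post-intervention gap as the estimated effect. The following is a toy illustration with made-up, noise-free numbers and hypothetical function names, certainly not the preprint's code or data, and it uses plain projected gradient descent where a real analysis would rely on a proper constrained solver.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        t = (cumulative - 1.0) / k
        if uk - t > 0:
            theta = t  # keep the threshold of the last index passing the test
    return [max(vi - theta, 0.0) for vi in v]

def synthetic_control_weights(donors_pre, treated_pre, n_iter=5000):
    """Projected gradient descent on the pre-period squared error.

    donors_pre: one list per donor unit, pre-intervention periods only.
    treated_pre: the treated unit's series over the same periods.
    """
    n_donors = len(donors_pre)
    # Conservative step: 1 / (2 * sum of squared entries) never exceeds 1 / Lipschitz constant
    step = 1.0 / (2.0 * sum(x * x for series in donors_pre for x in series))
    w = [1.0 / n_donors] * n_donors
    for _ in range(n_iter):
        synth = [sum(wj * s[t] for wj, s in zip(w, donors_pre))
                 for t in range(len(treated_pre))]
        resid = [s - y for s, y in zip(synth, treated_pre)]
        grad = [2.0 * sum(r * xt for r, xt in zip(resid, series))
                for series in donors_pre]
        w = project_to_simplex([wj - step * g for wj, g in zip(w, grad)])
    return w

# Toy data (all assumed): 14 periods, intervention from period 10 onwards
T, T0, EFFECT = 14, 10, 5.0
donors = [[10 + 2 * t for t in range(T)],
          [20 + t for t in range(T)],
          [5 + 4 * t for t in range(T)]]
true_w = [0.5, 0.3, 0.2]
treated = [sum(w * d[t] for w, d in zip(true_w, donors)) + (EFFECT if t >= T0 else 0.0)
           for t in range(T)]

weights = synthetic_control_weights([d[:T0] for d in donors], treated[:T0])
counterfactual = [sum(w * d[t] for w, d in zip(weights, donors)) for t in range(T)]
gaps = [treated[t] - counterfactual[t] for t in range(T0, T)]
print([round(w, 3) for w in weights], [round(g, 3) for g in gaps])
```

In this artificial setting the treated unit is an exact mixture of the donors, so the weights and the post-intervention gap are recovered almost perfectly; with real data the pre-period fit is only approximate and its quality is the first thing to check.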

Nature on U.S. abortion laws

Posted in Books, Kids on November 4, 2021 by xi'an

The 26 October issue of Nature has a news article on the involvement of US scientists and scientific organisations in fact-checking the dubious arguments made by anti-abortion advocates, incl. several US States. None of these arguments is convincing or objective, but providing data and statistical models to counter them is welcome, especially in a scientific journal like Nature.

“…an initiative to compare women who had abortions with those who wanted them, but were turned away from clinics for various reasons, including state restrictions or a lack of doctor availability. Called the Turnaway Study, the effort followed about 1,000 women in the United States for five years after they sought abortions. The women were similar in terms of physical, mental and economic well-being initially, but diverged over time (…) on average, receiving an abortion didn’t harm women’s mental or physical health, but being denied an abortion resulted in some negative financial and health outcomes.”

“Allowing states to ban abortion might even increase maternal and infant mortality rates (…) Unwanted pregnancies are associated with worse health outcomes for several reasons, including that people who plan their pregnancies tend to change their behaviour — drinking less alcohol, for example — and receive prenatal medical care long before those who are surprised by their pregnancy and don’t want it.”

“statistical methods developed over the past 30 years allow researchers to isolate and measure the effects of abortion policies (…) Abortion legalization in the 1970s helped to increase women’s educational attainment, participation in the labour force and earnings — especially for single Black women.”

“The United States is alone among wealthy nations in not mandating paid maternity leave (…) a single parent earning the minimum wage would need to spend more than two-thirds of their income on childcare, with care for the average infant costing about US$10,400 per year (…) two main reasons that women give for seeking abortions are concerns about money and caring for existing children.”

causal inference makes it to Stockholm

Posted in Statistics on October 12, 2021 by xi'an

Yesterday, Joshua Angrist and Guido Imbens, whose most cited paper is this JASA 1996 article with Don Rubin, were awarded the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel for 2021. It is one of these not-so-rare instances when econometricians get this prize, with causality the motive for their award. I presume this will not see the number of Biometrika submissions involving causal inference go down! (Imbens wrote a book on causal inference with Don Rubin, and is currently editor of Econometrica. And Angrist wrote Mostly Harmless Econometrics, with J.S. Pischke, which I have not read.)

ERC descriptors

Posted in Statistics, Travel, University life on November 9, 2020 by xi'an

Here are the descriptors (or keywords) validated by the European Research Council (ERC) for submitting grant proposals. The recent addition of PE1_15 in the Mathematics panel should help when submitting more methodological projects:

PE1_14 Mathematical statistics
PE1_15 Generic statistical methodology and modelling
PE1_19 Scientific computing and data processing

even though other panels could prove equally suited for some, as in Computer Science and Informatics,

PE6_7 Artificial intelligence, intelligent systems, natural language processing
PE6_10 Web and information systems, data management systems, information retrieval and digital libraries, data fusion
PE6_11 Machine learning, statistical data processing and applications using signal processing (e.g. speech, image, video)
PE6_12 Scientific computing, simulation and modelling tools
PE6_13 Bioinformatics, bio-inspired computing, and natural computing

in Systems and Communication Engineering,

PE7_7 Signal processing

in Integrative Biology,

LS2_11 Bioinformatics and computational biology
LS2_12 Biostatistics

in Prevention, Diagnosis and Treatment of Human Diseases,

LS7_1 Medical imaging for prevention, diagnosis and monitoring of diseases
LS7_2 Medical technologies and tools (including genetic tools and biomarkers) for prevention, diagnosis, monitoring and treatment of diseases

and in Social Sciences and Humanities,

SH1_6 Econometrics; operations research
SH4_9 Theoretical linguistics; computational linguistics
