Archive for lockdown

off to Bristol

Posted in Statistics on March 24, 2024 by xi'an

Asymptotics of ABC when summaries converge at heterogeneous rates

Posted in pictures, Statistics, University life on November 21, 2023 by xi'an

We just posted a new arXival, jointly with Caroline Lawless, Judith Rousseau, and Robin Ryder. This is a significant component of Caroline’s PhD thesis in Oxford, on which we started working during the first COVID lockdown. In this paper, we extend our results with David Frazier, Gael Martin (both of whom I’ll soon be reunited with!), and Judith, published in Biometrika in 2018, to the more challenging case where different components of the summary statistic vector converge to their respective means at different rates, with some possibly not converging at all. While this sounds impossible (!), we do prove consistency of the ABC posterior under such heterogeneous rates.

Wentao Li and Paul Fearnhead (also in Biometrika and in 2018) reduce the curse of dimensionality of the summary statistics by showing, in the specific case of asymptotically normal summary statistics concentrating at the same rate, that a local linear post-processing step leads to a significant improvement in the theoretical behaviour of the ABC posterior. Given this focus on reducing the impact of the dimension of the summary statistics, it is important to study the efficiency of this post-processing step in a context where the summary statistics are less well behaved. Surprisingly maybe, we show that the significant improvement due to local linear post-processing persists even when summary statistics have heterogeneous behaviour. Most interestingly, the number of summary statistics converging at the fast rate has no impact on either the rate of posterior concentration or the shape of the ABC posterior (provided it exceeds the dimension of the parameter).
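For readers unfamiliar with the local linear post-processing step, here is a minimal Python/NumPy sketch of rejection ABC followed by the standard regression adjustment (Beaumont-style, with Epanechnikov weights). It runs on a hypothetical toy model where one summary, the full-sample mean, converges at the √n rate while the other, the mean of the first m observations, converges at a much slower rate. The Normal model, the U(-5, 5) prior, the names `simulate_summaries`, `abc_local_linear` and `weighted_mean_sd`, and all tuning values are illustrative assumptions, not the code or the setting of the paper.

```python
import numpy as np

def simulate_summaries(theta, n, m, rng):
    """Hypothetical toy model: x_1..x_n ~ N(theta, 1).
    s1 = mean of all n observations (fast, sqrt(n) rate);
    s2 = mean of the first m << n observations (slow, sqrt(m) rate)."""
    theta = np.asarray(theta)
    x = rng.normal(loc=theta[..., None], scale=1.0, size=theta.shape + (n,))
    return np.stack([x.mean(axis=-1), x[..., :m].mean(axis=-1)], axis=-1)

def abc_local_linear(s_obs, n_sims=20_000, quantile=0.05, n=100, m=10, seed=0):
    """Rejection ABC, then local linear regression adjustment of accepted draws."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-5.0, 5.0, size=n_sims)   # prior draws (assumed prior)
    d = simulate_summaries(theta, n, m, rng) - s_obs  # summary discrepancies
    dist = np.linalg.norm(d, axis=1)
    eps = np.quantile(dist, quantile)             # tolerance = distance quantile
    keep = dist <= eps
    theta_acc, d_acc = theta[keep], d[keep]
    w = 1.0 - (dist[keep] / eps) ** 2             # Epanechnikov kernel weights
    # Weighted least squares fit of theta on (1, d): scale rows by sqrt(w)
    X = np.column_stack([np.ones(keep.sum()), d_acc])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], theta_acc * sw, rcond=None)
    beta = coef[1:]
    theta_adj = theta_acc - d_acc @ beta          # regression-adjusted draws
    return theta_acc, theta_adj, w, beta

def weighted_mean_sd(x, w):
    mu = np.average(x, weights=w)
    return mu, np.sqrt(np.average((x - mu) ** 2, weights=w))

rng = np.random.default_rng(1)
s_obs = simulate_summaries(1.0, n=100, m=10, rng=rng)  # pseudo-observed data, theta = 1
raw, adj, w, beta = abc_local_linear(s_obs)
print(weighted_mean_sd(raw, w), weighted_mean_sd(adj, w), beta)
```

Since the full-sample mean is sufficient in this toy model, the fitted coefficient on the fast summary should sit near one and the one on the slow summary near zero, so the adjustment mostly removes the extra spread that the tolerance lets into the accepted draws.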

The Effect [book review]

Posted in Books, R, Running, Statistics, University life on March 10, 2023 by xi'an

While it sounds like the title of a science-fiction catastrophe novel or of a (of course) convoluted nouveau roman, this book by Nick Huntington-Klein is a massive initiation to econometrics and causality, as stated by its subtitle, An Introduction to Research Design and Causality.

This is a hüûüge book, actually made of two parts that could have been separate books (volumes?). And it covers three languages, R, Stata, and Python, which should have led to three independent books. (Seriously, why print three versions when you need at best one?!) I carried it with me during my vacations in Central Québec, but managed to lose my notes on the first part, which means missing the opportunity for biased quotes! It was mostly written during the COVID lockdown(s), which may explain a certain amount of verbosity and rambling.

“My mom loved the first part of the book and she is allergic to statistics.”

The first half (which is in fact a third!) is conceptual (and chatty) and almost formula-free, based on the postulate that “it’s a pretty slim portion of students who understand a method because of an equation” (p.xxii). For this reader (or rather reviewer), relying on explanations through examples makes the reading much harder, as spotting the main point requires reading most sentences! It also makes for a very slow start, since notations and mathematical notions are introduced with an excess of caution (as in the distinction between Latin and Greek symbols, p.36). This part then moves through single-variable models, conditional distributions (with a lengthy explanation of how OLS estimates are derived), data generating processes and identification (of causes), causal diagrams, back and front doors (a recurrent notion within the book), treatment effects, and a concluding chapter.

“Unlike statistical research, which is completely made of things that are at least slightly false, statistics itself is almost entirely true.” (p.327)

The second part, called the Toolbox, is closer to a classical introduction to econometrics, albeit with a shortage of mathematics (and no proof whatsoever), although [warning!] logarithms, polynomials, partial derivatives and matrices are used. Along with a sizeable (3×) chunk allocated to printed code, the density of footnotes increases significantly in this part. It covers an extensive chapter on regression (including testing practice, non-linear and generalised linear models, basic bootstrap without much warning about its use in… regression settings, and LASSO), one on matching (with propensity scores, kernel weighting, and Mahalanobis weighting), one on simulation (yes, simulation!, in the sense of producing pseudo-data from known generating processes to check methods, as well as bootstrap, with resampling residuals at last making an appearance!), and one on fixed and random effects (where the author “feels the presence of Andrew Gelman reaching through time and space to disagree”, p.405). The chapter on event studies is about time-dependent data, with a bit of ARIMA prediction (but nothing on non-stationary series and unit root issues). The more exotic chapters cover (18) difference-in-differences models (control vs treated groups, with John Snow pumping his way in), (19) instrumental variables (aka the minor bane of my 1980s econometrics courses), with two-stage least squares and the generalised method of moments (if not its simulated version), (20) discontinuity (i.e., changepoints), with the limitation of having a single variate explaining the change, rather than an unknown combination of them, and a rather pedestrian approach to the issue, and (21) other methods (including the first mention of machine-learning regression/prediction and some causal forests), before concluding with an “Under the rug” portmanteau.
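As a reminder of what resampling residuals means in a regression setting, here is a minimal Python/NumPy sketch of the residual bootstrap for a simple linear regression: the design is held fixed, OLS residuals are resampled with replacement and added back to the fitted values, and the model is refit. The function names and the simulated data are hypothetical illustrations, not code from the book.

```python
import numpy as np

def residual_bootstrap_slope(x, y, n_boot=2000, seed=0):
    """Residual bootstrap for y = a + b x + e with a fixed design x.
    Returns the OLS slope and its bootstrap replicates."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    resid = y - fitted
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        # new responses = fitted values + residuals drawn with replacement
        y_star = fitted + rng.choice(resid, size=resid.size, replace=True)
        coef_star, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        slopes[b] = coef_star[1]
    return coef[1], slopes

def ols_slope_se(x, y):
    """Textbook standard error of the OLS slope, for comparison."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    return np.sqrt(sigma2 / np.sum((x - x.mean()) ** 2))

rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)  # simulated, known truth
slope, reps = residual_bootstrap_slope(x, y)
print(slope, reps.std(ddof=1), ols_slope_se(x, y))
```

With a fixed design and homoscedastic errors, the bootstrap spread of the slope should roughly match the textbook standard error; under heteroscedastic errors, resampling (x, y) pairs is the usual alternative.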

Nothing (afaict) on multivariate response variables and simultaneous equations. Hardly an occurrence of Bayesian modelling (p.581), and one vague enough to remind me of my first statistics course and its one-line annihilation of the notion.

Duh cover, but nice edition, except for the huge margins that could have been cut to reduce the 622 pages by a third (and curbed the author’s tendency towards excessive footnotes!). Plus an unintentional white line on p.238! Cute and vaguely connected little drawings at the head of every chapter (like the head above). A rather terse subject index (except for the entry “The first reader to spot this wins ten bucks”!), which should have been complemented by an acronym index.

“Calculus-heads will recognize all of this as taking integrals of the density curve. Did you know there’s calculus hidden inside statistics? The things your professor won’t tell you until it’s too late to drop the class.”

Obviously I am biased in that I cannot comment negatively on an author running a 5:37 mile as, by now, I would be far from my own 5:15 of decades past! I am just a wee bit suspicious of the reported time, however, given that it appears exactly on page 537… (And I could clearly have taken issue with his 2014 paper, Is Robert anti-teacher? Or with the populist catering to anti-math attitudes, as in the above quote found in a footnote!) But I enjoyed reading the conceptual chapter on causality as well as the (more) technical chapter on instrumental variables (a notion I have consistently found confusing all the [long] way from graduate school). And while repeated references are made to Scott Cunningham’s Causal Inference: The Mixtape, I think I will stop there with 500⁺-page introductory econometrics books!

[Disclaimer about potential self-plagiarism: this post or an edited version will potentially appear in my Books Review section in CHANCE.]

semi d’Argentan [1:36:54, 29/180, M5M 2/14, 19⁰]

Posted in pictures, Running on October 9, 2022 by xi'an

After a long break (since 2018), I ran my “traditional” half-marathon in Argentan, Normandy. As the new race organisation had not contacted former participants and had changed its webpage, I only heard about it a week before the race and hence had not trained at all for the distance. I had also had a fairly busy September with morning classes and the like, and hence was not in as great a shape as at the end of the Summer. As a result I did not do great, with my second worst time ever (the worst being my first half, in 1995, with only two weeks of preparation), and not even a first place in my Master category, despite the number of participants having dwindled from the earlier 600 runners. This also meant a very solitary race, with just one runner passing me in the last 12km, and only the Norman field edges to try to escape the headwind…

the wgaf-value

Posted in Statistics on August 8, 2022 by xi'an


While the special health council for the French Government is closing down, the number of cases here remains quite high, seemingly met with complete indifference or worse… This when 25,000 people have died from COVID since January. But political parties are all (!) against any constraining measure, with some even calling, in a most demagogic manner, for reintegrating [the few hundred] unvaccinated public health personnel into the public health system… (As a single datapoint, take the counter-example of our thrice-vaccinated daughter, who caught COVID last week, most likely while working at the hospital.)