Archive for causality

ABC in Lapland²

Posted in Mountains, pictures, Statistics, University life on March 16, 2023 by xi'an

On the second day of our workshop, Aki Vehtari gave a short talk about his recent work on speeding up post-processing by importance sampling a simulation of an imprecise version of the likelihood until the desired precision is attained, with the importance correction stabilised by Pareto smoothing¹⁵. A very interesting foray into the meaning of practical models and the hard constraints of computer precision. Grégoire Clarté (formerly a PhD student of ours at Dauphine) stayed on similar ground, using sparse GP versions of the likelihood and post-processing by VB²³, then stir and repeat!
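
For the record, and this is only my own toy R sketch rather than Aki's implementation, the importance-correction idea amounts to re-weighting draws from a posterior built on a cheap, imprecise likelihood by the ratio of exact to imprecise likelihoods; the Pareto-smoothing step that stabilises the weight tails is omitted here, and the model and figures are made up.

```r
## toy sketch: correct draws from a posterior based on a cheap, imprecise
## likelihood by importance weighting against the exact likelihood; Pareto
## smoothing of the weights (as in Vehtari et al.) would stabilise the tails,
## plain normalised weights are used instead
set.seed(1)
y <- rnorm(50, mean = 2)                    # toy data, N(theta, 1) model
cheap_loglik <- function(theta)             # imprecise likelihood: subsample of y
  sum(dnorm(y[1:10], theta, 1, log = TRUE)) * 5
exact_loglik <- function(theta)
  sum(dnorm(y, theta, 1, log = TRUE))
# Metropolis draws from the cheap posterior (flat prior)
theta <- numeric(1e4); theta[1] <- 0
for (t in 2:1e4) {
  prop <- theta[t - 1] + rnorm(1, 0, .3)
  a <- cheap_loglik(prop) - cheap_loglik(theta[t - 1])
  theta[t] <- if (log(runif(1)) < a) prop else theta[t - 1]
}
# importance correction towards the exact posterior
lw <- sapply(theta, exact_loglik) - sapply(theta, cheap_loglik)
w  <- exp(lw - max(lw)); w <- w / sum(w)
c(cheap = mean(theta), corrected = sum(w * theta), truth = mean(y))
```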

Riccardo Corradin did model-based clustering when the nonparametric mixture kernel is missing a normalizing constant, using ABC with a Wasserstein distance and an adaptive proposal, with some flavour of ABC-Gibbs (and no issue of label switching since this is clustering). Mixtures of g&k models, yay! Tommaso Rigon reconsidered clustering via a (generalised Bayes à la Bissiri et al.) discrepancy measure rather than a true model, summing over all clusters and observations a discrepancy between said observation and said cluster. Very neat, if possibly costly, since it involves distances to or within clusters. Although she considered post-processing and the Bayesian bootstrap, Judith (formerly [?] Dauphine) acknowledged that she had somewhat drifted from the theme of the workshop by considering BvM theorems for functionals of unknown functions, with a form of Laplace correction. (Enjoying Lapland so much that I thought the "Lap" in Judith's talk stood for Lapland rather than Laplace!!!) And applications to causality.
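
As a purely illustrative aside, here is what ABC with a Wasserstein distance can look like in one dimension, where the distance between two equal-size samples reduces to the average gap between order statistics; this toy Normal example is obviously far simpler than Riccardo's mixture-clustering setting, and all numbers are mine.

```r
## toy ABC rejection with a 1d Wasserstein distance between observed and
## simulated samples (location-scale Normal model, flat-ish uniform priors)
set.seed(2)
yobs <- rnorm(100, mean = 1, sd = 2)
w1 <- function(a, b) mean(abs(sort(a) - sort(b)))   # 1d Wasserstein distance
N  <- 1e4
mu <- runif(N, -5, 5); sig <- runif(N, 0.1, 5)      # prior draws
d  <- numeric(N)
for (i in 1:N) d[i] <- w1(yobs, rnorm(100, mu[i], sig[i]))
keep <- d <= quantile(d, 0.01)                      # keep the closest 1%
colMeans(cbind(mu, sig)[keep, ])                    # ABC posterior means
```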

After the (X-country skiing) break, Lorenzo Pacchiardi presented his adversarial approach to ABC, differing from Ramesh et al. (2022) by the use of scoring rule minimisation, where unbiased estimators of gradients are available. Ayush Bharti argued for involving experts in selecting the summary statistics, esp. for misspecified models, and Ulpu Remes presented a Jensen-Shannon divergence for selecting models likelihood-freely²², using a test statistic as summary statistic.

Sam Duffield made a case for generalised Bayesian inference in correcting errors in quantum computers, Joshua Bon went back to scoring rules for correcting the ABC approximation, with an importance step, while Trevor Campbell, Iuri Marocco and Hector McKimm nicely concluded the workshop with lightning-fast talks in place of the cancelled poster session. Great workshop, in my most objective opinion, with new directions!

ABC in Lapland

Posted in Mountains, pictures, Statistics, University life on March 15, 2023 by xi'an

Greetings from Levi, Lapland! Sonia Petrone beautifully started the ABC workshop with a (the!) plenary Sunday night talk on quasi-Bayes in the spirit of both Fortini & Petrone (2020) and the more recent Fong, Holmes, and Walker (2023). The talk left me puzzled about the nature of the convergence, in that it happens no matter what the underlying distribution (or lack thereof) of the data is: even without any exchangeability structure, the predictive converges. The quasi stems from a connection with the historical Smith and Makov (1978) sequential update approximation for the posterior attached to mixtures of distributions. Which itself relates to both Dirichlet posterior updates and the Bayesian bootstrap à la Newton & Raftery. An appropriate link when the convergence seems to stem from the sequence of predictives rather than the underlying distribution, if any, pulling Bayes up by its own bootstrap…! Chris Holmes also talked the next day about this approach, esp. about a Bayesian approach to causality that does not require counterfactuals, in connection with a recent arXival of his (on my reading list).
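
To recall what the Smith and Makov (1978) recursion looks like in its simplest form (a toy two-component mixture with known components and unknown weight, not Sonia's general predictive construction), the exact posterior on the weight is replaced by a Beta pseudo-posterior updated with the membership probability of each new observation; the R sketch and its numbers are mine.

```r
## quasi-Bayes recursion à la Smith & Makov (1978) for the weight of a
## two-component mixture with known component densities
set.seed(3)
x <- c(rnorm(300, -2), rnorm(700, 2))[sample(1000)]   # true weight = 0.3
f1 <- function(x) dnorm(x, -2); f2 <- function(x) dnorm(x, 2)
a <- b <- 1                                            # Beta(1,1) pseudo-prior
for (xi in x) {
  w  <- a / (a + b)
  p1 <- w * f1(xi) / (w * f1(xi) + (1 - w) * f2(xi))   # membership probability
  a  <- a + p1; b <- b + (1 - p1)                      # fractional update
}
a / (a + b)                                            # quasi-Bayes weight estimate
```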

Carlo Alberto presented both his 2014 SABC (simulated annealing) algorithm, with a neat idea for reducing waste in the tempering schedule, and a recent summary selection approach based on an auto-encoder function of both y and noise to reduce towards a sufficient statistic. A similar idea was found in Yannik Schälte's talk (slide above), who was returning to Richard Wilkinson's exact ABC¹³ with an adaptive sequential generator, also linking to simulated annealing, with ABC-SMC¹² to the rescue. Notion of amortized inference: seemingly approximating the data y with a NN and then learning the parameter by a normalising flow.

David Frazier talked on the Q-posterior²³ approach, based on Fisher's identity (expressing the score of the observed likelihood as the conditional expectation of the complete-data score) to approximate the score function, which at first seemed to require some exponential family structure on a completed model (but does not, after discussing with David!). Jack Jewson spoke on beta-divergence priors²³ for uncertainty on likelihoods, better than the KLD in ε-contamination situations, any impact on ABC? Masahiro Fujisawa went back to the impact of outliers on ABC, again with ε-contaminations (with me wondering at the impact of outliers on NN estimation).

In the afternoon session (due to two last-minute cancellations, we skipped (or [MCMC] skied) one afternoon session, which coincided with a bright and crispy day, how convenient!), Massi Tamborrino (U of Warwick) discussed the FitzHugh-Nagumo process, where the inference problem resists standard solutions: for instance, Euler-Maruyama does not always work, as numerical schemes induce a bias. Back to ABC, with the hunt for a summary that gets rid of the noise, as in Carlo Alberto's work. Yuexi Wang talked about her work on adversarial ABC inspired from GANs. Another instance where noise is used as input. True data not used in training? Imke Botha discussed an improvement to ensemble Kalman inversion which, while biased, gains over regular SMC in time and over ensemble Kalman inversion in precision, and Chaya Weerasinghe focussed on Bayesian forecasting in state space models under model misspecification, via approximate Bayesian computation, using an auxiliary model to produce summary statistics as in indirect inference.
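
As an aside, here is a bare-bones Euler-Maruyama simulation of a stochastic FitzHugh-Nagumo model in R, using one common hypoelliptic parameterisation with noise only on the second coordinate; the parameter values are mine and purely illustrative, and this is precisely the kind of discretisation scheme whose bias the talk warned about.

```r
## Euler-Maruyama for a stochastic FitzHugh-Nagumo model (one common
## parameterisation): fast voltage variable v, slow recovery variable u
fhn_em <- function(T = 50, dt = 1e-3, eps = 0.1, s = 0.5, gamma = 1.5,
                   beta = 0.3, sigma = 0.5, v0 = 0, u0 = 0) {
  n <- floor(T / dt)
  v <- u <- numeric(n + 1); v[1] <- v0; u[1] <- u0
  for (k in 1:n) {
    v[k + 1] <- v[k] + (v[k] - v[k]^3 - u[k] + s) / eps * dt
    u[k + 1] <- u[k] + (gamma * v[k] - u[k] + beta) * dt +
                sigma * sqrt(dt) * rnorm(1)
  }
  cbind(v = v, u = u)
}
path <- fhn_em()
plot(seq(0, 50, length.out = nrow(path)), path[, "v"], type = "l",
     xlab = "time", ylab = "membrane potential v")
```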

The Effect [book review]

Posted in Books, R, Running, Statistics, University life on March 10, 2023 by xi'an

While it sounds like the title of a science-fiction catastrophe novel or of a (of course) convoluted nouveau roman, this book by Nick Huntington-Klein is a massive initiation to econometrics and causality. As explained by the subtitle, An Introduction to Research Design and Causality.

This is a hüûüge book, actually made of two parts that could have been books (volumes?). And covering three languages, R, Stata, and Python, which should have led to three independent books. (Seriously, why print three versions when you need at best one?!) I carried it with me during my vacations in Central Québec, but managed to lose my notes on the first part, which means missing the opportunity for biased quotes! It was mostly written during the COVID lockdown(s), which may explain a certain amount of verbosity and rambling around.

“My mom loved the first part of the book and she is allergic to statistics.”

The first half (which is in fact a third!) is conceptual (and chatty) and almost formula-free, based on the postulate that "it's a pretty slim portion of students who understand a method because of an equation" (p.xxii). For this reader (or rather reviewer), relying on explanations through examples makes the reading much harder, as spotting the main point gets harder (and requires reading most sentences!). And a very slow start, since notations and mathematical notions have to be introduced with an excess of caution (as in the distinction between Latin and Greek symbols, p.36). Moving through single-variable models, conditional distributions, a lengthy explanation of how OLS estimates are derived, data generating processes and identification (of causes), causal diagrams, back and front doors (a recurrent notion within the book), treatment effects, and a conclusion chapter.

“Unlike statistical research, which is completely made of things that are at least slightly false, statistics itself is almost entirely true.” (p.327)

The second part, called the Toolbox, is closer to a classical introduction to econometrics, albeit with a shortage of mathematics (and no proof whatsoever), although [warning!] logarithms, polynomials, partial derivatives and matrices are used. Along with a consequent (3x) chunk allocated to printed code, the density of the footnotes significantly increases in this section. It covers an extensive chapter on regression (including testing practice, non-linear and generalised linear models, as well as the basic bootstrap, without much warning about its use in… regression settings, and LASSO), one on matching (with propensity scores, kernel weighting, Mahalanobis weighting), one on simulation, yes simulation!, in the sense of producing pseudo-data from known generating processes to check methods, as well as the bootstrap (with resampled residuals making at last an appearance!), and one on fixed and random effects (where the author "feels the presence of Andrew Gelman reaching through time and space to disagree", p.405). The chapter on event studies is about time-dependent data, with a bit of ARIMA prediction (but nothing on non-stationary series and unit root issues). The more exotic chapters cover (18) difference-in-differences models (control vs. treated groups, with John Snow pumping his way in), (19) instrumental variables (aka the minor bane of my 1980's econometrics courses), with two-stage least squares and the generalised method of moments (if not the simulated version), (20) discontinuity (i.e., changepoints), with the limitation of having a single variate explaining the change, rather than an unknown combination of them, and a rather pedestrian approach to the issue, and (21) other methods (including the first mention of machine learning regression/prediction and some causal forests), concluding with an "Under the rug" portmanteau.
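
(For readers allergic to 500⁺-page treatments, the instrumental-variable logic of chapter 19 fits in a few lines of base R; the simulated variables below are mine rather than the book's examples, the book's own code relies on dedicated packages, and the second-stage standard errors of this manual two-stage least squares would in any case be invalid.)

```r
## toy illustration of instrumental variables via manual two-stage least squares
set.seed(4)
n <- 5000
z <- rnorm(n)                       # instrument
u <- rnorm(n)                       # unobserved confounder
x <- 0.8 * z + u + rnorm(n)         # endogenous treatment
y <- 1.5 * x + 2 * u + rnorm(n)     # outcome; true causal effect = 1.5
coef(lm(y ~ x))["x"]                # naive OLS, biased upwards by u
xhat <- fitted(lm(x ~ z))           # first stage: project x on the instrument
coef(lm(y ~ xhat))["xhat"]          # second stage: recovers roughly 1.5
```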

Nothing (afaict) on multivariate regressed variates and simultaneous equations. Hardly an occurrence of Bayesian modelling (p.581), vague enough to remind me of my first course of statistics and the one-line annihilation of the notion.

Duh cover, but a nice edition, except for the huge margins that could have been cut to reduce the 622 pages by a third (and curbed the tendency of the author towards excessive footnotes!). And an unintentional white line on p.238! Cute and vaguely connected little drawings at the head of every chapter (like the head above). A rather terse subject index (except for the entry "The first reader to spot this wins ten bucks"!), which should have been completed with an acronym index.

"Calculus-heads will recognize all of this as taking integrals of the density curve. Did you know there's calculus hidden inside statistics? The things your professor won't tell you until it's too late to drop the class."

Obviously I am biased in that I cannot negatively comment on an author running a 5:37 mile as, by now, I am far from the 5:15 of yester decades! I am just a wee bit suspicious of the reported time, however, given that it appears exactly on page 537… (And I could have clearly taken issue with his 2014 paper, Is Robert anti-teacher? Or with the populist catering to anti-math attitudes, such as the above, found in a footnote!) But I enjoyed reading the conceptual chapter on causality as well as the (more) technical chapter on instrumental variables (a notion I have consistently found confusing all the [long] way from graduate school). And while repeated references are made to Scott Cunningham's Causal Inference: The Mixtape, I think I will stop there with 500⁺-page introductory econometrics books!

[Disclaimer about potential self-plagiarism: this post or an edited version will potentially appear in my Books Review section in CHANCE.]

sampling, transport, and diffusions

Posted in pictures, Running, Statistics, Travel, University life on November 18, 2022 by xi'an


This week, I am attending a very cool workshop at the Flatiron Institute (not in the Flatiron building!, but close enough) on Sampling, Transport, and Diffusions, organised by Bob Carpenter and Michael Albergo. It is quite exciting as I do not know most participants or their work! The Flatiron Institute is a private institute focussed on fundamental science and funded by the Simons Foundation (with working conditions universities cannot compete with!).

Eric Vanden-Eijnden gave an introductory lecture on using optimal transport notions to improve sampling, with a PDE/ODE approach of continuously turning a base distribution into a target (formalised as the distribution at time one). This amounts to solving for a velocity field in a KL optimisation objective whose target value is zero, the velocity being parameterised as a deep neural network density estimator. Using a score function in a reverse SDE inspired by Hyvärinen (2005), with a surprising occurrence of Stein's unbiased estimator, there for the same reason of getting rid of an unknown element. In a lot of environments, simulating from the target is the goal and this can be achieved by MCMC sampling or by normalising flows, learning the transform / pushforward map.
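
Not Eric's construction, but as a minimal illustration of the score-driven part of the story, here is an unadjusted Langevin sampler in R pushing a N(0,1) sample towards a two-component Gaussian mixture using its analytic score; the mixture, step size and number of steps are all made up for the example.

```r
## unadjusted Langevin dynamics driven by the analytic score of a toy target,
## transporting a simple base sample towards the mixture .5 N(-3,1) + .5 N(3,1)
set.seed(5)
score <- function(x) {                 # d/dx log target density
  p1 <- dnorm(x, -3); p2 <- dnorm(x, 3)
  (p1 * (-(x + 3)) + p2 * (-(x - 3))) / (p1 + p2)
}
x <- rnorm(5000)                       # base distribution N(0,1)
h <- 0.05                              # step size
for (t in 1:2000) x <- x + h * score(x) + sqrt(2 * h) * rnorm(length(x))
hist(x, breaks = 60, freq = FALSE, main = "Langevin with analytic score")
```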

At the break, Yuling Yao made a very smart remark that testing between two models could also be seen as an optimal transport problem, trying to figure out an optimal transform from one model to the next, rather than the bland mixture model we used in our mixtestin paper. At this point I have no idea about the practical difficulty of using / inferring the parameters of this continuum, but one could start from normalising flows. Because of the time continuity, one would need some driving principle.

Esteban Tabak gave another interesting talk on simulating from a conditional distribution, which sounds like no problem when the conditional density is known but is a challenge when only pairs are observed. The problem is seen as a transport problem towards a barycentre, obtained as a distribution independent of the conditioning z, and then inverting the map. Constructing the maps through flows. Very cool, even possibly providing an answer to causality questions.

Many of the transport talks involved normalizing flows. One by [Simons Fellow] Christopher Jarzynski about adding to the Hamiltonian (in HMC) an artificial flow field (Vaikuntanathan and Jarzynski, 2009) to make up for the Hamiltonian moving too fast for the simulation to keep track. Connected with Eric Vanden-Eijnden's talk in the end.

An interesting extension of delayed rejection for HMC by Chirag Modi, with a manageable correction à la Antonietta Mira. Jonathan Niles-Weed provided a nonparametric perspective on optimal transport following Hütter+Rigollet, 21 AoS. With forays into the Sinkhorn algorithm, mentioning the regularisation of Aude Genevay (a Dauphine graduate).
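
Since the Sinkhorn algorithm came up repeatedly, here is a bare-bones R version of the entropy-regularised transport iterations (alternating row and column rescalings of the Gibbs kernel), on a made-up discrete example rather than anything from the talks.

```r
## Sinkhorn iterations for entropy-regularised optimal transport between two
## discrete distributions a and b, with cost matrix C and regularisation eps
sinkhorn <- function(a, b, C, eps = 0.05, niter = 500) {
  K <- exp(-C / eps)                     # Gibbs kernel
  u <- rep(1, length(a)); v <- rep(1, length(b))
  for (i in 1:niter) {                   # alternating row / column scalings
    u <- a / (K %*% v)
    v <- b / (t(K) %*% u)
  }
  P <- diag(as.vector(u)) %*% K %*% diag(as.vector(v))
  list(plan = P, cost = sum(P * C))      # coupling and regularised cost
}
x <- seq(0, 1, length.out = 50); y <- seq(0, 1, length.out = 50)
C <- outer(x, y, function(s, t) (s - t)^2)          # squared-distance cost
a <- rep(1 / 50, 50); b <- dnorm(y, .5, .1); b <- b / sum(b)
sinkhorn(a, b, C)$cost
```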

Michael Lindsey gave a great presentation on the estimation of the trace of a matrix by the Hutchinson estimator for positive semi-definite matrices, using only matrix multiplications, with a solution surprisingly relying on a Gibbs sampler called thermal sampling.
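
The Hutchinson estimator itself is nearly a one-liner, as in this R sketch (Rademacher probes and matrix-vector products only; the thermal/Gibbs sampling refinement of the talk is not reproduced, and the matrix is a made-up example).

```r
## Hutchinson trace estimator: tr(A) = E[z' A z] for random z with E[z z'] = I,
## here Rademacher probes, using only matrix-vector products
set.seed(6)
d <- 500
B <- matrix(rnorm(d * d), d); A <- crossprod(B) / d   # a psd test matrix
m <- 200                                              # number of probes
est <- mean(replicate(m, {
  z <- sample(c(-1, 1), d, replace = TRUE)
  sum(z * (A %*% z))                                  # z' A z, one matvec
}))
c(hutchinson = est, exact = sum(diag(A)))
```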

And while it did not involve optimal transport, I gave a short (lightning) talk on our recent adaptive restore paper, although in retrospect a presentation of Wasserstein ABC could have been better suited to the audience.

health non-sense [xkcd]

Posted in Books, Kids, pictures, Statistics on June 5, 2022 by xi'an
