Archive for Wasserstein distance
PR[AI]RIE colloquium [12/12]
Posted in Statistics with tags colloquium, optimal transport, Paris, Paris Artificial Intelligence Research Institute, Schrödinger bridge, Sinkhorn algorithm, The Prairie Chair, Wasserstein distance on December 2, 2022 by xi'an
sampling, transport, and diffusions
Posted in pictures, Running, Statistics, Travel, University life with tags causality, delayed rejection sampling, Flatiron building, Flatiron Institute, HMC, Hyvärinen score, Madison Square Garden, normalising flow, NYC, optimal transport, Restore, Simons Foundation, simulation, Sinkhorn algorithm, WABC, Wasserstein distance on November 18, 2022 by xi'an
This week, I am attending a very cool workshop at the Flatiron Institute (not in the Flatiron building, but close enough!) on Sampling, Transport, and Diffusions, organised by Bob Carpenter and Michael Albergo. It is quite exciting as I do not know most participants or their work! The Flatiron Institute is a private institute focussed on fundamental science, funded by the Simons Foundation (with working conditions universities cannot compete with!).
Eric Vanden-Eijnden gave an introductory lecture on using optimal transport notions to improve sampling, with a PDE/ODE approach that continuously turns a base distribution into a target (formalised as the distribution at time one). This amounts to solving for a velocity field against a KL optimisation objective whose target value is zero, the velocity being parameterised as a deep neural network. He also used a score function in a reverse SDE inspired by Hyvärinen (2005), with a surprising occurrence of Stein's unbiased estimator, there for the same reason of getting rid of an unknown element. In a lot of environments, simulating from the target is the goal and this can be achieved by MCMC sampling or by normalising flows, learning the transform / pushforward map.
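As a toy illustration of this velocity-field view (mine, not the talk's method, with a closed-form velocity standing in for the neural network one would learn in practice), here is a sketch pushing N(0,1) samples to a hypothetical N(m,s²) target by Euler integration:

```python
import numpy as np

# Toy sketch (mine, not the talk's method): transport N(0,1) samples to a
# hypothetical N(m, s²) target by Euler-integrating dx/dt = v(x,t), where
# v is known in closed form here but would be a learned network in practice.
m, s = 3.0, 0.5

def velocity(x, t):
    # exact velocity of the linear path x_t = (1 + t(s-1)) x_0 + t m
    return m + (s - 1.0) * (x - t * m) / (1.0 + t * (s - 1.0))

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # samples from the N(0,1) base
n_steps = 200
dt = 1.0 / n_steps
for k in range(n_steps):           # forward Euler from t = 0 to t = 1
    x = x + dt * velocity(x, k * dt)

print(x.mean(), x.std())           # should be close to m = 3 and s = 0.5
```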
At the break, Yuling Yao made a very smart remark that testing between two models could also be seen as an optimal transport problem, trying to figure out an optimal transform from one model to the other, rather than the bland mixture model we used in our mixtestin paper. At this point I have no idea about the practical difficulty of using / inferring the parameters of this continuum, but one could start from normalising flows. Because of time continuity, one would need some driving principle.
Esteban Tabak gave another interesting talk on simulating from a conditional distribution, which sounds like a non-problem when the conditional density is known but a challenge when only pairs are observed. The problem is seen as a transport problem towards a barycentre, obtained as a distribution independent of the conditioning z, and then inverted, with the maps constructed through flows. Very cool, even possibly providing an answer to causality questions.
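Here is my naive one-dimensional reading of the barycentre trick (a sketch under my own assumptions, not Tabak's actual construction), with the uniform as the z-free reference, reached through an estimated conditional CDF and inverted at a new conditioning value:

```python
import numpy as np

# A naive 1D sketch (my reading, not Tabak's construction): push each x_i
# to a z-free uniform reference through an estimated conditional CDF, then
# invert that map at a new conditioning value to simulate from x | z.
rng = np.random.default_rng(1)
n = 5_000
z = rng.uniform(-1, 1, n)
x = 2.0 * z + 0.5 * rng.standard_normal(n)  # observed (z, x) pairs

def cond_cdf(t, z0, h=0.1):
    # Nadaraya-Watson estimate of P(X <= t | Z = z0)
    w = np.exp(-0.5 * ((z - z0) / h) ** 2)
    return np.sum(w * (x <= t)) / np.sum(w)

def cond_quantile(u, z0, h=0.1):
    # invert the (monotone) estimated conditional CDF on a grid
    grid = np.linspace(x.min(), x.max(), 400)
    cdf = np.array([cond_cdf(t, z0, h) for t in grid])
    return np.interp(u, cdf, grid)

u = rng.uniform(size=1_000)        # draws from the uniform reference
draws = cond_quantile(u, 0.5)      # transported back at z = 0.5
print(draws.mean(), draws.std())   # close to 1.0 and 0.5 in this toy
```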
Many of the transport talks involved normalising flows. One, by [Simons Fellow] Christopher Jarzynski, was about adding an artificial flow field to the Hamiltonian (in HMC) (Vaikuntanathan and Jarzynski, 2009) to make up for the Hamiltonian dynamics moving too fast for the simulation to keep up. It connected with Eric Vanden-Eijnden's talk in the end.
An interesting extension of delayed rejection for HMC by Chirag Modi, with a manageable correction à la Antonietta Mira. Jonathan Niles-Weed provided a nonparametric perspective on optimal transport following Hütter & Rigollet (2021, AoS), with forays into the Sinkhorn algorithm, mentioning Aude Genevay's (a Dauphine graduate) regularisation.
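For readers unfamiliar with it, the (standard) Sinkhorn algorithm for entropy-regularised optimal transport between two discrete measures only takes a few lines, e.g.:

```python
import numpy as np

# Standard Sinkhorn iterations for entropy-regularised optimal transport
# between two discrete measures a and b, with cost matrix C (eps is the
# regularisation parameter; all choices below are illustrative).
def sinkhorn(a, b, C, eps=0.01, n_iter=2000):
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):              # alternating scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]      # regularised transport plan
    return P, np.sum(P * C)

# two uniform empirical measures on shifted grids of the real line
x = np.linspace(0.0, 1.0, 50)
y = np.linspace(0.2, 1.2, 50)
C = (x[:, None] - y[None, :]) ** 2       # squared-distance cost
a = np.full(50, 1 / 50)
b = np.full(50, 1 / 50)
P, cost = sinkhorn(a, b, C)
print(cost)                              # roughly 0.2² = 0.04
```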
Michael Lindsey gave a great presentation on the estimation of the trace of a matrix via the Hutchinson estimator for s.d.p. matrices, using only matrix multiplications. The solution surprisingly relies on Gibbs sampling, called thermal sampling there.
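As a baseline (the talk being about improving upon it, not about this vanilla version), the Hutchinson estimator only requires matrix-vector products:

```python
import numpy as np

# Vanilla Hutchinson estimator: tr(A) ≈ (1/m) Σ_k z_kᵀ A z_k with random
# sign (Rademacher) probes, using only matrix-vector products.
rng = np.random.default_rng(2)
n, m = 200, 1_000
B = rng.standard_normal((n, n))
A = B @ B.T                                 # symmetric positive definite

Z = rng.choice([-1.0, 1.0], size=(n, m))    # Rademacher probe vectors
est = np.mean(np.sum(Z * (A @ Z), axis=0))  # average of z_kᵀ A z_k
print(est, np.trace(A))                     # estimate vs exact trace
```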
And while it did not involve optimal transport, I gave a short (lightning) talk on our recent adaptive Restore paper, although in retrospect a presentation of Wasserstein ABC could have been better suited to the audience.
nonparametric ABC [seminar]
Posted in pictures, Statistics, University life with tags ABC, AISTATS 2022, approximate Bayesian inference, Bayesian nonparametrics, g-and-k distributions, maximum mean discrepancy, misspecified model, One World ABC Seminar, RKHS, robustness, University of Warwick, WABC, Wasserstein distance, webinar on June 3, 2022 by xi'an
Puzzle: How do you run ABC when you mistrust the model?! We somewhat considered this question in our misspecified ABC paper with David and Judith. An AISTATS 2022 paper by Harita Dellaporta (Warwick), Jeremias Knoblauch, Theodoros Damoulas (Warwick), and François-Xavier Briol (formerly Warwick) addresses this same question, and Harita presented the paper at the One World ABC webinar yesterday.
It is inspired by Lyddon, Walker & Holmes (2018), who place a nonparametric prior on the generating model, on top of the assumed parametric model (with an intractable likelihood). This induces a push-forward prior on the pseudo-true parameter, that is, the value that brings the parametric family closest to the true distribution of the data. It is defined here as a minimum distance parameter, the distance being the maximum mean discrepancy (MMD). Choosing an RKHS framework allows for a practical implementation, resorting to simulations for posterior realisations from a Dirichlet posterior and from the parametric model, and to stochastic gradient for computing the pseudo-true parameter, which may prove somewhat heavy in terms of computing cost.
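For concreteness, here is a minimal (biased, V-statistic) version of the squared MMD between two univariate samples with a Gaussian kernel, the bandwidth being an arbitrary choice of mine:

```python
import numpy as np

# Biased (V-statistic) estimate of the squared MMD between two univariate
# samples, with a Gaussian kernel; the bandwidth bw is an ad hoc choice.
def mmd2(x, y, bw=1.0):
    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bw ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(3)
x = rng.standard_normal(500)              # "observed" sample
y = rng.standard_normal(500) + 1.0        # sample from a shifted model
print(mmd2(x, y))                         # clearly positive
print(mmd2(x, rng.standard_normal(500)))  # near zero
```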
The paper also contains a consistency result in an ε-contaminated setting (contamination of the assumed parametric family). Comparisons with a fully parametric Wasserstein-ABC approach show that this alternative resists misspecification better, as could be expected since the latter is not constructed for that purpose.
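By contrast, the Wasserstein distance used by WABC is straightforward for univariate samples of equal size, as it reduces to a comparison of order statistics:

```python
import numpy as np

# Empirical 1-Wasserstein distance between two univariate samples of equal
# size: sort both and average the absolute differences (exact in 1D).
def wasserstein1(x, y):
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

rng = np.random.default_rng(4)
x = rng.standard_normal(1_000)
print(wasserstein1(x, rng.standard_normal(1_000) + 1.0))  # close to 1
print(wasserstein1(x, rng.standard_normal(1_000)))        # near zero
```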
Next talk is on 23 June by Cosma Shalizi.
Concentration and robustness of discrepancy-based ABC [One World ABC ‘minar, 28 April]
Posted in Statistics, University life with tags ABC, Approximate Bayesian computation, approximate Bayesian inference, discrepancy, Japan, One World ABC Seminar, RIKEN, sufficiency, Tokyo, University of Warwick, Wasserstein distance, webinar on April 15, 2022 by xi'an
Approximate Bayesian Computation (ABC) typically employs summary statistics to measure the discrepancy between the observed data and the synthetic data generated from each proposed value of the parameter of interest. However, finding good summary statistics (that are close to sufficiency) is non-trivial for most models for which ABC is needed. In this paper, we investigate the properties of ABC based on integral probability semi-metrics, including MMD and Wasserstein distances. We exhibit conditions ensuring the contraction of the approximate posterior. Moreover, we prove that MMD with an adequate kernel leads to very strong robustness properties.