## inference with Wasserstein distance

Today, Pierre Jacob posted on arXiv a paper of ours on the use of the Wasserstein distance in statistical inference, whose main focus is exploiting this distance to create an automated measure of discrepancy for ABC. Which is why the full title is Inference in generative models using the Wasserstein distance. Generative obviously standing for the case when a model can be generated from but cannot be associated with a closed-form likelihood. We had all together discussed this notion when I visited Harvard and Pierre last March, with much excitement. (While I have not contributed much more than that round of discussions and ideas to the paper, the authors kindly included me!) The paper contains theoretical results on the consistency of statistical inference based on those distances, as well as computational results on how these distances can be computed in practice and on how the Hilbert space-filling curve used in sequential quasi-Monte Carlo can help. The notion further extends to dependent data via delay reconstruction and residual reconstruction techniques (as we did for some models in our empirical likelihood BCel paper). I am quite enthusiastic about this approach and look forward to discussing it at the 17w5015 BIRS ABC workshop next month!
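To give a concrete (if simplistic) sense of the computational appeal: in one dimension, the 1-Wasserstein distance between two equal-size empirical distributions reduces to an average absolute difference between order statistics, which is one reason projecting multivariate samples to 1D (e.g. via the Hilbert curve) is attractive. A minimal pure-Python sketch, not the paper's implementation, with made-up toy samples:

```python
import random

def wasserstein_1d(x, y):
    """1-Wasserstein distance between the empirical distributions of two
    equal-size 1D samples: after sorting, the optimal coupling simply
    matches order statistics."""
    assert len(x) == len(y)
    xs, ys = sorted(x), sorted(y)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

# toy ABC-style comparison: observed data vs draws from a candidate model
random.seed(1)
obs = [random.gauss(0.0, 1.0) for _ in range(500)]
sim = [random.gauss(0.5, 1.0) for _ in range(500)]
print(wasserstein_1d(obs, sim))  # roughly the size of the mean shift
```

In an ABC loop, this scalar would replace the usual summary-statistic discrepancy when deciding whether to accept a simulated parameter value.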

March 28, 2017 at 2:08 pm

[…] as a proxy. With possible intricacies when the data is not iid (an issue we also met with Wasserstein distances). In this paper the authors instead consider working on an empirical likelihood as their starting […]

January 26, 2017 at 5:03 pm

Nice paper, I also look forward to discussing this at BIRS. A Devil’s advocate (or possibly strawman) comparison against the proposal in Section 5.2, which sorts points in R^D by the Hilbert space-filling curve and then measures the Wasserstein distance on the resulting 1D empirical distribution: why not try the empirical KS distance? Or the KS distance after just ordering the points by a simple (non-bijective) function from R^D to R? E.g., the density of a Normal approximation (or alternatively a KDE) to draws of mock datasets from the prior, then maybe refined by repeating the process on draws from the ‘first-go’ posterior predictive?
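For reference, the empirical KS distance alluded to here, the sup-distance between two empirical CDFs on the real line, can be sketched in a few lines of pure Python (an illustration only, unrelated to the paper's code):

```python
import bisect

def ks_distance(x, y):
    """Kolmogorov-Smirnov distance between the empirical CDFs of two
    1D samples (sample sizes may differ)."""
    xs, ys = sorted(x), sorted(y)
    sup = 0.0
    # the supremum of |F_x - F_y| is attained at one of the sample points
    for t in xs + ys:
        fx = bisect.bisect_right(xs, t) / len(xs)
        fy = bisect.bisect_right(ys, t) / len(ys)
        sup = max(sup, abs(fx - fy))
    return sup

print(ks_distance([0.1, 0.5, 0.9], [0.2, 0.6, 1.0]))
```

Unlike the Wasserstein distance, this discrepancy ignores the metric of the observation space: how far apart the mismatched mass sits does not matter, only the CDF gap.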

January 29, 2017 at 4:34 pm

Hey Ewan, thanks for the good questions. There are a lot of possible distances. We have restricted ourselves to “transport” distances, which take into account the metric of the underlying observation space, and to the Hilbert curve, which preserves a notion of locality. I would ask, conversely: why use the empirical KS, why use another function from R^D to R?

As you can see in the supplementary materials, our choices yield distances between empirical distributions that satisfy specific properties (e.g. they are actual distances), leading to consistent minimum distance estimators. This would certainly be true for some other choices of orderings/distances, but not for all of them.

I’d be happy to try a few alternatives at BIRS!

January 24, 2017 at 2:18 am

Thanks for the advertisement! I will try to post something on Statisfaction about it soon.