Archive for Wasserstein distance

g-and-k [or -h] distributions

Posted in Statistics on July 17, 2017 by xi'an

Dennis Prangle released last week an R package called gk, along with an associated arXived paper, for running inference on the g-and-k and g-and-h quantile distributions. As should be clear from an earlier review of Karian and Dudewicz’s book on quantile distributions, I am not particularly fond of those distributions, whose construction seems very artificial to me, as it is mostly based on the production of a closed-form quantile function. But I agree they provide a neat benchmark for ABC methods, if nothing else. However, as recently pointed out in our Wasserstein paper with Espen Bernton, Pierre Jacob and Mathieu Gerber, and explained in a post of Pierre’s on Statisfaction, the pdf can easily be computed by numerical means, which allows for an MCMC resolution, a point also made by Dennis in his paper. This relies on the closed-form derivative of the Normal form of the distribution [i.e., applied to Φ(x)], so that numerical differentiation is not necessary.
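As a rough illustration of this numerical pdf evaluation, here is a minimal plain-Python sketch, not the code of the gk package. It assumes the usual g-and-k parametrisation Q(z) = A + B (1 + c tanh(g z / 2)) (1 + z²)^k z, with z a standard Normal variate, c conventionally fixed at 0.8, and parameter values for which Q is increasing in z. The density at x is obtained by solving Q(z) = x numerically and returning φ(z) / Q′(z), with Q′ in closed form. All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # provides phi(z) for the Normal form


def gk_quantile(z, A, B, g, k, c=0.8):
    """g-and-k quantile written in terms of z = Phi^{-1}(u).

    tanh(g z / 2) equals (1 - exp(-g z)) / (1 + exp(-g z)).
    """
    return A + B * (1.0 + c * math.tanh(0.5 * g * z)) * (1.0 + z * z) ** k * z


def gk_quantile_dz(z, A, B, g, k, c=0.8):
    """Closed-form dQ/dz, so no numerical differentiation is needed."""
    t = math.tanh(0.5 * g * z)
    dt = 0.5 * g * (1.0 - t * t)            # derivative of tanh(g z / 2)
    h = (1.0 + z * z) ** k * z              # (1 + z^2)^k z
    dh = (1.0 + z * z) ** (k - 1.0) * (1.0 + (2.0 * k + 1.0) * z * z)
    return B * (c * dt * h + (1.0 + c * t) * dh)


def gk_pdf(x, A, B, g, k, c=0.8, z_lo=-30.0, z_hi=30.0, n_iter=100):
    """f(x) = phi(z) / Q'(z) where Q(z) = x; assumes Q is increasing in z."""
    lo, hi = z_lo, z_hi
    if not gk_quantile(lo, A, B, g, k, c) <= x <= gk_quantile(hi, A, B, g, k, c):
        return 0.0  # outside the bracket: the density is negligible there
    for _ in range(n_iter):  # bisection on z
        mid = 0.5 * (lo + hi)
        if gk_quantile(mid, A, B, g, k, c) < x:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    return STD_NORMAL.pdf(z) / gk_quantile_dz(z, A, B, g, k, c)


def gk_loglik(data, A, B, g, k, c=0.8):
    """Log-likelihood of an iid sample, e.g. as the target of a Metropolis sampler."""
    return sum(math.log(gk_pdf(x, A, B, g, k, c)) for x in data)
```

With g = k = 0 the quantile reduces to A + B z, so the sketch should then return the N(A, B²) density. Bisection is used only for simplicity; a Newton step exploiting the closed-form derivative would converge faster.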

at the Isaac Newton Institute [talks]

Posted in Statistics on July 7, 2017 by xi'an

Here are the slides I edited this week [from previous talks by Pierre and Espen] for the INI Workshop on scalable inference, in connection with our recently completed and submitted paper on ABC with Wasserstein distances.

MCM 2017

Posted in Statistics on July 3, 2017 by xi'an

And thus I am back in Montréal for three days, for MCM 2017, held at HEC Montréal on the campus of Université de Montréal. My talk is predictably about ABC (what else?!), gathering diverse threads from different talks and papers.

exciting week[s]

Posted in Mountains, pictures, Running, Statistics on June 27, 2017 by xi'an

The past week was quite exciting, despite the heat wave that hit Paris and kept me from sleeping and running! First, I made a two-day visit to Jean-Michel Marin in Montpellier, where we discussed the potential Peer Community In Computational Statistics (PCI Comput Stats) with the people behind PCI Evol Biol at INRA, which will hopefully take shape in the coming months! One evening, I also went through a few vineyards in Saint Christol with Jean-Michel and Arnaud, including a long chat with the owner of Domaine Coste Moynier [whose domain includes the above parcel with views of Pic Saint-Loup]. And last but not least, some work planning on approximate MCMC.

On top of this, we submitted our paper on ABC with Wasserstein distances [to be arXived in an extended version in the coming weeks] and our revised paper on ABC consistency, thanks to highly constructive comments from the editorial board, which induced a much improved version in my opinion. And we received a very positive response from JCGS for our paper on weak priors for mixtures! Next week should be exciting as well, with BNP 11 taking place in downtown Paris, at École Normale!!!

automated ABC summary combination

Posted in Books, pictures, Statistics, University life on March 16, 2017 by xi'an

Jonathan Harrison and Ruth Baker (Oxford University) arXived this morning a paper on the optimal combination of summaries for ABC, in the sense of deriving the proper weights in a Euclidean distance involving all the available summaries. The idea is to find the weights that lead to the maximal distance between prior and posterior, in a way reminiscent of Bernardo’s (1979) maximal information principle, plus a sparsity penalty à la Lasso. The associated algorithm is sequential in that the weights are updated at each iteration. The paper does not get into theoretical justifications but considers instead several examples with limited numbers of both parameters and summary statistics. Which may highlight the limitations of the approach, in that handling (and eliminating) a large number of parameters may prove impossible this way, when compared with optimisation methods like random forests. Or with summary-free distances between empirical distributions like the Wasserstein distance.
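To fix ideas, here is a minimal sketch contrasting a weighted Euclidean distance between summaries with a summary-free Wasserstein distance, both plugged into plain rejection ABC. It does not reproduce the weight-selection algorithm of the paper (maximal prior-to-posterior distance with a Lasso penalty): the weights are simply passed in by hand, a zero weight acting as the elimination of a summary. The Wasserstein computation assumes one-dimensional data and equal sample sizes, in which case it reduces to matching sorted samples. All names are hypothetical.

```python
import math
import random


def weighted_euclidean(s_obs, s_sim, weights):
    """sqrt(sum_j w_j (s_obs_j - s_sim_j)^2); a zero weight eliminates summary j."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, s_obs, s_sim)))


def wasserstein1_1d(x, y):
    """Summary-free W1 distance between two 1D empirical distributions.

    With equal sample sizes the optimal coupling matches order statistics,
    so W1 is the mean absolute difference between the sorted samples.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)


def rejection_abc(observed, prior_draw, simulate, distance, n_sims, keep_frac, rng):
    """Plain rejection ABC: keep the parameters whose pseudo-data are among
    the keep_frac fraction closest to the observed data."""
    scored = []
    for _ in range(n_sims):
        theta = prior_draw(rng)
        scored.append((distance(observed, simulate(theta, rng)), theta))
    scored.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(keep_frac * n_sims))
    return [theta for _, theta in scored[:n_keep]]
```

For instance, with a Normal location model, one can pass either `wasserstein1_1d` directly or a wrapper computing `weighted_euclidean` on (mean, range) summaries with weights (1, 0), the range being thus eliminated.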

X divergence for approximate inference

Posted in Statistics on March 14, 2017 by xi'an

Dieng et al. arXived this morning a new version of their paper on using the Χ divergence for variational inference. The Χ divergence is essentially the expectation of the squared ratio of the target distribution over the approximation, under the approximation. It is somewhat related to Expectation Propagation (EP), which aims at the Kullback-Leibler divergence between the target distribution and the approximation, under the target. And to variational Bayes, which is the same thing just the opposite way! The authors also point out a link to our [adaptive] population Monte Carlo paper of 2008. (I wonder about a possible version through the Wasserstein distance.)
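A minimal sketch of the quantity involved, assuming the n = 2 member of the family, D(p || q) = E_q[(p/q)²] − 1, estimated by Monte Carlo with draws from the approximation q only. The Normal pair p = N(0, 1), q = N(0, s²) is a hypothetical test case with closed form s² / √(2s² − 1) − 1, which is finite only when s² > 1/2: an approximation narrower than the target is heavily penalised, in line with the over-dispersion argument that follows.

```python
import math
import random


def chi2_divergence_mc(log_p, log_q, sample_q, n, rng):
    """Monte Carlo estimate of E_q[(p/q)^2] - 1, using draws from q only."""
    total = 0.0
    for _ in range(n):
        z = sample_q(rng)
        total += math.exp(2.0 * (log_p(z) - log_q(z)))  # squared density ratio
    return total / n - 1.0


def log_normal(z, mu, sigma):
    return -0.5 * ((z - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def chi2_normal_exact(s):
    """Closed form for p = N(0, 1) and q = N(0, s^2); infinite when s^2 <= 1/2."""
    if s * s <= 0.5:
        return math.inf
    return s * s / math.sqrt(2.0 * s * s - 1.0) - 1.0
```

The Monte Carlo estimator itself has infinite variance as soon as s² ≤ 3/4, so even a finite divergence can be hard to estimate when q is too narrow.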

Some of the arguments in favour of this new version of variational Bayes approximations are that (a) the support of the approximation over-estimates the posterior support; (b) it produces over-dispersed versions; (c) it relates to a well-defined and global objective function; (d) it allows for a sandwich inequality on the model evidence; (e) the function of the [approximation] parameter to be minimised is an expectation under the approximation, rather than under the target. The latter allows for gradient-based optimisation. While one of the applications is a Bayesian probit model applied to the Pima Indian women dataset [and will thus make James and Nicolas cringe!], the experimental assessment shows lower error rates for this and other benchmarks. Which, in my opinion, does not tell much about the original Bayesian approach.
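The sandwich inequality in (d) can be checked on a toy conjugate model where the evidence is known: z ~ N(0, 1), x | z ~ N(z, 1), so that p(x) = N(x; 0, 2). This sketch assumes the χ upper bound with n = 2, CUBO = ½ log E_q[(p(x, z)/q(z))²], and a Normal approximation q = N(m, v); the expectations are computed by a fine Riemann sum rather than by the Monte Carlo gradients used for actual optimisation. The model, grid and names are hypothetical.

```python
import math


def log_gauss(z, mean, var):
    """log N(z; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (z - mean) ** 2 / var)


def vi_bounds(x, m, v, lo=-12.0, hi=12.0, n_grid=24001):
    """Return (ELBO, log p(x), CUBO_2) for z ~ N(0, 1), x | z ~ N(z, 1) and
    q(z) = N(m, v), all three integrals computed by the same Riemann sum."""
    step = (hi - lo) / (n_grid - 1)
    elbo = evidence = chi_moment = 0.0
    for i in range(n_grid):
        z = lo + i * step
        log_joint = log_gauss(z, 0.0, 1.0) + log_gauss(x, z, 1.0)  # log p(x, z)
        log_q = log_gauss(z, m, v)
        elbo += math.exp(log_q) * (log_joint - log_q) * step      # E_q[log p(x,z) - log q]
        evidence += math.exp(log_joint) * step                     # integral of p(x, z) dz
        chi_moment += math.exp(2.0 * log_joint - log_q) * step     # E_q[(p(x,z)/q)^2]
    return elbo, math.log(evidence), 0.5 * math.log(chi_moment)
```

For x = 1 the exact posterior is N(0.5, 0.5): with q set to it, the three values coincide, while any other (m, v) should give ELBO < log p(x) < CUBO.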

ABC with kernelised regression

Posted in Mountains, pictures, Statistics, Travel, University life on February 22, 2017 by xi'an

[sunset from the Banff Centre, Banff, Canada, March 21, 2012]

The exact title of the paper by Jovana Mitrovic, Dino Sejdinovic, and Yee Whye Teh is DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression. It appeared last year in the proceedings of ICML. The idea is to build ABC summaries by way of reproducing kernel Hilbert spaces (RKHS), these embeddings being then regressed to the “optimal” choice of summary statistics by kernel ridge regression, with the possibility of deriving summary statistics for quantities of interest rather than for the entire parameter vector. The use of RKHS reminds me of Arthur Gretton’s approach to ABC, although I see no mention made of that work in the current paper.

In the RKHS pseudo-linear formulation, the prediction of a parameter value given a sample attached to this value looks like a ridge estimator in classical linear estimation. (I thus wonder why one would stop at the ridge stage instead of getting the full Bayes treatment!) Things get a bit more involved in the case of parameters (and observations) of interest, as the modelling requires two RKHS, because of the conditioning on the nuisance observations. Or rather three RKHS. Since those involve a maximum mean discrepancy between probability distributions, which in turn defines a sort of intrinsic norm, I also wonder about a Wasserstein version of this approach.
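The ridge analogy can be made concrete with a minimal sketch of kernel ridge regression on mean embeddings, assuming scalar iid data, a Gaussian kernel, and the linear kernel ⟨μ_P, μ_Q⟩ between empirical embeddings. The bandwidth, the penalty λ and all names are hypothetical, and this is not claimed to be the exact DR-ABC construction. The fitted map returns a predicted parameter value for a new dataset, which can then serve as an ABC summary statistic.

```python
import math
import random


def gauss_kernel(a, b, bandwidth):
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def embedding_inner(xs, ys, bandwidth):
    """Inner product <mu_P, mu_Q> of the empirical kernel mean embeddings."""
    total = sum(gauss_kernel(a, b, bandwidth) for a in xs for b in ys)
    return total / (len(xs) * len(ys))


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting, for small dense systems."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - acc) / aug[r][r]
    return sol


def fit_distribution_ridge(datasets, thetas, lam=0.1, bandwidth=1.0):
    """Kernel ridge regression of theta on mean embeddings:
    alpha = (K + lam * I)^{-1} theta, with K_ij = <mu_i, mu_j>."""
    n = len(datasets)
    gram = [[embedding_inner(datasets[i], datasets[j], bandwidth) for j in range(n)]
            for i in range(n)]
    for i in range(n):
        gram[i][i] += lam
    alpha = solve(gram, thetas)

    def summary(sample):
        # predicted parameter for a new sample, usable as an ABC summary statistic
        return sum(w * embedding_inner(sample, d, bandwidth) for w, d in zip(alpha, datasets))

    return summary
```

The cost of each Gram entry is quadratic in the sample sizes, which is exactly where the computing budget may explode for large samples.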

What I find hard to understand in the paper is how a high-dimensional, large-size sample can be managed by such methods with no visible loss of information and no explosion of the computing budget. The authors mention Fourier features, which never ring a bell for me, but I wonder how this operates in a general setting, i.e., outside the iid case. The examples do not seem to go into enough detail for me to understand how this massive dimension reduction operates (and they remain at a moderate level in terms of numbers of parameters). I was hoping Jovana Mitrovic could present her work here at the 17w5025 workshop, but she sadly could not make it to Banff for lack of funding!
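For what it is worth, here is a minimal sketch of how random Fourier features (in the style of Rahimi and Recht) can keep the cost under control: each sample, whatever its size, is compressed into a fixed-length vector whose inner products approximate those of the exact kernel mean embeddings, at a cost linear in the sample size. It assumes scalar data and a Gaussian kernel; averaging over the sample treats it as an unordered iid bag, which is precisely the restriction questioned above. Names and sizes are hypothetical, and the paper's own construction may differ.

```python
import math
import random


def make_fourier_embedding(n_features, bandwidth, rng):
    """Random Fourier features for the Gaussian kernel exp(-(x - y)^2 / (2 bandwidth^2)).

    Frequencies are drawn from N(0, 1 / bandwidth^2) and phases from U(0, 2 pi),
    so that E[2 cos(w x + b) cos(w y + b)] equals the kernel value.
    """
    omegas = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def embed(sample):
        """Fixed-length approximate mean embedding, in one pass over the sample."""
        vec = [0.0] * n_features
        for x in sample:
            for d in range(n_features):
                vec[d] += math.cos(omegas[d] * x + phases[d])
        return [scale * v / len(sample) for v in vec]

    return embed


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

Once embedded, two samples of any sizes are compared through `dot(embed(xs), embed(ys))` at a cost proportional to the number of features, instead of the product of the sample sizes.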