## MCMC, variational inference, invertible flows… bridging the gap?

Posted in Books, Mountains, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , on October 2, 2020 by xi'an

Two weeks ago, my friend [see here when climbing Pic du Midi d’Ossau in 2005!] and coauthor Éric Moulines gave a very interesting on-line talk entitled MCMC, Variational Inference, Invertible Flows… Bridging the gap?, which was merging MCMC, variational autoencoders, and variational inference. I paid close attention as I plan to teach an advanced course on acronyms next semester in Warwick. (By acronyms, I mean ABC+GAN+VAE!)

The notion in this work is that variational autoencoders are based on over-simple mean-field variational distributions, that usually produce a poor approximation of the target distribution. Éric and his coauthors propose to introduce a Metropolis step in the VAE. This leads to a more general notion of Markov transitions and a global balance condition. Hamiltonian Monte Carlo can be used as well and it improves the latent distribution approximation, namely the encoder, which is surprising to me. The steps of the Markov kernel produce a manageable transform of the initial mean field approximation, a random version of the original VAE. Manageable provided not too many MCMC steps are implemented. (Now, the flow of slides was much too fast for me to get a proper understanding of the implementation of the method, of the degree of its calibration, and of the computing cost. I need to read the associated papers.)

Once the talk was over, I went back to changing tires and tubes, as two bikes of mine had flat tires, the latest being a spectacular explosion (!) that seemingly went through the tire (although I believe the opposite happened, namely the tire got slashed and induced the tube to blow out very quickly). Blame the numerous bits of broken glass over bike paths.

## distortion estimates for approximate Bayesian inference

Posted in pictures, Statistics, University life with tags , , , , , , , , , on July 7, 2020 by xi'an

A few days ago, Hanwen Xing, Geoff Nichols and Jeong Eun Lee arXived a paper with the following title, to be presented at uai2020. Towards assessing the fit of the approximation for the actual posterior, given the available data. This covers of course ABC methods (which seems to be the primary focus of the paper) but also variational inference and synthetic likelihood versions. For a parameter of interest, the difference between exact and approximate marginal posterior distributions is see as a distortion map, D = F o G⁻¹, interpreted as in optimal transport and estimated by normalising flows. Even when the approximate distribution G is poorly estimated since D remains the cdf of G(X) when X is distributed from F. The marginal posterior approximate cdf G can be estimated by ABC or another approximate technique. The distortion function D is itself restricted to be a Beta cdf, with parameters estimated by a neural network (although based on which input is unclear to me, unless the weights in (5) are the neural weights). The assessment is based on the estimated distortion at the dataset, as a significant difference from the identity signal a poor fit for the approximation. Overall, the procedure seems implementable rather easily and while depending on calibrating choices (other than the number of layers in the neural network) a realistic version of the simulation-based diagnostic of Talts et al. (2018).

## a generalized representation of Bayesian inference

Posted in Books with tags , , , , , , on July 5, 2019 by xi'an

Jeremias Knoblauch, Jack Jewson and Theodoros Damoulas, all affiliated with Warwick (hence a potentially biased reading!), arXived a paper on loss-based Bayesian inference that Jack discussed with me on my last visit to Warwick. As I was somewhat scared by the 61 pages, of which the 8 first pages are in NeurIPS style. The authors argue for a decision-theoretic approach to Bayesian inference that involves a loss over distributions and a divergence from the prior. For instance, when using the log-score as the loss and the Kullback-Leibler divergence, the regular posterior emerges, as shown by Arnold Zellner. Variational inference also falls under this hat. The argument for this generalization is that any form of loss can be used and still returns a distribution that is used to assess uncertainty about the parameter (of interest). In the axioms they produce for justifying the derivation of the optimal procedure, including cases where the posterior is restricted to a certain class, one [Axiom 4] generalizes the likelihood principle. Given the freedom brought by this general framework, plenty of fringe Bayes methods like standard variational Bayes can be seen as solutions to such a decision problem. Others like EP do not. Of interest to me are the potentials for this formal framework to encompass misspecification and likelihood-free settings, as well as for assessing priors, which is always a fishy issue. (The authors mention in addition the capacity to build related specific design Bayesian deep networks, of which I know nothing.) The obvious reaction of mine is one of facing an abundance of wealth (!) but encompassing approximate Bayesian solutions within a Bayesian framework remains an exciting prospect.

## postdocs positions in Uppsala in computational stats for machine learning

Posted in Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on October 22, 2017 by xi'an

Lawrence Murray sent me a call for two postdoc positions in computational statistics and machine learning. In Uppsala, Sweden. With deadline November 17. Definitely attractive for a fresh PhD! Here are some of the contemplated themes:

(1) Developing efficient Bayesian inference algorithms for large-scale latent variable models in data rich scenarios.

(2) Finding ways of systematically combining different inference techniques, such as variational inference, sequential Monte Carlo, and deep inference networks, resulting in new methodology that can reap the benefits of these different approaches.

(3) Developing efficient black-box inference algorithms specifically targeted at inference in probabilistic programs. This line of research may include implementation of the new methods in the probabilistic programming language Birch, currently under development at the department.