Archive for variational Bayes methods

computational advances in approximate Bayesian methods [at JSM]

Posted in Statistics with tags , , , , , , , on August 5, 2020 by xi'an

Another broadcast for an ABC (or rather ABM) session at JSM, organised and chaired by Robert Kohn, taking place tomorrow at 10am, ET, i.e., 2pm GMT, with variational and ABC talks:

454 * Thu, 8/6/2020, 10:00 AM – 11:50 AM Virtual
Computational Advances in Approximate Bayesian Methods — Topic Contributed Papers
Section on Bayesian Statistical Science
Organizer(s): Robert Kohn, University of New South Wales
Chair(s): Robert Kohn, University of New South Wales
10:05 AM Sparse Variational Inference: Bayesian Coresets from Scratch
Trevor Campbell, University of British Columbia
10:25 AM Fast Variational Approximation for Multivariate Factor Stochastic Volatility Model
David Gunawan, University of Wollongong; Robert Kohn, University of New South Wales; David Nott, National University of Singapore
10:45 AM High-Dimensional Copula Variational Approximation Through Transformation
Michael Smith, University of Melbourne; Ruben Loaiza-Maya, Monash University ; David Nott, National University of Singapore
11:05 AM Mini-Batch Metropolis-Hastings MCMC with Reversible SGLD Proposal
Rachel Wang, University of Sydney; Tung-Yu Wu, Stanford University; Wing Hung Wong, Stanford University
11:25 AM Weighted Approximate Bayesian Computation via Large Deviations Theory
Cecilia Viscardi, University of Florence; Michele Boreale, University of Florence; Fabio Corradi, University of Florence; Antonietta Mira, Università della Svizzera Italiana (USI)
11:45 AM Floor Discussion

distortion estimates for approximate Bayesian inference

Posted in pictures, Statistics, University life with tags , , , , , , , , , on July 7, 2020 by xi'an

A few days ago, Hanwen Xing, Geoff Nichols and Jeong Eun Lee arXived a paper with the following title, to be presented at uai2020. Towards assessing the fit of the approximation for the actual posterior, given the available data. This covers of course ABC methods (which seems to be the primary focus of the paper) but also variational inference and synthetic likelihood versions. For a parameter of interest, the difference between exact and approximate marginal posterior distributions is see as a distortion map, D = F o G⁻¹, interpreted as in optimal transport and estimated by normalising flows. Even when the approximate distribution G is poorly estimated since D remains the cdf of G(X) when X is distributed from F. The marginal posterior approximate cdf G can be estimated by ABC or another approximate technique. The distortion function D is itself restricted to be a Beta cdf, with parameters estimated by a neural network (although based on which input is unclear to me, unless the weights in (5) are the neural weights). The assessment is based on the estimated distortion at the dataset, as a significant difference from the identity signal a poor fit for the approximation. Overall, the procedure seems implementable rather easily and while depending on calibrating choices (other than the number of layers in the neural network) a realistic version of the simulation-based diagnostic of Talts et al. (2018).

sequential neural likelihood estimation as ABC substitute

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , , , , , , on May 14, 2020 by xi'an

A JMLR paper by Papamakarios, Sterratt, and Murray (Edinburgh), first presented at the AISTATS 2019 meeting, on a new form of likelihood-free inference, away from non-zero tolerance and from the distance-based versions of ABC, following earlier papers by Iain Murray and co-authors in the same spirit. Which I got pointed to during the ABC workshop in Vancouver. At the time I had no idea as to autoregressive flows meant. We were supposed to hold a reading group in Paris-Dauphine on this paper last week, unfortunately cancelled as a coronaviral precaution… Here are some notes I had prepared for the meeting that did not take place.

A simulator model is a computer program, which takes a vector of parameters θ, makes internal calls to a random number generator, and outputs a data vector x.”

Just the usual generative model then.

“A conditional neural density estimator is a parametric model q(.|φ) (such as a neural network) controlled by a set of parameters φ, which takes a pair of datapoints (u,v) and outputs a conditional probability density q(u|v,φ).”

Less usual, in that the outcome is guaranteed to be a probability density.

“For its neural density estimator, SNPE uses a Mixture Density Network, which is a feed-forward neural network that takes x as input and outputs the parameters of a Gaussian mixture over θ.”

In which theoretical sense would it improve upon classical or Bayesian density estimators? Where are the error evaluation, the optimal rates, the sensitivity to the dimension of the data? of the parameter?

“Our new method, Sequential Neural Likelihood (SNL), avoids the bias introduced by the proposal, by opting to learn a model of the likelihood instead of the posterior.”

I do not get the argument in that the final outcome (of using the approximation within an MCMC scheme) remains biased since the likelihood is not the exact likelihood. Where is the error evaluation? Note that in the associated Algorithm 1, the learning set is enlarged on each round, as in AMIS, rather than set back to the empty set ∅ on each round.

…given enough simulations, a sufficiently flexible conditional neural density estimator will eventually approximate the likelihood in the support of the proposal, regardless of the shape of the proposal. In other words, as long as we do not exclude parts of the parameter space, the way we propose parameters does not bias learning the likelihood asymptotically. Unlike when learning the posterior, no adjustment is necessary to account for our proposing strategy.”

This is a rather vague statement, with the only support being that the Monte Carlo approximation to the Kullback-Leibler divergence does converge to its actual value, i.e. a direct application of the Law of Large Numbers! But an interesting point I informally made a (long) while ago that all that matters is the estimate of the density at x⁰. Or at the value of the statistic at x⁰. The masked auto-encoder density estimator is based on a sequence of bijections with a lower-triangular Jacobian matrix, meaning the conditional density estimate is available in closed form. Which makes it sounds like a form of neurotic variational Bayes solution.

The paper also links with ABC (too costly?), other parametric approximations to the posterior (like Gaussian copulas and variational likelihood-free inference), synthetic likelihood, Gaussian processes, noise contrastive estimation… With experiments involving some of the above. But the experiments involve rather smooth models with relatively few parameters.

“A general question is whether it is preferable to learn the posterior or the likelihood (…) Learning the likelihood can often be easier than learning the posterior, and it does not depend on the choice of proposal, which makes learning easier and more robust (…) On the other hand, methods such as SNPE return a parametric model of the posterior directly, whereas a further inference step (e.g. variational inference or MCMC) is needed on top of SNL to obtain a posterior estimate”

A fair point in the conclusion. Which also mentions the curse of dimensionality (both for parameters and observations) and the possibility to work directly with summaries.

Getting back to the earlier and connected Masked autoregressive flow for density estimation paper, by Papamakarios, Pavlakou and Murray:

“Viewing an autoregressive model as a normalizing flow opens the possibility of increasing its flexibility by stacking multiple models of the same type, by having each model provide the source of randomness for the next model in the stack. The resulting stack of models is a normalizing flow that is more flexible than the original model, and that remains tractable.”

Which makes it sound like a sort of a neural network in the density space. Optimised by Kullback-Leibler minimisation to get asymptotically close to the likelihood. But a form of Bayesian indirect inference in the end, namely an MLE on a pseudo-model, using the estimated model as a proxy in Bayesian inference…

ISBA2020 program

Posted in Kids, Statistics, Travel, University life with tags , , , , , , , , , , , , on January 29, 2020 by xi'an

The scheduled program for ISBA 2020 is now on-line. And full of exciting sessions, many with computational focus. With dear hopes that the nCo-2019 epidemics will have abated by then (and not solely for the sake of the conference, most obviously!). While early registration ends by 15 April, the deadline for junior travel support ends up this month. And so does the deadline for contributions.

MCMC, with common misunderstandings

Posted in Books, pictures, R, Statistics, University life with tags , , , , , , , , , , , , on January 27, 2020 by xi'an

As I was asked to write a chapter on MCMC methods for an incoming Handbook of Computational Statistics and Data Science, published by Wiley, rather than cautiously declining!, I decided to recycle the answers I wrote on X validated to what I considered to be the most characteristic misunderstandings about MCMC and other computing methods, using as background the introduction produced by Wu Changye in his PhD thesis. Waiting for the opinion of the editors of the Handbook on this Q&A style. The outcome is certainly lighter than other recent surveys like the one we wrote with Peter Green, Krys Latuszinski, and Marcelo Pereyra, for Statistics and Computing, or the one with Victor Elvira, Nick Tawn, and Changye Wu.