Archive for approximate inference

One World ABC seminar [31.3.22]

Posted in Statistics, University life on March 16, 2022 by xi'an

The next One World ABC seminar is on Thursday 31 March, with David Warne (from QUT) talking on Multifidelity multilevel Monte Carlo for approximate Bayesian computation. It will take place at 10:30 CET (GMT+1).

Models of stochastic processes are widely used in almost all fields of science. However, data are almost always incomplete observations of reality. This leads to a great challenge for statistical inference because the likelihood function will be intractable for almost all partially observed stochastic processes. As a result, it is common to apply likelihood-free approaches that replace likelihood evaluations with realisations of the model and observation process. However, likelihood-free techniques are computationally expensive for accurate inference as they may require millions of high-fidelity, expensive stochastic simulations. To address this challenge, we develop a novel approach that combines the multilevel Monte Carlo telescoping summation, applied to a sequence of approximate Bayesian posterior targets, with a multifidelity rejection sampler that learns from low-fidelity, computationally inexpensive, model approximations to minimise the number of high-fidelity, computationally expensive, simulations required for accurate inference. Using examples from systems biology, we demonstrate improvements of more than two orders of magnitude over standard rejection sampling techniques.
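For readers unfamiliar with the baseline the abstract compares against, here is a minimal sketch of standard ABC rejection sampling. It is not the multifidelity multilevel method of the talk. The toy Normal-mean model, the uniform prior, the sample-mean summary, the tolerance `epsilon` and the function name `abc_rejection` are all assumptions made for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, epsilon, n_draws, rng):
    """Plain ABC rejection: keep prior draws whose simulated summary
    falls within epsilon of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                 # draw a parameter from the prior
        s_sim = summary(simulator(theta, rng))     # one (possibly expensive) simulation
        if abs(s_sim - s_obs) <= epsilon:          # accept if the summaries are close
            accepted.append(theta)
    return accepted


# Toy setting (an assumption, not from the talk): Normal data with unknown mean.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

accepted = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulator=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    epsilon=0.1,
    n_draws=5000,
    rng=rng,
)
print(len(accepted), "accepted out of 5000 simulations")
```

Every one of the 5000 draws costs a full simulation while only a small fraction is accepted, which is the cost the abstract's approach tries to cut by screening candidates with cheap low-fidelity approximations.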

One World ABC seminar [24.2.22]

Posted in Statistics, University life on February 22, 2022 by xi'an

The next One World ABC seminar is on Thursday 24 Feb, with Rafael Izbicki talking on Likelihood-Free Frequentist Inference – Constructing Confidence Sets with Correct Conditional Coverage. It will take place at 14:30 CET (GMT+1).

Many areas of science make extensive use of computer simulators that implicitly encode likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, outside the asymptotic and low-dimensional regimes. Although new machine learning methods, such as normalizing flows, have revolutionized the sample efficiency and capacity of LFI methods, it remains an open question whether they produce reliable measures of uncertainty. We present a statistical framework for LFI that unifies classical statistics with modern machine learning to: (1) efficiently construct frequentist confidence sets and hypothesis tests with finite-sample guarantees of nominal coverage (type I error control) and power; (2) provide practical diagnostics for assessing empirical coverage over the entire parameter space. We refer to our framework as likelihood-free frequentist inference (LF2I). Any method that estimates a test statistic, like the likelihood ratio, can be plugged into our framework to create valid confidence sets and compute diagnostics, without costly Monte Carlo samples at fixed parameter settings. In this work, we specifically study the power of two test statistics (ACORE and BFF), which, respectively, maximize versus integrate an odds function over the parameter space. Our study offers multifaceted perspectives on the challenges in LF2I. This is joint work with Niccolo Dalmasso, David Zhao and Ann B. Lee.

Bayesian restricted likelihood with insufficient statistic [slides]

Posted in Books, pictures, Statistics, University life on February 9, 2022 by xi'an

A great Bayesian Analysis webinar this afternoon with well-balanced presentations by Steve MacEachern and John Lewis, and original discussions by Bertrand Clarke and Fabrizio Ruggeri, which attracted 122 participants. I particularly enjoyed Bertrand’s points that likelihoods were more general than models [made in 6 different wordings!] and that this paper was closer to the M-open perspective. I think I eventually got the reason why the approach could be seen as an ABC with ε=0, since the simulated y’s all get the right statistic, but this presentation does not bring a strong argument in favour of the restricted likelihood approach, when considering the methodological and computational effort. The discussion also made me wonder if tools like VAEs could be used towards approximating the distribution of T(y) conditional on the parameter θ. This is also an opportunity to thank my friend Michele Guindani for his hard work as Editor of Bayesian Analysis and in particular for keeping the discussion tradition thriving!

One World ABC seminar [3.2.22]

Posted in Statistics, University life on February 1, 2022 by xi'an

The next One World ABC seminar is on Thursday 03 Feb, with Yixin Wang talking on Posterior collapse and latent variable non-identifiability. It will take place at 15:30 CET (GMT+1).

Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.

the invasion of the stochastic gradients

Posted in Statistics on May 10, 2017 by xi'an

Within the same day, I spotted three submissions to arXiv involving stochastic gradient descent, that I briefly browsed on my trip back from Wales:

  1. Stochastic Gradient Descent as Approximate Bayesian inference, by Mandt, Hoffman, and Blei, where this technique is used as a type of variational Bayes method, where the minimum Kullback-Leibler distance to the true posterior can be achieved. Rephrasing the [scalable] MCMC algorithm of Welling and Teh (2011) as such an approximation.
  2. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent, by Arnak Dalalyan, which establishes a convergence of the uncorrected Langevin algorithm to the right target distribution in the sense of the Wasserstein distance. (Uncorrected in the sense that there is no Metropolis step, meaning this is an Euler approximation.) With an extension to the noisy version, when the gradient is approximated, e.g., by subsampling. The connection with stochastic gradient descent is thus tenuous, but Arnak explains the somewhat disappointing rate of convergence as being in agreement with optimisation rates.
  3. Stein variational adaptive importance sampling, by Jun Han and Qiang Liu, which relates to our population Monte Carlo algorithm, but as a non-parametric version, using RKHS to represent the transforms of the particles at each iteration. The sampling method follows two threads of particles, one that is used to estimate the transform by a stochastic gradient update, and another one that is used for estimation purposes as in a regular population Monte Carlo approach. Deconstructing into those threads allows for conditional independence that makes convergence easier to establish. (A problem we also hit when working on the AMIS algorithm.)
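To make the "uncorrected" Langevin algorithm of the second item concrete, here is a minimal sketch of the Euler scheme with no Metropolis step. It is a toy illustration rather than code from any of the three papers: the one-dimensional standard Normal target (potential U(x) = x²/2, gradient x), the step size and the function name `ula` are assumptions. For this particular target the chain is a Gaussian autoregression whose stationary variance works out to 1/(1 − h/2) instead of 1, which shows the discretisation bias left in by dropping the Metropolis correction.

```python
import math
import random
import statistics


def ula(grad_U, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm: Euler discretisation of the Langevin
    diffusion, with no Metropolis accept/reject step."""
    x = x0
    chain = []
    for _ in range(n_steps):
        # gradient step on the potential plus Gaussian noise of variance 2 * step
        x = x - step * grad_U(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


# Toy target (an assumption): standard Normal, so grad U(x) = x.
rng = random.Random(1)
step = 0.1
chain = ula(grad_U=lambda x: x, x0=0.0, step=step, n_steps=50_000, rng=rng)

sample_var = statistics.pvariance(chain)
stationary_var = 1.0 / (1.0 - step / 2.0)  # variance the uncorrected chain targets here
print(f"sample variance {sample_var:.3f}, biased stationary variance {stationary_var:.3f}")
```

Shrinking `step` reduces the bias but slows down the exploration, which is the trade-off behind the convergence rates discussed in the paper.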