**T**his week, I gave a short introductory course in Warwick for the CDT (PhD) students on my perceived connections between reverse logistic regression à la Geyer and GANs, among other things. The first attempt was cancelled in 2020 due to the pandemic, and the second one, in 2021, was on-line and thus offered few opportunities for interaction. Preparing for this third attempt made me read more papers on statistical analyses of GANs and WGANs, which was more satisfactory [for me] even though I could not get into the technical details…

## Archive for variational autoencoders

## posterior collapse

Posted in Statistics with tags ABC, identifiability, neural network, NeurIPS 2021, One World ABC Seminar, variational approximations, variational autoencoders on February 24, 2022 by xi'an

**T**he latest ABC One World webinar was a talk by Yixin Wang about the posterior collapse of auto-encoders, of which I was completely unaware. It is essentially an *identifiability* issue with auto-encoders, where the latent variable z at the source of the VAE does not impact the likelihood, assumed to be an exponential family with parameter depending on z and on θ, possibly through a neural network construct. The *variational* part comes from the parameter being estimated as θ⁰, via a variational approximation.

*“….the problem of posterior collapse mainly arises from the model and the data, rather than from inference or optimization…”*

The collapse means that the posterior for the latent satisfies p(z|θ⁰,x)=p(z), which is not a standard property since θ⁰=θ⁰(x). Yixin Wang, David Blei and John Cunningham show that this is equivalent to p(x|θ⁰,z)=p(x|θ⁰), i.e. to z being unidentifiable. The above quote is then both correct and incorrect, in that the choice of the inference approach, i.e. of the estimator θ⁰=θ⁰(x), has an impact on whether or not p(z|θ⁰,x)=p(z) holds, as acknowledged by the authors when describing “*methods modify the optimization objectives or algorithms of VAE to avoid parameter values θ at which the latent variable is non-identifiable*”. They later build a resolution for identifiable VAEs by imposing that the conditional p(x|θ,z) is injective in z for all values of θ, resulting in a neural network with Brenier maps.
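To make the equivalence concrete, here is a minimal sketch on a toy model that is not taken from the paper: it assumes a scalar latent z with a standard Normal prior and x|z distributed as N(θ₁+θ₂z, 1), with the posterior computed on a grid. The names `latent_posterior`, `tv` and the numerical values are all hypothetical. When θ₂=0 the likelihood no longer depends on z and the posterior on z falls back on the prior, even though the inference is exact.

```python
import numpy as np

def latent_posterior(x, theta, z_grid, prior):
    """Grid posterior p(z | theta, x) for the toy model x | z ~ N(theta[0] + theta[1] * z, 1)."""
    mean = theta[0] + theta[1] * z_grid
    log_lik = -0.5 * (x - mean) ** 2
    # subtract the max before exponentiating for numerical stability
    w = prior * np.exp(log_lik - log_lik.max())
    return w / w.sum()

def tv(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()

# standard Normal prior on z, discretised on a grid (an assumption of this sketch)
z_grid = np.linspace(-4.0, 4.0, 401)
prior = np.exp(-0.5 * z_grid ** 2)
prior /= prior.sum()

x_obs = 1.7  # arbitrary observation

# theta[1] = 0: z does not enter the likelihood, hence is non-identifiable
collapsed = latent_posterior(x_obs, (0.5, 0.0), z_grid, prior)
# theta[1] = 1: z shifts the mean of x, hence the data is informative about z
informative = latent_posterior(x_obs, (0.5, 1.0), z_grid, prior)

tv_collapsed = tv(collapsed, prior)
tv_informative = tv(informative, prior)
print("TV(posterior, prior) when theta[1] = 0:", tv_collapsed)
print("TV(posterior, prior) when theta[1] = 1:", tv_informative)
```

Whether or not the collapse occurs then hinges on the value of θ at which the posterior is evaluated, which is the sense in which the estimator θ⁰(x) matters.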

From a Bayesian perspective, I have difficulty connecting with the issue, the folklore being that selecting a proper prior is a sufficient fix for avoiding non-identifiability, but more fundamentally I wonder at the relevance of inferring about the latent z’s and hence of worrying about their identifiability or lack thereof.

## One World ABC seminar [3.2.22]

Posted in Statistics, University life with tags ABC, Approximate Bayesian computation, approximate inference, Brenier maps, convex neural networks, identifiability, neural network, One World, One World ABC Seminar, posterior collapse, University of Warwick, variational autoencoders, webinar on February 1, 2022 by xi'an

**T**he next One World ABC seminar is on Thursday 03 Feb, with Yixin Wang talking on Posterior collapse and latent variable non-identifiability. It will take place at 15:30 CET (GMT+1).

Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.

## invertible flow non equilibrium sampling (InFiNE)

Posted in Books, Statistics, University life with tags auxiliary variable, conformal Hamiltonian dynamics, energy, Gibbs sampler, Hamiltonian Monte Carlo, HMC, MCMC, multiple importance sampling, multiple mixtures, nested sampling, ODE, particle MCMC, unbiasedness, variational autoencoders on May 21, 2021 by xi'an

**W**ith Achille Thin and a few other coauthors [and friends], we just arXived a paper on a new form of importance sampling, motivated by a recent paper of Rotskoff and Vanden-Eijnden (2019) on non-equilibrium importance sampling. The central ideas of this earlier paper are the introduction of conformal Hamiltonian dynamics, where a dissipative term is added to the ODE found in HMC, namely

dq/dt = M⁻¹p,  dp/dt = −∇U(q) − γp  [in the usual notation, with M a mass matrix and γ>0 a friction coefficient]

which means that all orbits converge to fixed points that satisfy ∇U(q) = 0 as the energy eventually vanishes. And the property that, were T to be a conformal Hamiltonian integrator associated with H, i.e. preserving the invariant measure, averaging over orbits of T would improve the precision of Monte Carlo unbiased estimators, while remaining unbiased. The fact that Rotskoff and Vanden-Eijnden (2019) considered only continuous time makes their proposal hard to implement without adding approximation error, while our approach is directly set in discrete time and preserves unbiasedness. And since measure-preserving transforms are too difficult to come by, a change of variable correction, as in normalising flows, allows for an arbitrary choice of T, while keeping the estimator unbiased. The use of conformal maps makes for a natural choice of T in this context.
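As an illustration of such a discrete-time map T, here is a minimal sketch of a dissipative leapfrog step. It is an assumption of this example rather than the integrator of the paper: it takes a quadratic potential U(q)=|q|²/2, an identity mass matrix, a friction γ and a step size h, all hypothetical choices. The friction substep shrinks the momentum by exp(−γh) and the leapfrog part preserves volume, so the Jacobian of T is the constant exp(−γhd) in dimension d, which is what makes the change of variable correction cheap.

```python
import numpy as np

def grad_U(q):
    """Gradient of the assumed potential U(q) = |q|^2 / 2."""
    return q

def energy(q, p):
    """H(q, p) = U(q) + |p|^2 / 2, with identity mass matrix (an assumption)."""
    return 0.5 * np.dot(q, q) + 0.5 * np.dot(p, p)

def conformal_step(q, p, h=0.1, gamma=0.5):
    """One step of a dissipative leapfrog map T (illustrative splitting scheme)."""
    p = np.exp(-gamma * h) * p      # exact flow of the friction part dp/dt = -gamma p
    p = p - 0.5 * h * grad_U(q)     # half kick
    q = q + h * p                   # drift
    p = p - 0.5 * h * grad_U(q)     # half kick
    return q, p

def log_jacobian(dim, h=0.1, gamma=0.5):
    """log |det J_T|: leapfrog is volume preserving, friction rescales the dim momenta."""
    return -gamma * h * dim

q = np.array([2.0, -1.0])
p = np.array([0.5, 1.0])
e0 = energy(q, p)
for _ in range(600):
    q, p = conformal_step(q, p)

print("initial energy:", e0)
print("final energy:", energy(q, p))
print("final |grad U(q)|:", np.linalg.norm(grad_U(q)))
print("log-Jacobian of one step:", log_jacobian(q.size))
```

The orbit spirals towards the fixed point where ∇U(q)=0, in line with the continuous-time description, while every step remains an explicit invertible map with a known Jacobian.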

The resulting InFiNE algorithm is a particular MCMC algorithm, which can be represented as a partially collapsed Gibbs sampler when using the right auxiliary variables, as in Andrieu, Doucet and Holenstein (2010) and their ISIR algorithm. The algorithm can be used for estimating normalising constants, comparing favourably with AIS, sampling from complex targets, and optimising variational autoencoders and their ELBO.
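For readers unfamiliar with ISIR, here is a minimal sketch of one common reading of it, iterated sampling importance resampling, and not of InFiNE itself: the current state is kept among N candidates, the other N−1 are drawn from a proposal, and the next state is picked with probability proportional to the importance weights. The target N(3,1), the proposal N(0,3²), the number of candidates and the function names are all hypothetical choices for this example.

```python
import numpy as np

PROP_SD = 3.0  # standard deviation of the assumed N(0, PROP_SD^2) proposal

def log_target(x):
    """Unnormalised log-density of the assumed target N(3, 1)."""
    return -0.5 * (x - 3.0) ** 2

def log_proposal(x):
    """Log-density of the N(0, PROP_SD^2) proposal."""
    return -0.5 * (x / PROP_SD) ** 2 - np.log(PROP_SD * np.sqrt(2.0 * np.pi))

def isir_step(x, rng, n_particles=20):
    """One ISIR move: refresh all candidates but the current state, then resample one."""
    cand = rng.normal(0.0, PROP_SD, size=n_particles)
    cand[0] = x  # the current state stays among the candidates
    log_w = log_target(cand) - log_proposal(cand)
    w = np.exp(log_w - log_w.max())  # stabilised importance weights
    return cand[rng.choice(n_particles, p=w / w.sum())]

rng = np.random.default_rng(0)
x = 0.0
chain = np.empty(5000)
for t in range(chain.size):
    x = isir_step(x, rng)
    chain[t] = x

kept = chain[500:]  # discard a short burn-in
print("chain mean:", kept.mean())
print("chain std:", kept.std())
```

Keeping the current state among the candidates is what turns plain sampling importance resampling into a Markov kernel leaving the target invariant, and it is the auxiliary-variable view of this step that leads to a Gibbs representation.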

I really appreciated working on this project, with links to earlier notions like multiple importance sampling à la Owen and Zhou (2000), nested sampling, non-homogeneous normalising flows, measure estimation à la Kong et al. (2002), on which I worked in a more or less distant past.