Bayesian gan [gan style]
In their paper Bayesian GANs, arXived a year ago, Saatchi and Wilson consider a Bayesian version of generative adversarial networks, putting priors on both the generator and the discriminator parameters. While the prospect seems somewhat remote from genuine statistical inference, if the following statement is representative of the approach
“GANs transform white noise through a deep neural network to generate candidate samples from a data distribution. A discriminator learns, in a supervised manner, how to tune its parameters so as to correctly classify whether a given sample has come from the generator or the true data distribution. Meanwhile, the generator updates its parameters so as to fool the discriminator. As long as the generator has sufficient capacity, it can approximate the cdf inverse-cdf composition required to sample from a data distribution of interest.”
I figure the concept can also apply to a standard statistical model, where x=G(z,θ) rephrases the distributional assumption x~F(x;θ) via a white noise z. This makes resorting to a prior distribution on θ more relevant, in the sense of exploiting genuine prior information on θ (although the successes of probabilistic numerics show that formal priors can be used on purely numerical grounds).
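As a minimal illustration of this rephrasing (my own toy example, not taken from the paper), a Normal location model x~N(θ,σ²) corresponds to the generator G(z,θ)=θ+σz acting on standard Gaussian white noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def G(z, theta, sigma=1.0):
    # push-forward of white noise: x = G(z, theta) follows the N(theta, sigma^2) law
    return theta + sigma * z

theta = 2.5
z = rng.standard_normal(1000)   # white noise
x = G(z, theta)                 # samples from F(.; theta) = N(2.5, 1)
```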
The “posterior distribution” that is central to the notion of Bayesian GANs is however unorthodox, in that it is defined through the following pair of conditional posteriors.
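In the notation of the paper (reproduced here up to minor notational choices of my own, with θ_g the generator parameters, θ_d the discriminator parameters, x_1,…,x_n the data and z_1,…,z_m the white noise draws), the two conditionals read

$$
p(\theta_g \mid \mathbf{z}, \theta_d) \;\propto\; \Bigg(\prod_{i=1}^{m} D\big(G(z_i;\theta_g),\theta_d\big)\Bigg)\, p(\theta_g) \qquad (1)
$$

$$
p(\theta_d \mid \mathbf{z}, \mathbf{x}, \theta_g) \;\propto\; \prod_{j=1}^{n} D(x_j,\theta_d)\; \prod_{i=1}^{m} \Big(1 - D\big(G(z_i;\theta_g),\theta_d\big)\Big)\, p(\theta_d) \qquad (2)
$$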
where D(x,θ_d) is the “discriminator”, that is, in GAN lingo, the probability allocated to x coming from the “true” data generating mechanism rather than from the one associated with G(·,θ_g). The generative conditional posterior (1) then aims at fooling the discriminator, i.e., favours generative parameter values that raise the probability of wrong allocation of the pseudo-data. The discriminative conditional posterior (2) is a standard Bayesian posterior based on both the original sample and the generated sample. The authors then iteratively sample from these conditionals, effectively implementing a two-stage Gibbs sampler.
“By iteratively sampling from (1) and (2) at every step of an epoch one can, in the limit, obtain samples from the approximate posteriors over [both sets of parameters].”
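A schematic rendering of this alternating scheme, as I read it (with hypothetical routines sample_theta_g and sample_theta_d standing in for draws from (1) and (2); the paper actually relies on stochastic-gradient HMC updates rather than exact conditional draws), could look like:

```python
def bayesian_gan_gibbs(x, n_iter, draw_noise, sample_theta_g, sample_theta_d,
                       theta_g, theta_d):
    # Two-stage "Gibbs" alternation between the conditionals (1) and (2).
    # draw_noise(): fresh white noise z used to build the pseudo-data G(z, theta_g)
    # sample_theta_g(z, theta_d): draw from the generative conditional (1)
    # sample_theta_d(z, x, theta_g): draw from the discriminative conditional (2)
    chain = []
    for _ in range(n_iter):
        z = draw_noise()
        theta_g = sample_theta_g(z, theta_d)
        theta_d = sample_theta_d(z, x, theta_g)
        chain.append((theta_g, theta_d))
    return chain
```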
What worries me about this approach is that it just cannot work, in the sense that (1) and (2) cannot be compatible conditional (posterior) distributions. There is no joint distribution for which (1) and (2) would be the conditionals, since the pseudo-data appears through D in (1) and through (1−D) in (2). This means that the convergence of the Gibbs sampler is at best to a stationary σ-finite measure. And hence the meaning of the chain is delicate to ascertain… Am I missing any fundamental point?! [I checked the reviews on the NIPS webpage and could not spot this issue being raised.]
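To spell out the compatibility argument (a sketch of a standard necessary condition, not taken from the paper or from the post): if q_1(θ_g|θ_d) and q_2(θ_d|θ_g) were the conditionals of a common joint π, their ratio would have to factorise as

$$
\frac{q_1(\theta_g \mid \theta_d)}{q_2(\theta_d \mid \theta_g)} \;=\; \frac{\pi(\theta_g,\theta_d)/\pi(\theta_d)}{\pi(\theta_g,\theta_d)/\pi(\theta_g)} \;=\; \frac{\pi(\theta_g)}{\pi(\theta_d)} \;=\; f(\theta_g)\,g(\theta_d),
$$

that is, into a product of a function of θ_g alone and a function of θ_d alone. Dividing (1) by (2), the prior and data terms do factorise, but the pseudo-data terms leave a factor ∏_i D(G(z_i;θ_g),θ_d)/(1−D(G(z_i;θ_g),θ_d)) that depends jointly on (θ_g,θ_d) and does not reduce to such a product in general.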