**A**n ICLR 2019 paper by Neklyudov, Egorov and Vetrov on an optimal choice of the proposal in an independent Metropolis algorithm I discovered via an X validated question. Namely whether or not the expected Metropolis-Hastings acceptance ratio is always one (which it is not when the support of the proposal is restricted). The paper mentions the domination of the Accept-Reject algorithm by the associated independent Metropolis-Hastings algorithm, which has actually been stated in our Monte Carlo Statistical Methods (1999, Lemma 6.3.2) and may prove even older. The authors also note that the expected acceptance probability is equal to one minus the total variation distance between the joint defined as target × Metropolis-Hastings proposal distribution and its time-reversed version. Which seems to suffer from the same difficulty as the one mentioned in the X validated question. Namely that it only holds when the support of the Metropolis-Hastings proposal is at least the support of the target (or else when the support of the joint defined as target × Metropolis-Hastings proposal distribution is somewhat symmetric). Replacing total variation with Kullback-Leibler then leads to a manageable optimisation target if the proposal is a parameterised independent distribution. With a GAN version when the proposal is not explicitly available. I find it rather strange that one still seeks independent proposals for running Metropolis-Hastings algorithms as the result will depend on the family of proposals considered and as performances will deteriorate with dimension (the authors mention a 10% acceptance rate, which sounds quite low). [As an aside, ICLR 2020 will take place in Addis Abeba next April.]
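As a quick numerical check of the two statements above, here is a minimal sketch on a three-point state space. The target `p`, the two proposals `q_full` and `q_restricted`, and all function names are made up for the illustration and do not come from the paper. With a full-support proposal, the expected acceptance probability matches one minus the total variation distance between the joint and its reversed version, and the expected acceptance ratio is one. With a proposal that misses one state of the target, the expected ratio falls below one.

```python
import numpy as np

def expected_acceptance(p, q):
    """E[min(1, r(x, y))] when (x, y) ~ p(x) q(y), r being the independent MH ratio."""
    joint = np.outer(p, q)          # joint[x, y] = p(x) q(y)
    reverse = joint.T               # reverse[x, y] = p(y) q(x)
    ratio = np.divide(reverse, joint, out=np.zeros_like(joint), where=joint > 0)
    return float(np.sum(joint * np.minimum(1.0, ratio)))

def total_variation(p, q):
    """Total variation distance between p(x) q(y) and its reversed version p(y) q(x)."""
    joint = np.outer(p, q)
    return 0.5 * float(np.abs(joint - joint.T).sum())

def expected_ratio(p, q):
    """E[r(x, y)] when (x, y) ~ p(x) q(y), summing only where the joint is positive."""
    joint = np.outer(p, q)
    ratio = np.divide(joint.T, joint, out=np.zeros_like(joint), where=joint > 0)
    return float(np.sum(joint * ratio))

# Illustrative (assumed) target and proposals on {0, 1, 2}
p = np.array([0.5, 0.3, 0.2])
q_full = np.array([0.2, 0.3, 0.5])        # same support as the target
q_restricted = np.array([0.5, 0.5, 0.0])  # never proposes state 2

acc_full = expected_acceptance(p, q_full)
tv_full = total_variation(p, q_full)
ratio_full = expected_ratio(p, q_full)
ratio_restricted = expected_ratio(p, q_restricted)

print(acc_full, 1.0 - tv_full)       # the two numbers agree
print(ratio_full, ratio_restricted)  # 1.0 versus 0.8
```

In the restricted case the missing mass is exactly the target probability of the state the proposal never reaches, here 0.2.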

## Archive for Bayesian GANs

## an independent sampler that maximizes the acceptance rate of the MH algorithm

Posted in Books, Kids, Statistics, University life with tags accept-reject algorithm, adaptive Monte Carlo algorithm, Addis Abeba, Bayesian GANs, Ethiopia, ICLR 2019, importance sampling, Kullback-Leibler divergence, Monte Carlo Statistical Methods, optimal acceptance rate, optimisation, reversibility, simulation, total variation on September 3, 2019 by xi'an

## noise contrastive estimation

Posted in Statistics with tags Bayesian GANs, CREST, doubly intractable problems, Electronic Journal of Statistics, ENSAE, Langevin MCMC algorithm, noise-contrastive estimation, Paris-Saclay campus, PhD thesis, thesis defence on July 15, 2019 by xi'an

**A**s I was attending Lionel Riou-Durand’s PhD thesis defence in ENSAE-CREST last week, I had a look at his papers (!). The 2018 noise contrastive paper is written with Nicolas Chopin (both authors share the CREST affiliation with me). It compares Charlie Geyer’s 1994 way of bypassing the intractable normalising constant problem, by virtue of an artificial logit model with additional simulated data from another distribution *ψ*, with the noise contrastive alternative.

“Geyer (1994) established the asymptotic properties of the MC-MLE estimates under general conditions; in particular that the x’s are realisations of an ergodic process. This is remarkable, given that most of the theory on M-estimation (i.e. estimation obtained by maximising functions) is restricted to iid data.”

Michael Gutmann and Aapo Hyvärinen also use additional simulated data in another likelihood of a logistic classifier, called noise contrastive estimation. Both methods replace the unknown ratio of normalising constants with an unbiased estimate based on the additional simulated data. The major and impressive result in this paper [now published in the Electronic Journal of Statistics] is that the noise contrastive estimation approach always enjoys a smaller variance than Geyer’s solution, at an equivalent computational cost, when the actual data observations are iid and the artificial data simulations ergodic. The difference between both estimators is however negligible against the Monte Carlo error (Theorem 2).
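To make the logistic classifier trick concrete, here is a minimal sketch of noise contrastive estimation on a toy problem of my own choosing, not one taken from the paper: an unnormalised Gaussian model exp(-θx²/2), with the log-inverse normalising constant treated as an extra parameter ν, and a N(0, 2²) reference distribution ψ. Since the classifier logit is linear in (θ, ν), the fit is a logistic regression with an offset, solved here by a damped Newton iteration. All names, sample sizes, and the choice of solver are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (assumed): data from N(0, 1/theta_true), noise from psi = N(0, sigma_psi^2)
n = m = 5000
theta_true = 1.0
sigma_psi = 2.0
x = rng.normal(0.0, 1.0 / np.sqrt(theta_true), size=n)   # actual observations
y = rng.normal(0.0, sigma_psi, size=m)                    # artificial simulations

def log_psi(z):
    """Log-density of the noise distribution psi."""
    return -0.5 * np.log(2 * np.pi * sigma_psi**2) - 0.5 * (z / sigma_psi) ** 2

# Classifier logit: log f_theta(z) + nu - log psi(z) - log(m / n),
# which is linear in beta = (theta, nu) with the remaining terms as an offset.
z = np.concatenate([x, y])
labels = np.concatenate([np.ones(n), np.zeros(m)])        # 1 = data, 0 = noise
features = np.column_stack([-0.5 * z**2, np.ones_like(z)])
offset = -log_psi(z) - np.log(m / n)

def nce_objective(beta):
    """Logistic log-likelihood of the data-versus-noise classifier."""
    g = features @ beta + offset
    return -np.sum(labels * np.logaddexp(0.0, -g) + (1 - labels) * np.logaddexp(0.0, g))

beta = np.array([0.5, 0.0])                               # arbitrary starting point
for _ in range(50):
    g = features @ beta + offset
    s = np.exp(-np.logaddexp(0.0, -g))                    # sigmoid(g), computed stably
    grad = features.T @ (labels - s)
    hess = (features * (s * (1 - s))[:, None]).T @ features
    step = np.linalg.solve(hess, grad)                    # Newton ascent direction
    t = 1.0
    while nce_objective(beta + t * step) < nce_objective(beta) and t > 1e-8:
        t /= 2                                            # halve the step until no decrease
    beta = beta + t * step

theta_hat, nu_hat = beta
nu_true = -0.5 * np.log(2 * np.pi / theta_true)           # minus log normalising constant
print(theta_hat, nu_hat, nu_true)
```

The estimate of ν comes for free from the classifier intercept, which is how the intractable constant gets bypassed in this sketch.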

This may be a rather naïve question, but I wonder at the choice of the alternative distribution *ψ*. With a vague notion that it could be optimised in a GANs perspective. A side result of interest in the paper is to provide a minimal (re)parameterisation of the truncated multivariate Gaussian distribution, if only as an exercise for future exams. Truncated multivariate Gaussian for which the normalising constant is of course unknown.

## “more Bayesian” GANs

Posted in Books, Statistics with tags Bayesian GANs, compatible conditional distributions, GANs, MCMC convergence, pseudo-likelihood on December 21, 2018 by xi'an

**O**n X validated, I got pointed to this recent paper by He, Wang, Lee and Tian, that proposes a new form of Bayesian GAN. Although I do not see it as really Bayesian, as explained below.

“[The] existing Bayesian method (Saatchi & Wilson, 2017) may lead to incompatible conditionals, which suggest that the underlying joint distribution actually does not exist.”

## Gibbs for incompatible kids

Posted in Books, Statistics, University life with tags Bayesian GANs, convergence of Gibbs samplers, GANs, Gibbs for Kids, Gibbs sampling, irreducibility, JCGS, Markov chains, MCMC algorithms, Monte Carlo Statistical Methods, stationarity on September 27, 2018 by xi'an

**I**n continuation of my earlier post on Bayesian GANs, which resort to strongly incompatible conditionals, I read a 2015 paper of Chen and Ip that I had missed. (Published in the Journal of Statistical Computation and Simulation which I first confused with JCGS and which I do not know at all. Actually, when looking at its editorial board, I recognised only one name.) But the study therein is quite disappointing and unhelpful as it considers Markov chains on finite state spaces, meaning that the transition distributions are matrices, meaning also that convergence is ensured if these matrices have no null probability term. And while the paper is motivated by realistic situations where incompatible conditionals can reasonably appear, the paper only produces illustrations on two- and three-state Markov chains. Not that helpful, in the end… The game is still afoot!
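For what it is worth, the finite state space case is easy to reproduce. Below is a minimal sketch with two binary variables and two conditional matrices whose entries are arbitrary numbers I picked so that they are incompatible (the cross-ratio of `cond_x / cond_y` differs from one, hence no joint has both as its conditionals). Since all entries are positive, each systematic-scan Gibbs kernel has a unique stationary distribution, but that distribution changes with the order of the two updates.

```python
import numpy as np

# Assumed, purely illustrative conditionals on {0, 1} x {0, 1}
cond_x = np.array([[0.9, 0.4],
                   [0.1, 0.6]])   # cond_x[x, y] = P(x | y), columns sum to one
cond_y = np.array([[0.7, 0.3],
                   [0.2, 0.8]])   # cond_y[x, y] = P(y | x), rows sum to one

# Compatibility check for positive 2x2 conditionals: the ratio cond_x / cond_y
# must factorise as u(x) v(y), which is equivalent to a cross-ratio of one.
r = cond_x / cond_y
cross_ratio = r[0, 0] * r[1, 1] / (r[0, 1] * r[1, 0])

def gibbs_kernels(cond_x, cond_y):
    """Transition matrices on the pairs (x, y), indexed by 2 * x + y."""
    K_x = np.zeros((4, 4))        # update x given y, keep y
    K_y = np.zeros((4, 4))        # update y given x, keep x
    for x in range(2):
        for y in range(2):
            s = 2 * x + y
            for new in range(2):
                K_x[s, 2 * new + y] = cond_x[new, y]
                K_y[s, 2 * x + new] = cond_y[x, new]
    return K_x, K_y

def stationary(K, n_iter=200):
    """Stationary distribution as a row of a high power of K, reshaped to [x, y]."""
    return np.linalg.matrix_power(K, n_iter)[0].reshape(2, 2)

K_x, K_y = gibbs_kernels(cond_x, cond_y)
pi_xy = stationary(K_x @ K_y)     # x updated first, then y
pi_yx = stationary(K_y @ K_x)     # y updated first, then x

print(cross_ratio)                # differs from 1: incompatible conditionals
print(pi_xy)
print(pi_yx)
```

Each limiting distribution reproduces only the conditional used in the last update of the sweep, which is one way to see that no single joint sits behind both matrices.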

## Bayesian GANs [#2]

Posted in Books, pictures, R, Statistics with tags ABC in Edinburgh, Bayesian GANs, compatible conditional distributions, Edinburgh, GANs, generative adversarial networks, ISBA 2018, joint posterior, MCMC convergence, Metropolis-within-Gibbs algorithm, Monte Carlo Statistical Methods, normal model, University of Edinburgh on June 27, 2018 by xi'an

**A**s an illustration of the lack of convergence of the Gibbs sampler applied to the two “conditionals” defined in the Bayesian GANs paper discussed yesterday, I took the simplest possible example of a Normal mean generative model (one parameter) with a logistic discriminator (one parameter) and implemented the scheme (during an ISBA 2018 session). With flat priors on both parameters. And a Normal random walk as Metropolis-Hastings proposal. As expected, since there is no stationary distribution associated with the Markov chain, simulated chains do not exhibit a stationary pattern.

And they eventually reach an overflow error or a trapping state as the log-likelihood gets approximately to zero (red curve).

Too bad I missed the talk by Shakir Mohamed yesterday, being stuck on the Edinburgh by-pass at rush hour!, as I would have loved to hear his views about this rather essential issue…