I recently realised that there is a currently very popular trend in machine learning called GAN [for *generative adversarial networks]* that strongly connects with ABC, at least in that it relies mostly on the availability of a generative model, i.e., a probability model that can be generated as in *$=G(ϵ;θ)$*$, to draw inference about θ [or predictions]$. For instance, there was a GANs tutorial at NIPS 2016 by Ian Goodfellow and many talks on the topic at recent NIPS, the 1500 in the title referring to the citations of the GAN paper by Goodfellow et al. (2014). (The name *adversarial* comes from opposing true model to generative model in the inference. )

If you remember Jeffreys‘s famous pique about classical tests as being based on improbable events that did not happen, GAN, like ABC, is sort of the opposite in that it generates events until the one that was observed happens. More precisely, by generating pseudo-samples and switching parameters $θ$until these samples get as confused as possible between the data generating (“true”) distribution and the generative one. (In its original incarnation, GAN is indeed an optimisation scheme in $θ$.) A basic presentation of GAN is that it constructs a function *D(x,ϕ)* that represents the probability that x came from the true model *p* versus the generative model, *ϕ* being the parameter of a neural network trained to this effect, aimed at minimising in *ϕ* a two-term objective function

$loglog]$

where the first expectation is taken under the true model and the second one under the generative model.

*“The discriminator tries to best distinguish samples away from the generator. The generator tries to produce samples that are indistinguishable by the discriminator.” Edward*

One ABC perception of this technique is that the confusion rate

$log]$

is a form of distance between the data and the generative model. Which expectation can be approximated by repeated simulations from this generative model. Which suggests an extension from the optimisation approach to a ABCyesian version by selecting the smallest distances across a range of *θ*‘s simulated from the prior.

This notion relates to solution using classification tools as density ratio estimation, connecting for instance to Gutmann and Hyvärinen (2012). And ultimately with Geyer’s 1992 normalising constant estimator.

Another link between ABC and networks also came out during that trip. Proposed by Bishop (1994), mixture density networks (MDN) are mixture representations of the posterior [with component parameters functions of the data] trained on the prior predictive through a neural network. These MDNs can be trained on the ABC learning table [based on a specific if redundant choice of summary statistics] and used as substitutes to the posterior distribution, which brings an interesting alternative to Simon Wood’s synthetic likelihood. In a paper I missed Papamakarios and Murray suggest replacing regular ABC with this version…