## sufficient statistics for machine learning

Posted in Books, Running, Statistics, Travel with tags , , , , , on April 26, 2022 by xi'an

By chance, I came across this ICML¹⁹ paper of Milan Cvitkovic and nther Koliander, Minimal Achievable Sufficient Statistic Learning on a form of sufficiency for machine learning. The paper starts with “our” standard notion of sufficiency albeit in a predictive sense, namely that Z=T(X) is sufficient for predicting Y if the conditional distribution of Y given Z is the same as the conditional distribution of Y given X. It also acknowledges that minimal sufficiency may be out of reach. However, and without pursuing this question into the depths of said paper, I am surprised that any type of sufficiency can be achieved there since the model stands outside exponential families… In accordance with the Darmois-Pitman-Koopman lemma. Obviously, this is not a sufficiency notion in the statistical sense, since there is no likelihood (albeit there are parameters involved in the deep learning network). And Y is a discrete variate, which means that

$\mathbb P(Y=1|x),\ \mathbb P(Y=2|x),\ldots$

is a sufficient “statistic” for a fixed conditional, but I am lost at how the solution proposed in the paper, could be minimal when the dimension and structure of T(x) are chosen from the start. A very different notion, for sure!

## Metropolis-Hastings via Classification [One World ABC seminar]

Posted in Statistics, University life with tags , , , , , , , , , , , , , , , on May 27, 2021 by xi'an

Today, Veronika Rockova is giving a webinar on her paper with Tetsuya Kaji Metropolis-Hastings via classification. at the One World ABC seminar, at 11.30am UK time. (Which was also presented at the Oxford Stats seminar last Feb.) Please register if not already a member of the 1W ABC mailing list.

## NCE, VAEs, GANs & even ABC…

Posted in Statistics with tags , , , , , , , , , , , , , on May 14, 2021 by xi'an

As I was preparing my (new) lectures for a PhD short course “at” Warwick (meaning on Teams!), I read a few surveys and other papers on all these acronyms. It included the massive Guttmann and Hyvärinen 2012 NCE JMLR paperGoodfellow’s NIPS 2016 tutorial on GANs, and  Kingma and Welling 2019 introduction to VAEs. Which I found a wee bit on the light side, maybe missing the fundamentals of the notion… As well as the pretty helpful 2019 survey on normalising flows by Papamakarios et al., although missing on the (statistical) density estimation side.  And also a nice (2017) survey of GANs by Shakir Mohamed and Balaji Lakshminarayanan with a somewhat statistical spirit, even though convergence issues are not again not covered. But misspecification is there. And the many connections between ABC and GANs, if definitely missing on the uncertainty aspects. While Deep Learning by Goodfellow, Bengio and Courville adresses both the normalising constant (or partition function) and GANs, it was somehow not deep enough (!) to use for the course, offering only a few pages on NCE, VAEs and GANs. (And also missing on the statistical references addressing the issue, incl. [or excl.]  Geyer, 1994.) Overall, the infinite variations offered on GANs leave me uncertain about their statistical relevance, as it is unclear how good the regularisation therein is for handling overfitting and consistent estimation. (And if I spot another decomposition of the Kullback-Leibler divergence, I may start crying…)

## sampling with neural networks [seminar]

Posted in Statistics with tags , , , , , on March 29, 2021 by xi'an

Tomorrow (30 March, 11am ET, 16 GMT, 17 CET) Grant Rotskoff will give a webinar on Sampling with neural networks: prospects and perils, with links to developments in generative modeling to sample distributions that are challenging to sample with local dynamics, and the perils of neural network driven sampling to accelerate sampling.

## Metropolis-Hastings via classification

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , on February 23, 2021 by xi'an

Veronicka Rockova (from Chicago Booth) gave a talk on this theme at the Oxford Stats seminar this afternoon. Starting with a survey of ABC, synthetic likelihoods, and pseudo-marginals, to motivate her approach via GANs, learning an approximation of the likelihood from the GAN discriminator. Her explanation for the GAN type estimate was crystal clear and made me wonder at the connection with Geyer’s 1994 logistic estimator of the likelihood (a form of discriminator with a fixed generator). She also expressed the ABC approximation hence created as the actual posterior times an exponential tilt. Which she proved is of order 1/n. And that a random variant of the algorithm (where the shift is averaged) is unbiased. Most interestingly requiring no calibration and no tolerance. Except indirectly when building the discriminator. And no summary statistic. Noteworthy tension between correct shape and correct location.