## Archive for deep learning

## Metropolis-Hastings via classification

Posted in pictures, Statistics, Travel, University life with tags ABC, ABC consistency, Chicago, Chicago Booth School of Business, deep learning, discriminant analysis, GANs, logistic regression, seminar, summary statistics, synthetic likelihood, University of Oxford, webinar, winter running on February 23, 2021 by xi'an

**V**eronika Rockova (from Chicago Booth) gave a talk on this theme at the Oxford Stats seminar this afternoon. Starting with a survey of ABC, synthetic likelihoods, and pseudo-marginals, to motivate her approach via GANs, learning an approximation of the likelihood from the GAN discriminator. Her explanation for the GAN type estimate was crystal clear and made me wonder at the connection with Geyer’s 1994 logistic estimator of the likelihood (a form of discriminator with a fixed generator). She also expressed the ABC approximation hence created as the actual posterior times an exponential tilt. Which she proved is of order 1/n. And that a random variant of the algorithm (where the shift is averaged) is unbiased. Most interestingly requiring no calibration and no tolerance. Except indirectly when building the discriminator. And no summary statistic. Noteworthy tension between correct shape and correct location.

## poems that solve puzzles [book review]

Posted in Books, Kids, University life with tags ACM Turing Award, Ada Lovelace, Alan Turing, algorithms, AlphaGo, book cover, book review, CHANCE, Charles Babbage, checkers, deep learning, difference engine, Dublin, Fourier transform, John Tukey, machine learning, Oxford University Press, puzzle, University College Dublin on January 7, 2021 by xi'an

**U**pon request, I received this book from Oxford University Press for review. Poems that Solve Puzzles is a nice title and its cover is quite to my liking (for once!). The author is Chris Bleakley, Head of the School of Computer Science at UCD.

*“This book is for people that know algorithms are important, but have no idea what they are.”*

This is the first sentence of the book and hence I am clearly falling outside the intended audience. When I asked OUP for a review copy, I was more thinking in terms of Robert Sedgewick’s Algorithms, whose first edition still sits on my shelves and which I read from first to last page when it appeared [and was part of my wife’s booklist]. This was (and is) indeed a fantastic book to learn how to build and optimise algorithms and I gained a lot from it (despite remaining a poor programmer!).

Back to poems, this one reads much more like a history of computer science for newbies than a deep entry into the “science of algorithms”, with imho too little on the algorithms themselves and their connections with computer languages and too much emphasis on the pomp and circumstance of computer science (like so-and-so got the ACM A.M. Turing Award in 19… and retired in 19…). Besides the antique algorithms for finding primes, approximating π, and computing the (fast) Fourier transform (incl. John Tukey), the story moves quickly to the difference engine of Charles Babbage and Ada Lovelace, then to Turing’s machine, and artificial intelligence with the first checkers codes, which already included some learning aspects. Some sections on the ENIAC, John von Neumann and Stan Ulam, with the invention of Monte Carlo methods (but no word on MCMC). A bit of complexity theory (P versus NP) and then Internet, Amazon, Google, Facebook, Netflix… Finishing with neural networks (then and now), the unavoidable AlphaGo, and the incoming cryptocurrencies and quantum computers. All this makes for pleasant (if unsurprising) reading and could possibly captivate a young reader for whom computers are more than a gaming console or a more senior reader who so far stayed wary of and away from computers. But I would have enjoyed much more a low-tech discussion on the construction, validation and optimisation of algorithms, namely a much more soft(ware) version, as it would have made it much more distinct from the existing offer on the history of computer science.

*[Disclaimer about potential self-plagiarism: this post or an edited version of it will eventually appear in my Books Review section in CHANCE.]*

## frontier of simulation-based inference

Posted in Books, Statistics, University life with tags ABC, Bayesian deep learning, classification, deep learning, GANs, kernel density estimator, National Academy of Science, neural network, neural networks and learning machines, PNAS, simulation-based inference, Statistics, summary statistics, Wasserstein distance on June 11, 2020 by xi'an

“This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, ‘The Science of Deep Learning,’ held March 13–14, 2019, at the National Academy of Sciences in Washington, DC.”

**A** paper by Kyle Cranmer, Johann Brehmer, and Gilles Louppe just appeared in PNAS on the frontier of simulation-based inference. Sounding more like a tribune than a research paper producing new input. Or at least like a review. Providing a quick introduction to simulators, inference, ABC. Stating the shortcomings of simulation-based inference as threefold:

- costly, since requiring a large number of simulated samples
- losing information through the use of insufficient summary statistics or poor non-parametric approximations of the sampling density
- wasteful as requiring new computational efforts for new datasets, primarily for ABC as learning the likelihood function (as a function of both the parameter θ and the data x) is only done once

And the difficulties increase with the dimension of the data. While the points made above are correct, I want to note that ideally ABC (and Bayesian inference as a whole) only depends on a single-dimension observation, which is the likelihood value. Or more practically that it only depends on the distance from the observed data to the simulated data. (Possibly the Wasserstein distance between the cdfs.) And, somewhat unrealistically, that ABC could store the reference table once and for all. Point 3 can also be debated in that the effort of learning an approximation can only be amortized when exactly the same model is re-employed with new data, which is likely in industrial applications but less in scientific investigations, I would think. About point 2, the paper misses part of the ABC literature on selecting summary statistics, e.g., the culling afforded by random forests ABC, or the earlier use of the score function in Martin et al. (2019).
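To make the remark about the distance and the reference table concrete, here is a minimal sketch of rejection ABC on a toy model, under assumptions that are mine and not the paper's: a Normal mean θ with a N(0, 3²) prior, unit-variance observations, and the 1-Wasserstein distance between empirical cdfs (which, for equal-size one-dimensional samples, is the mean absolute difference between order statistics). The function names `build_reference_table` and `abc_from_table` are hypothetical. The table is simulated once and then reused for two different observed datasets.

```python
import numpy as np

def build_reference_table(n_sims, n_obs, rng):
    """Simulate (theta, dataset) pairs once; datasets are stored sorted."""
    thetas = rng.normal(0.0, 3.0, size=n_sims)  # assumed N(0, 3^2) prior
    sims = rng.normal(thetas[:, None], 1.0, size=(n_sims, n_obs))
    return thetas, np.sort(sims, axis=1)

def abc_from_table(thetas, sorted_sims, x_obs, keep=0.01):
    """Keep the fraction `keep` of parameters whose simulated data are closest."""
    x_sorted = np.sort(x_obs)
    # 1-Wasserstein distance between two equal-size 1-D empirical distributions
    dist = np.mean(np.abs(sorted_sims - x_sorted), axis=1)
    n_keep = max(1, int(round(keep * len(thetas))))
    closest = np.argpartition(dist, n_keep - 1)[:n_keep]
    return thetas[closest]

rng = np.random.default_rng(0)
thetas, table = build_reference_table(20_000, 50, rng)

# Two "observed" datasets handled with the same stored table
x_a = rng.normal(1.5, 1.0, size=50)
x_b = rng.normal(-1.0, 1.0, size=50)
post_a = abc_from_table(thetas, table, x_a)
post_b = abc_from_table(thetas, table, x_b)
print(post_a.mean(), post_b.mean())
```

The only per-dataset cost is the distance computation, which is the (admittedly idealised) sense in which ABC can also be amortised. The tolerance is set implicitly by the retained fraction `keep`.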

The paper then makes a case for using machine-, active-, and deep-learning advances to overcome those blocks. Overlapping with other recent publications and talks (like Dennis on One World ABC’minar!). Once again presenting machine-learning techniques such as normalizing flows as more efficient than traditional non-parametric estimators. Of which I remain unconvinced without deeper arguments [than the repeated mention of powerful machine-learning techniques] on the convergence rates of these estimators (rather than extolling the super-powers of neural nets).

“A classifier is trained using supervised learning to discriminate two sets of data, although in this case both sets come from the simulator and are generated for different parameter points θ⁰ and θ¹. The classifier output function can be converted into an approximation of the likelihood ratio between θ⁰ and θ¹ (…) learning the likelihood or posterior is an unsupervised learning problem, whereas estimating the likelihood ratio through a classifier is an example of supervised learning and often a simpler task.”

The above comment is highly connected to the approach set by Geyer in 1994 and expanded in Gutmann and Hyvärinen in 2012. Interestingly, at least from my narrow statistician viewpoint!, the discussion about using these different types of approximation to the likelihood and hence to the resulting Bayesian inference never engages in a quantification of the approximation or even broaches the potential for inconsistent inference unlocked by using fake likelihoods. While insisting on the information loss brought by using summary statistics.
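As an illustration of the classifier-to-likelihood-ratio conversion quoted above, here is a minimal sketch on a toy case of my own choosing, not taken from the paper: data simulated from N(θ⁰, 1) and N(θ¹, 1) with equal sample sizes, and a plain logistic regression fitted by Newton iterations. With balanced classes, the logit of the classifier output estimates log p(x|θ¹) − log p(x|θ⁰), and in this Gaussian case the true log-ratio is linear in x, so the logistic model is correctly specified. The helper names `fit_logistic`, `log_ratio_hat`, and `log_ratio_true` are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton's method (no regularisation)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))             # current class-1 probabilities
        grad = X.T @ (p - y)                          # gradient of the negative log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X   # Hessian of the negative log-likelihood
        w -= np.linalg.solve(hess, grad)
    return w

rng = np.random.default_rng(1)
theta0, theta1, n = 0.0, 1.0, 20_000

# Both samples come from the simulator, at two parameter points
x = np.concatenate([rng.normal(theta0, 1.0, n), rng.normal(theta1, 1.0, n)])
y = np.concatenate([np.zeros(n), np.ones(n)])
X = np.column_stack([np.ones_like(x), x])
w = fit_logistic(X, y)

def log_ratio_hat(x_new):
    """Classifier logit, i.e. estimated log p(x|theta1) - log p(x|theta0)."""
    return w[0] + w[1] * np.asarray(x_new)

def log_ratio_true(x_new):
    """Exact Gaussian log-likelihood ratio, for comparison."""
    x_new = np.asarray(x_new)
    return (theta1 - theta0) * x_new - 0.5 * (theta1**2 - theta0**2)

grid = np.linspace(-2.0, 3.0, 11)
print(np.max(np.abs(log_ratio_hat(grid) - log_ratio_true(grid))))
```

The agreement here owes everything to the classifier family containing the true log-ratio. With a misspecified or under-trained classifier the resulting "likelihood" is a fake one, which is precisely where a quantification of the approximation would be needed.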

“Can the outcome be trusted in the presence of imperfections such as limited sample size, insufficient network capacity, or inefficient optimization?”

Interestingly [all the more because the paper is classified as statistics] the above shows that the statistical question is set instead in terms of numerical error(s). With proposals to address it ranging from (unrealistic) parametric bootstrap to some forms of GANs.

## Nature tidbits [the Bayesian brain]

Posted in Statistics with tags ABC, deep learning, DeepMind, desert locust, Harvard University, Human Genetics, Isaac Asimov, memristors, neural network, NeurIPS, p-values, SNPs, UCL, University College London, Vancouver on March 8, 2020 by xi'an

In the latest Nature issue, a long coverage of Asimov’s contributions to science and rationality. And a five-page article on the dopamine reward in the brain seen as a probability distribution, interpreted as distributional reinforcement learning by researchers from DeepMind, UCL, and Harvard. Going as far as “testing” for this theory with a p-value of 0.008..! Which could as well be a signal of variability between neurons to dopamine rewards (with a p-value of 10⁻¹⁴, whatever that means). Another article about deep learning for protein (3D) structure prediction. And another one about learning neural networks via specially designed devices called memristors. And yet another one on West Africa population genetics based on four individuals from the Stone to Metal age (8000 and 3000 years ago), SNPs, PCA, and admixtures. With no ABC mentioned (I no longer have access to the journal, having missed renewal time for my subscription!). And the literal plague of a locust invasion in Eastern Africa. Making me wonder anew as to why proteins could not be recovered from the swarms of locusts to partly compensate for the damages. (Locusts eat their bodyweight in food every day.) And the latest news from NeurIPS about diversity and inclusion. And ethics, as in checking for responsibility and societal consequences of research papers. Reviewing the maths of a submitted paper or the reproducibility of an experiment is already challenging at times, but evaluating the biases in massive proprietary datasets or the long-term societal impact of a classification algorithm may prove beyond the realistic.

## Nature snippets

Posted in Statistics with tags Boris Johnson, Brexit, Canary Islands, China, confounders, deep learning, eugenics, Hawaii, Japan, Mauna Kea, Nature, Spain, tiger mosquitoes, tribune on October 1, 2019 by xi'an

**I**n the August 1 issue of Nature I took with me to Japan, there were many entries of interest. The first pages included a tribune (“personal take on events”) by a professor of oceanography calling for a stop to the construction of the TMT telescope on the Mauna Kea mountain. While I am totally ignorant of the conditions of this construction and in particular of the possible ecological effects on a fragile altitude environment, the tribune is fairly confusing, invoking mostly communitarian and religious arguments, rather than scientific ones. And referring to Western science and Protestant missionaries as misrepresenting a principle of caution. While not seeing the contradiction in suggesting the move of the observatory to the Canary Islands, which were (also) invaded by Spanish settlers in the 15th century.

Among other news, Indonesia following regional tendencies to nationalise research by forcing foreign researchers to have their data vetted by the national research agency and to include Indonesian nationals in their projects. And, although this now sounds stale news, the worry about the buffoonesque Prime Minister of the UK. And of the eugenic tendencies of his cunning advisor… A longer article by Patrick Riley from Google on three problems with machine learning, from *splitting the data inappropriately* (biases in the data collection) to *hidden variables* (unsuspected confounders) to *mistaking the objective* (impact of the loss function used to learn the predictive function). (Were these warnings heeded in the following paper claiming that deep learning was better at predicting kidney failures?) Another paper of personal interest was reporting a successful experiment in Guangzhou, China, infecting tiger mosquitoes with a bacterium to make the wild population sterile. While tiger mosquitoes have reached the Greater Paris area, and are thus becoming a nuisance, releasing 5 million more mosquitoes per week in the wild may not sound like the desired solution but since the additional mosquitoes are overwhelmingly male, we would not feel the sting of this measure! The issue also contained a review paper on memory editing for clinical treatment of psychopathology, which is part of the 150 years of Nature anniversary collection, but that I did not read (or else I forgot!).