**W**hen fishing for an illustration for this post on Google, I came upon this Bayesian methods for hackers cover, a book about which I have no clue whatsoever (!) but that mentions probabilistic programming. Which serves as a perfect (?!) introduction to the call for discussion in Bayesian Analysis of the forthcoming Bayesian conjugate gradient method by Jon Cockayne, Chris Oates (formerly Warwick), Ilse Ipsen and Mark Girolami (still partially Warwick!). Since indeed the paper is about probabilistic numerics à la Mark and co-authors. Surprisingly dealing with solving the deterministic equation Ax=b by Bayesian methods. The method produces a posterior distribution on the solution x⁰, given a fixed computing effort, which places it among the anytime algorithms. It also relates to an earlier 2015 paper by Christian Hennig where the posterior is on A⁻¹ rather than x⁰ (which is quite a surprising if valid approach to the problem!). The computing effort is translated here into computations of projections of random projections of Ax, which can be made compatible with conjugate gradient steps. Interestingly, the choice of the prior on x is quite important, to the point of setting a low or high convergence rate… **Deadline is August 04!**
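To make the idea of a posterior on the solution of Ax=b more concrete, here is a minimal sketch in plain Python. It is not the BayesCG algorithm of the paper: it only assumes a Gaussian prior on x and conditions it, one direction s at a time, on the scalar projection sᵀAx = sᵀb. The names `bayes_linear_solve`, `dot` and `matvec`, and the choice of directions, are mine and purely illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [dot(row, v) for row in M]


def bayes_linear_solve(A, b, directions, prior_mean, prior_cov):
    """Condition a Gaussian prior N(prior_mean, prior_cov) on x
    on the noise-free observations s^T A x = s^T b, for each s in directions.
    Returns the posterior mean and covariance (hypothetical helper, not BayesCG).
    prior_cov is assumed symmetric."""
    n = len(b)
    mean = list(prior_mean)
    cov = [list(row) for row in prior_cov]
    A_T = [list(col) for col in zip(*A)]
    for s in directions:
        a = matvec(A_T, s)      # a = A^T s, so that a.x = s.(A x)
        y = dot(s, b)           # the observed projection s.b
        Ca = matvec(cov, a)     # covariance between x and a.x
        aCa = dot(a, Ca)        # prior variance of a.x
        if aCa < 1e-12:         # direction carries no new information
            continue
        resid = y - dot(a, mean)
        # standard Gaussian conditioning, written as a rank-one update
        mean = [m + c * resid / aCa for m, c in zip(mean, Ca)]
        cov = [[cov[i][j] - Ca[i] * Ca[j] / aCa for j in range(n)]
               for i in range(n)]
    return mean, cov


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
# one projection only: a proper posterior, with remaining uncertainty
mean_1, cov_1 = bayes_linear_solve(A, b, [[1.0, 0.0]], [0.0, 0.0], identity)
# two independent projections: the posterior collapses on the solution
mean_2, cov_2 = bayes_linear_solve(
    A, b, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], identity)
```

With a single direction the posterior covariance is still non-zero, which is the "fixed computing effort" part of the story, while with as many independent directions as unknowns the posterior mean coincides with the exact solution (1/11, 7/11) of this toy system. Changing `prior_mean` and `prior_cov` changes every intermediate posterior, which is where the choice of the prior matters.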

## Archive for probabilistic programming

## Bayesian conjugate gradients [open for discussion]

Posted in Books, pictures, Statistics, University life with tags Bayesian Analysis, Bayesian methods for hackers, discussion paper, probabilistic numerics, probabilistic programming, University of Warwick on June 25, 2019 by xi'an

## Elves to the ABC rescue!

Posted in Books, Kids, Statistics with tags ABC, ELFI, Finnish Elves, gaussian process, Mauri Kunnas, probabilistic programming, software on November 7, 2018 by xi'an

Marko Järvenpää, Michael Gutmann, Arijus Pleska, Aki Vehtari, and Pekka Marttinen have written a paper on Efficient Acquisition Rules for Model-Based Approximate Bayesian Computation soon to appear in Bayesian Analysis that gives me the right nudge to mention the ELFI software they have been contributing to for a while. Where the acronym stands for engine for likelihood-free inference. Written in Python, DAG based, and covering methods like the

- ABC rejection sampler
- Sequential Monte Carlo ABC sampler
- Bayesian Optimization for Likelihood-Free Inference (BOLFI) framework
- Bayesian Optimization (not likelihood-free)
- No-U-Turn-Sampler (not likelihood-free)

[Warning: I did not experiment with the software! Feel free to share.]
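For readers unfamiliar with the first item of the above list, here is a minimal sketch of an ABC rejection sampler in plain Python. It does not use ELFI or its API: the function `abc_rejection`, the toy Normal model, the uniform prior, the sample-mean summary and the tolerance are all assumptions made for the sake of illustration.

```python
import random
import statistics


def abc_rejection(observed, sample_prior, simulate, summary, eps, n_accept, rng):
    """Plain ABC rejection: draw theta from the prior, simulate pseudo-data,
    keep theta when the simulated summary is within eps of the observed one.
    Returns the accepted values and the number of simulations used."""
    s_obs = summary(observed)
    accepted, n_sim = [], 0
    while len(accepted) < n_accept:
        theta = sample_prior(rng)
        n_sim += 1
        if abs(summary(simulate(theta, rng)) - s_obs) <= eps:
            accepted.append(theta)
    return accepted, n_sim


def sample_prior(rng):
    """Toy prior (an assumption): theta uniform on (-5, 5)."""
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng, n=20):
    """Toy simulator (an assumption): n iid Normal(theta, 1) observations."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


rng = random.Random(1)
observed = simulate(2.0, rng)
draws, n_sim = abc_rejection(
    observed, sample_prior, simulate, statistics.fmean,
    eps=0.2, n_accept=100, rng=rng)
```

The ratio `len(draws) / n_sim` is the acceptance rate, that is, the price paid in simulations for each accepted value, which is exactly what the more elaborate items of the list try to reduce when the simulator is costly.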

“…little work has focused on trying to quantify the amount of uncertainty in the estimator of the ABC posterior density under the chosen modelling assumptions. This uncertainty is due to a finite computational budget to perform the inference and could be thus also called as computational uncertainty.”

The paper is about looking at the “real” ABC distribution, that is, the one resulting from a realistic perspective of a finite number of simulations and acceptances. By acquisition, the authors mean an efficient way to propose the next value of the parameter θ, towards minimising the uncertainty in the ABC density estimate. Note that this involves a loss function that must be chosen by the analyst and then available for the minimisation program. If this sounds complicated…

“…our interest is to design the evaluations to minimise the uncertainty in a quantity that itself describes the uncertainty of the parameters of a costly simulation model.”

it indeed is and it requires modelling choices. As in Gutmann and Corander (2016), which was also concerned with designing the location of the learning parameters, the modelling is based here on a Gaussian process for the discrepancy between the observed and the simulated data. Which provides an estimate of the likelihood, later used for selecting the next sampling value of θ. The final ABC sample is however produced by a GP estimation of the ABC distribution. As noted by the authors, the method may prove quite time consuming: for instance, one involved model required one minute of computation time for selecting the next evaluation location. (I had a bit of a difficulty when reading the paper as I kept hitting notions that are local to the paper but not immediately or precisely defined. Such as “adequation function” [p.11] or “discrepancy”. Maybe correlated with short nights while staying at CIRM for the Masterclass, always waking up around 4am for unknown reasons!)

## IMS workshop [day 3]

Posted in pictures, R, Statistics, Travel, University life with tags Bayesian computation, Birch, delayed simulation, high dimensions, hypocoercivity, IMS, Institute for Mathematical Sciences, Lapland, MCqMC 2018, National University Singapore, non-reversible diffusion, NUS, ODE, partly deterministic processes, probabilistic programming, Rao-Blackwellisation, Rennes, Singapore, Wang-Landau algorithm, workshop on August 30, 2018 by xi'an

**I** made the “capital” mistake of walking across the entire NUS campus this morning, which is quite green and pretty, but which almost enjoys an additional dimension, brought by a humidity so intense that one feels one has to get around it! A feature I had managed to completely erase from my memory of my previous visit there. Anyway, nothing of any relevance. One talk in the morning was by Markus Eisenbach on tools used by physicists to speed up Monte Carlo methods, like the Wang-Landau flat histogram, towards computing the partition function, or the distribution of the energy levels, definitely addressing issues close to my interests, but somewhat beyond my reach for using a different language and stress, as often in physics. (I mean, as often in physics talks I attend.) An idea that came out clearly to me was to bypass a (flat) histogram target and aim directly at a constant slope cdf for the energy levels. (But I got scared away by the Fourier transforms!)

Lawrence Murray then discussed some features of the Birch probabilistic programming language he is currently developing, especially a fairly fascinating concept of delayed sampling, which connects with locally-optimal proposals and Rao Blackwellisation. Which I plan to get back to later [and hopefully sooner than later!].

In the afternoon, Maria de Iorio gave a talk about the construction of nonparametric priors that create dependence between a sequence of functions, a notion I had not thought of before, with an array of possibilities when using the stick breaking construction of Dirichlet processes.
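As a reminder of the basic building block behind such priors, here is a minimal sketch of the (truncated) stick-breaking construction of the weights of a single Dirichlet process, in plain Python. It does not reproduce the dependent priors of the talk: the function name `stick_breaking_weights`, the truncation level and the value of the concentration parameter are assumptions of mine.

```python
import random


def stick_breaking_weights(alpha, n_atoms, rng):
    """First n_atoms stick-breaking weights of a Dirichlet process with
    concentration alpha: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j).
    Also returns the length of the stick left after truncation."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)   # break a fraction v off what is left
        remaining *= 1.0 - v
    return weights, remaining


weights, leftover = stick_breaking_weights(alpha=2.0, n_atoms=50,
                                           rng=random.Random(0))
```

One natural route to dependence between random measures is then to let the Beta variates (or the atoms attached to the weights) be shared or correlated across the sequence, although which route the talk actually took is beyond what this sketch covers.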

And Christophe Andrieu gave a very smooth and helpful entry to partly deterministic Markov processes (PDMP) in preparation for talks he is giving next week for the continuation of the workshop at IMS. Starting with the guided random walk of Gustafson (1998), which extended a bit later into the non-reversible paper of Diaconis, Holmes, and Neal (2000). Although I had a vague idea of the contents of these papers, the role of the velocity **ν** became much clearer. And premonitory of the advances made by the more recent PDMP proposals. There is obviously a continuation with the equally pedagogical talk Christophe gave at MCqMC in Rennes two months [and half the globe] ago, but the focus being somewhat different, it really felt like a new talk [my short term memory may also play some role in this feeling!, as I now remember the discussion of Hilderbrand (2002) for non-reversible processes]. An introduction to the topic I would recommend to anyone interested in this new branch of Monte Carlo simulation! To be followed by the most recently arXived hypocoercivity paper by Christophe and co-authors.
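To illustrate the role of the velocity, here is a minimal sketch of a one-dimensional guided walk in the spirit of Gustafson (1998), in plain Python. This is my own reading of the algorithm rather than code from the talk: the function name `guided_walk`, the Normal step distribution and the standard Normal target are illustrative assumptions.

```python
import math
import random


def guided_walk(log_density, x0, step, n_iter, rng):
    """One-dimensional guided walk: the chain keeps moving in direction v
    (+1 or -1) as long as proposals are accepted, and reverses v on rejection."""
    x, v = x0, 1.0
    chain = []
    for _ in range(n_iter):
        # propose a move of random length in the current direction
        y = x + v * abs(rng.gauss(0.0, step))
        log_ratio = log_density(y) - log_density(x)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = y           # accepted: same direction at the next iteration
        else:
            v = -v          # rejected: stay put and flip the velocity
        chain.append(x)
    return chain


# standard Normal target, up to an additive constant in the log-density
chain = guided_walk(lambda x: -0.5 * x * x, x0=0.0, step=1.0,
                    n_iter=20000, rng=random.Random(0))
```

Compared with a standard random walk Metropolis sampler, the only change is that the sign of the move is carried over from one iteration to the next instead of being redrawn, which removes much of the diffusive back-and-forth behaviour while keeping the same target for x.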

## Bayesian program synthesis

Posted in Books, pictures, Statistics, University life with tags David Blei, deep learning, Gamelon, machine learning, neural network, principles of uncertainty, probabilistic programming, Science & Vie, solar system on April 7, 2017 by xi'an

**L**ast week, I, along with Jean-Michel Marin, got an email from a journalist working for Science & Vie, a French science magazine that published a few years ago a special issue on Bayes’ theorem. (With the insane title of “the formula that deciphers the World!”) The reason for this call was the preparation of a paper on Gamalon, a new AI company that relies on (Bayesian) probabilistic programming to devise predictive tools. We spent an hour skyping with him about Bayesian inference, probabilistic programming and machine-learning, at a general level since we had not previously heard of this company or of its central tool.

“the Gamalon BPS system learns from only a few examples, not millions. It can learn using a tablet processor, not hundreds of servers. It learns right away while we play with it, not over weeks or months. And it learns from just one person, not from thousands.”

Gamalon claims to do much better than deep learning at those tasks. Not that I have reasons to doubt that claim, quite the opposite, an obvious reason being that incorporating rules and probabilistic models in the predictor is going to help if these rules and models are even moderately realistic, another major one being that handling uncertainty and learning by Bayesian tools is usually a good idea (!), and yet another significant one being that David Blei is a member of their advisory committee. But it is hard to get a feeling for such claims when the only element in the open is the use of probabilistic programming, which is an advanced and efficient manner of conducting model building and updating and handling (posterior) distributions as objects, but which does not enjoy higher predictive abilities by default. Unless I live with a restricted definition of what probabilistic programming stands for! In any case, the video provided by Gamalon and the presentation given by its CEO do not help in my understanding of the principles behind this massive gain in efficiency. Which makes sense given that the company would not want to give up its edge on the competition.

Incidentally, the video in this presentation comparing the predictive abilities of the four major astronomical explanations of the solar system is great. If not particularly connected with the difference between deep learning and Bayesian probabilistic programming.

## NIPS 2014

Posted in Kids, pictures, Statistics, Travel, University life with tags ABC, Andrey Markov, Canada, compiler, Montréal, mug, NIPS 2014, phylogenetic tree, population genetics, probabilistic programming, random forests, variational Bayes methods on December 15, 2014 by xi'an

**S**econd and last day of the NIPS workshops! The collection of topics was quite broad and would have made my choosing an ordeal, except that I was invited to give a talk at the probabilistic programming workshop, solving my dilemma… The first talk by Kathleen Fisher was quite enjoyable in that it gave a conceptual discussion of the motivations for probabilistic languages, drawing an analogy with the early days of computer programming that saw a separation between higher level computer languages and machine programming, with a compiler interface. And calling for a similar separation between the models faced by statistical inference and machine-learning and the corresponding code, if I understood her correctly. This was connected with Frank Wood’s talk of the previous day where he illustrated the concept through a generation of computer codes to approximately generate from standard distributions like Normal or Poisson. Approximately as in ABC, which is why the organisers invited me to talk in this session. However, I was a wee bit lost in the following talks and presumably lost part of my audience during *my* talk, as I realised later to my dismay when someone told me he had not perceived the distinction between the trees in the random forest procedure and the phylogenetic trees in the population genetic application.
Still, while it had for me a sort of Twilight Zone feeling of having stepped into another dimension, attending this workshop was a worthwhile experiment as an eye-opener into a highly different albeit connected field, where code and simulator may take the place of a likelihood function… To the point of defining Hamiltonian Monte Carlo directly on the former, as Vikash Mansinghka showed me at the break.

I completed the day with the final talks in the variational inference workshop, if only to get back on firmer ground! Apart from attending my third talk by Vikash in the conference (but on a completely different topic on variational approximations for discrete particle-ar distributions), a talk by Tim Salimans linked MCMC and variational approximations, using MCMC and HMC to derive variational bounds. (He did not expand on the opposite use of variational approximations to build better proposals.) Overall, I found these two days and my first NIPS conference quite exciting, if somewhat overpowering, with a different atmosphere and a different pace compared with (small or large) statistical meetings. (And a staggering gender imbalance!)

## AISTATS 2014 [day #3]

Posted in Mountains, pictures, Statistics, Travel, University life with tags ABC, AISTATS 2014, Anglican, big data, deep learning, exponential families, Garðskaga, Gaussian processes, Iceland, machine learning, probabilistic programming, Reykjanes Peninsula, Reykjavik on April 28, 2014 by xi'an

**T**he third day at AISTATS 2014 started with Michael Jordan giving his plenary lecture, or rather three short talks on “Big Data” privacy, communication risk, and (bag of) bootstrap. I had not previously heard Michael talking about the first two topics and further found interesting the attempt to put computation into the picture (a favourite notion of Michael’s). However, I was a bit surprised at the choice of a minimax criterion. Indeed, getting away from the minimax criterion was one of the major reasons I moved to the B side of the Force. Because it puts exactly the same importance on every single value of the parameter. Even the most impossible ones. I was also a wee bit surprised at the optimal solution produced by this criterion: in a multivariate binary data setting (e.g., multiple drugs usage), the optimal privacy solution was to create a random binary vector and pick at random between this vector and its complement, depending on which one is closest to the observable. The loss of information seems formidable if the dimension of the vector is large. (Implementing ABC as a privacy [privacizing?] strategy would sound better if less optimal…) The next session was about *deep learning*, of which I knew [and know] nothing, but the talk by Yoshua Bengio raised very relevant questions, like how to learn where the main part of the mass of a probability distribution is, besides pointing at a recent survey of his. The survey points at some notions that I master and some that I don’t, but a cursory reading does not lead me to put an intuitive meaning on deep learning.

**T**he last session of the day and of the conference was on more statistical issues, like a Gaussian process modelling of a spatio-temporal dataset on Afghanistan attacks by Guido Sanguinetti, the use of Rao-Blackwellisation and control variate to build black-box variational inference by Rajesh Ranganath, the construction of conditional exponential families on mixed graphs by Pradeep Ravikumar, and a presentation of probabilistic programming with Anglican by Frank Wood that I had already seen in Banff. In particular, I found the result on the existence of joint exponential families on graphs when defined by those full conditionals quite exciting!

**T**he second poster session was in the early evening, with many more posters (and plenty of food and drinks!), as it also included the (non-refereed) MLSS posters. Among the many interesting ones I spotted, a way to hit-and-run for quasi-concave densities, estimating mixtures with negative weights, a failing particle algorithm for a flu epidemic, an exact EP algorithm, and a fairly intense discussion around Richard Wilkinson’s poster on a Gaussian process ABC algorithm (that I discussed on the ‘Og a while ago).