AABI9 tidbits [& misbits]

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , on December 10, 2019 by xi'an

Today’s Advances in Approximate Bayesian Inference symposium, organised by Thang Bui, Adji Bousso Dieng, Dawen Liang, Francisco Ruiz, and Cheng Zhang, took place in front of Vancouver Harbour (and the tentalising ski slope at the back) and saw more than 400 participants, drifting away from the earlier versions which had a stronger dose of ABC and much fewer participants. There were students’ talks in a fair proportion, as well (and a massive number of posters). As of below, I took some notes during some of the talks with no pretense at exhaustivity, objectivity or accuracy. (This is a blog post, remember?!) Overall I found the day exciting (to the point I did not suffer at all from the usal naps consecutive to very short nights!) and engaging, with a lot of notions and methods I had never heard about. (Which shows how much I know nothing!)

The fourth talk was by Sergey Levine, Reinforcement Learning, Optimal , Control, and Probabilistic Inference, back to Kullback-Leibler as the objective function, with linkage to optimal control (with distributions as actions?), plus again variational inference, producing an approximation in sequential settings. This sounded like a type of return of the MaxEnt prior, but the talk pace was so intense that I could not follow where the innovations stood.

The fifth talk was by Iuliia Molchanova, on Structured Semi-Implicit Variational Inference, from BAyesgroup.ru (I did not know of a Bayesian group in Russia!, as I was under the impression that Bayesian statistics were under-represented there, but apparently the situation is quite different in machine learning.) The talk brought an interesting concept of semi-implicit variational inference, exploiting some form of latent variables as far as I can understand, using mixtures of Gaussians.

The sixth talk was by Rianne van den Berg, Normalizing Flows for Discrete Data, and amounted to covering three papers also discussed in NeurIPS 2019 proper, which I found somewhat of a suboptimal approach to an invited talk, as it turned into a teaser for following talks or posters. But the teasers it contained were quite interesting as they covered normalising flows as integer valued controlled changes of variables using neural networks about which I had just became aware during the poster session, in connection with papers of Papamakarios et al., which I need to soon read.

The seventh talk was by Matthew Hoffman: Langevin Dynamics as Nonparametric Variational Inference, and sounded most interesting, both from title and later reports, as it was bridging Langevin with VI, but I alas missed it for being “stuck” in a tea-house ceremony that lasted much longer than expected. (More later on that side issue!)

After the second poster session (with a highly original proposal by Radford Neal towards creating  non-reversibility at the level of the uniform generator rather than later on), I thus only attended Emily Fox’s Stochastic Gradient MCMC for Sequential Data Sources, which superbly reviewed (in connection with a sequence of papers, including a recent one by Aicher et al.) error rate and convergence properties of stochastic gradient estimator methods there. Another paper I need to soon read!

The one before last speaker, Roman Novak, exposed a Python library about infinite neural networks, for which I had no direct connection (and talks I have always difficulties about libraries, even without a four hour sleep night) and the symposium concluded with a mild round-table. Mild because Frank Wood’s best efforts (and healthy skepticism about round tables!) to initiate controversies, we could not see much to bite from each other’s viewpoint.

cut, baby, cut!

Posted in Books, Kids, Mountains, R, Statistics, University life with tags , , , , , , , , , , , , , on January 29, 2014 by xi'an

At MCMSki IV, I attended (and chaired) a session where Martyn Plummer presented some developments on cut models. As I was not sure I had gotten the idea [although this happened to be one of those few sessions where the flu had not yet completely taken over!] and as I wanted to check about a potential explanation for the lack of convergence discussed by Martyn during his talk, I decided to (re)present the talk at our “MCMSki decompression” seminar at CREST. Martyn sent me his slides and also kindly pointed out to the relevant section of the BUGS book, reproduced above. (Disclaimer: do not get me wrong here, the title is a pun on the infamous “drill, baby, drill!” and not connected in any way to Martyn’s talk or work!)

I cannot say I get the idea any clearer from this short explanation in the BUGS book, although it gives a literal meaning to the word “cut”. From this description I only understand that a cut is the removal of an edge in a probabilistic graph, however there must/may be some arbitrariness in building the wrong conditional distribution. In the Poisson-binomial case treated in Martyn’s case, I interpret the cut as simulating from

$\pi(\phi|z)\pi(\theta|\phi,y)=\dfrac{\pi(\phi)f(z|\phi)}{m(z)}\dfrac{\pi(\theta|\phi)f(y|\theta,\phi)}{m(y|\phi)}$

$\pi(\phi|z,\mathbf{y})\pi(\theta|\phi,y)\propto\pi(\phi)f(z|\phi)\pi(\theta|\phi)f(y|\theta,\phi)$