Archive for Scotland

sequential neural likelihood estimation as ABC substitute

Posted in Books, Kids, Statistics, University life on May 14, 2020 by xi'an

A JMLR paper by Papamakarios, Sterratt, and Murray (Edinburgh), first presented at the AISTATS 2019 meeting, on a new form of likelihood-free inference, away from non-zero tolerance and from the distance-based versions of ABC, following earlier papers by Iain Murray and co-authors in the same spirit. Which I got pointed to during the ABC workshop in Vancouver. At the time I had no idea what autoregressive flows meant. We were supposed to hold a reading group in Paris-Dauphine on this paper last week, unfortunately cancelled as a coronaviral precaution… Here are some notes I had prepared for the meeting that did not take place.

“A simulator model is a computer program, which takes a vector of parameters θ, makes internal calls to a random number generator, and outputs a data vector x.”

Just the usual generative model then.

“A conditional neural density estimator is a parametric model q(.|φ) (such as a neural network) controlled by a set of parameters φ, which takes a pair of datapoints (u,v) and outputs a conditional probability density q(u|v,φ).”

Less usual, in that the outcome is guaranteed to be a probability density.

“For its neural density estimator, SNPE uses a Mixture Density Network, which is a feed-forward neural network that takes x as input and outputs the parameters of a Gaussian mixture over θ.”

In which theoretical sense would it improve upon classical or Bayesian density estimators? Where are the error evaluation, the optimal rates, the sensitivity to the dimension of the data? Of the parameter?
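For concreteness, here is a minimal sketch of such a Mixture Density Network, written in PyTorch; the architecture, dimensions, and names are placeholders of mine rather than the ones used in SNPE:

```python
import torch
import torch.nn as nn

class MDN(nn.Module):
    """Toy Mixture Density Network: maps a data vector x to the parameters
    of a (one-dimensional, for simplicity) Gaussian mixture over θ."""
    def __init__(self, x_dim, n_components=5, hidden=50):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(x_dim, hidden), nn.Tanh())
        self.logits = nn.Linear(hidden, n_components)   # mixture weights (pre-softmax)
        self.means = nn.Linear(hidden, n_components)    # component means
        self.log_sds = nn.Linear(hidden, n_components)  # component log-standard-deviations

    def log_prob(self, theta, x):
        h = self.body(x)
        log_w = torch.log_softmax(self.logits(h), dim=-1)
        comp = torch.distributions.Normal(self.means(h), self.log_sds(h).exp())
        # log q(θ|x,φ) = logsumexp_k [ log w_k(x) + log N(θ; μ_k(x), σ_k(x)) ]
        return torch.logsumexp(log_w + comp.log_prob(theta.unsqueeze(-1)), dim=-1)
```

Training then amounts to maximising the sum of log q(θ|x,φ) over simulated pairs (θ,x), which is precisely where the above questions of error evaluation and rates would come into play.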

“Our new method, Sequential Neural Likelihood (SNL), avoids the bias introduced by the proposal, by opting to learn a model of the likelihood instead of the posterior.”

I do not get the argument, in that the final outcome (of using the approximation within an MCMC scheme) remains biased since the likelihood is not the exact likelihood. Where is the error evaluation? Note that in the associated Algorithm 1, the learning set is enlarged at each round, as in AMIS, rather than reset to the empty set ∅.
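To make the accumulation explicit, here is a schematic and heavily simplified rendering of that round structure, with placeholder callables (simulate, fit_logdensity, mcmc_sampler) standing in for the paper's components:

```python
def snl_rounds(x_obs, sample_prior, log_prior, simulate, fit_logdensity,
               mcmc_sampler, n_rounds=10, n_per_round=1000):
    """Sketch of the SNL round structure: the training set keeps growing
    across rounds (as in AMIS) instead of being reset to ∅."""
    data = []                                              # accumulated (θ, x) pairs
    thetas = [sample_prior() for _ in range(n_per_round)]  # round 1: propose from the prior
    for _ in range(n_rounds):
        data += [(th, simulate(th)) for th in thetas]      # enlarge, never reset
        log_q = fit_logdensity(data)                       # retrain q(x|θ) on ALL pairs so far
        # next proposal: MCMC targeting the current approximate posterior,
        # log q(x_obs|θ) + log prior(θ)
        target = lambda th: log_q(x_obs, th) + log_prior(th)
        thetas = mcmc_sampler(target, n_per_round)
    return log_q
```

Every pair simulated in earlier rounds is thus recycled when refitting the estimator, which is the analogy with AMIS drawn above.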

“…given enough simulations, a sufficiently flexible conditional neural density estimator will eventually approximate the likelihood in the support of the proposal, regardless of the shape of the proposal. In other words, as long as we do not exclude parts of the parameter space, the way we propose parameters does not bias learning the likelihood asymptotically. Unlike when learning the posterior, no adjustment is necessary to account for our proposing strategy.”

This is a rather vague statement, with the only support being that the Monte Carlo approximation to the Kullback-Leibler divergence does converge to its actual value, i.e., a direct application of the Law of Large Numbers! But an interesting point, which I informally made a (long) while ago, is that all that matters is the estimate of the density at x⁰. Or at the value of the statistic at x⁰. The masked auto-encoder density estimator is based on a sequence of bijections with a lower-triangular Jacobian matrix, meaning the conditional density estimate is available in closed form. Which makes it sound like a form of neurotic variational Bayes solution.
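To spell out this closed form (in standard normalizing-flow notation, which may differ from the paper's): writing x = f(u) for the autoregressive bijection applied to base noise u, the change of variables gives

\[
q(x) \;=\; p_u\big(f^{-1}(x)\big)\,\Big|\det\frac{\partial f^{-1}}{\partial x}\Big|,
\qquad
u_i \;=\; (x_i-\mu_i)\,e^{-\alpha_i},
\quad (\mu_i,\alpha_i) = g_i(x_{1:i-1}),
\]

and since u_i only depends on x_1,…,x_i, the Jacobian of f⁻¹ is lower triangular and its determinant reduces to exp(−Σᵢαᵢ), hence the density being available in closed form. For the conditional version used here, the parameter θ is, as far as I understand, simply fed as an extra input to each g_i.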

The paper also links with ABC (too costly?), other parametric approximations to the posterior (like Gaussian copulas and variational likelihood-free inference), synthetic likelihood, Gaussian processes, noise contrastive estimation… With experiments involving some of the above. But the experiments involve rather smooth models with relatively few parameters.

“A general question is whether it is preferable to learn the posterior or the likelihood (…) Learning the likelihood can often be easier than learning the posterior, and it does not depend on the choice of proposal, which makes learning easier and more robust (…) On the other hand, methods such as SNPE return a parametric model of the posterior directly, whereas a further inference step (e.g. variational inference or MCMC) is needed on top of SNL to obtain a posterior estimate”

A fair point in the conclusion. Which also mentions the curse of dimensionality (both for parameters and observations) and the possibility to work directly with summaries.

Getting back to the earlier and connected Masked autoregressive flow for density estimation paper, by Papamakarios, Pavlakou and Murray:

“Viewing an autoregressive model as a normalizing flow opens the possibility of increasing its flexibility by stacking multiple models of the same type, by having each model provide the source of randomness for the next model in the stack. The resulting stack of models is a normalizing flow that is more flexible than the original model, and that remains tractable.”

Which makes it sound like a sort of neural network in the density space. Optimised by Kullback-Leibler minimisation to get asymptotically close to the likelihood. But a form of Bayesian indirect inference in the end, namely an MLE on a pseudo-model, using the estimated model as a proxy in Bayesian inference…
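A minimal sketch of that stacking, with hypothetical layer objects exposing an inverse map and the log-determinant of its Jacobian (names and interface are mine, not the paper's):

```python
def stacked_flow_logpdf(x, layers, base_logpdf):
    """Evaluate log p(x) for a stack of bijections x = f_K(…f_1(u)…):
    push x back through each inverse map and accumulate the log-determinants
    of the (triangular, hence cheap) Jacobians along the way."""
    log_det = 0.0
    u = x
    for layer in reversed(layers):            # the last layer is closest to the data
        log_det += layer.log_det_inverse(u)   # log |det ∂f_k⁻¹/∂(its input)| at u
        u = layer.inverse(u)                  # undo one bijection
    return base_logpdf(u) + log_det           # base density plus Jacobian corrections
```

Each layer being itself a tractable autoregressive model, the whole stack remains a tractable density while being strictly more flexible than any single layer, which is the point of the quote above.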

strange loyalties [book review]

Posted in Statistics on April 26, 2020 by xi'an

This book by William McIlvanney is the third and last one in the Laidlaw investigation series and the most original of the three as far as I am concerned… For it is more an inner quest than a crime investigation, as the detective is seeking an explanation for the accidental death of his brother, as well as for the progressive deterioration of their relationship, while trying to make sense of his own life and of his relations with women. It is thus as far from a crime novel as is possible, although there are criminals involved. And Laidlaw cannot separate his “job” from his personal life, meaning he investigates the death of his brother in his free time. It is entirely written from a first-person perspective, which made the reading harder and slower in my case. But it is an apt conclusion to the trilogy, rather than one pulled into finer and finer threads as in other detective series. Brilliant (like the light on Skye during the rain).

“Life was only in the living of it. How you act and what you are and what you do and how you be were the only substance. They didn’t last either. But while you were here, they made what light there was – the wick that threads the candle-grease of time. His light was out but here I felt I could almost smell the smoke still drifting from its snuffing.”

value of a chess game

Posted in pictures, Statistics, University life on April 15, 2020 by xi'an

In our (internal) webinar at CEREMADE today, Miguel Oliu Barton gave a talk on the recent result he and his student Luc Attia obtained, namely a tractable way of finding the value of a game (when maximin equals minimax), a result recently published in PNAS:

“Stochastic games were introduced by the Nobel Memorial Prize winner Lloyd Shapley in 1953 to model dynamic interactions in which the environment changes in response to the players’ behavior. The theory of stochastic games and its applications have been studied in several scientific disciplines, including economics, operations research, evolutionary biology, and computer science. In addition, mathematical tools that were used and developed in the study of stochastic games are used by mathematicians and computer scientists in other fields. This paper contributes to the theory of stochastic games by providing a tractable formula for the value of finite competitive stochastic games. This result settles a major open problem which remained unsolved for nearly 40 years.”

While I did not see a direct consequence of this result in regular statistics, I found most interesting the comment made at one point that chess (with a forced draw after repetitions) has a value, by virtue of Zermelo's theorem. I had never considered the question (contrary to Shannon!). This value remains unknown.
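To fix the notation (mine, not the authors'): with σ and τ ranging over the two players' strategies and γ(σ,τ) the expected payoff to the maximiser, the game is said to have a value v when

\[
v \;=\; \sup_{\sigma}\,\inf_{\tau}\,\gamma(\sigma,\tau)
\;=\;
\inf_{\tau}\,\sup_{\sigma}\,\gamma(\sigma,\tau),
\]

i.e., when maximin and minimax coincide. Zermelo-type arguments guarantee such a v exists for finite games like chess-with-forced-draws, without making it practically computable.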

ABC World seminar

Posted in Books, pictures, Statistics, Travel, University life on April 4, 2020 by xi'an

With most of the World being more or less confined at home and conferences cancelled one after the other, including ABC in Grenoble!, we are launching a fortnightly webinar on approximate Bayesian computation, methods, and inference. The idea is to gather members of the community and disseminate results and innovations during these coming weeks and months under lock-down. And hopefully after!

At this point, the interface will be Blackboard Collaborate, run from Edinburgh by Michael Gutmann, for which neither registration nor software is required. Before each talk, a guest link will be mailed to the mailing list. Please register here to join the list.

The seminar is planned for Thursdays at either 9am or, more likely, 11:30am UK time (GMT+1), as we are still debating the best schedule to reach as many populated time zones as possible! The first speakers are

09.04.2020 Dennis Prangle Distilling importance sampling
23.04.2020 Ivis Kerama and Richard Everitt Rare event SMC²
07.05.2020 Umberto Picchini Stratified sampling and bootstrapping for ABC

BrewDog punk sanitiser

Posted in pictures, Wines on March 25, 2020 by xi'an