Archive for ABC

sequential neural likelihood estimation as ABC substitute

Posted in Books, Kids, Statistics, University life on May 14, 2020 by xi'an

A JMLR paper by Papamakarios, Sterratt, and Murray (Edinburgh), first presented at the AISTATS 2019 meeting, on a new form of likelihood-free inference, away from non-zero tolerance and from the distance-based versions of ABC, following earlier papers by Iain Murray and co-authors in the same spirit. Which I got pointed to during the ABC workshop in Vancouver. At the time I had no idea as to what autoregressive flows meant. We were supposed to hold a reading group on this paper in Paris-Dauphine last week, unfortunately cancelled as a coronaviral precaution… Here are some notes I had prepared for the meeting that did not take place.

"A simulator model is a computer program, which takes a vector of parameters θ, makes internal calls to a random number generator, and outputs a data vector x."

Just the usual generative model then.
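
For illustration, here is a toy instance of such a simulator in Python, with a made-up Gaussian location-scale model that is obviously not one of the paper's examples:

```python
import numpy as np

def simulator(theta, rng):
    """Toy simulator: parameters in, internal calls to a random number
    generator, a data vector out (an illustrative Gaussian model)."""
    return rng.normal(loc=theta[0], scale=np.exp(theta[1]), size=50)

x = simulator(np.array([0.0, 0.5]), np.random.default_rng(1))
```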

“A conditional neural density estimator is a parametric model q(.|φ) (such as a neural network) controlled by a set of parameters φ, which takes a pair of datapoints (u,v) and outputs a conditional probability density q(u|v,φ).”

Less usual, in that the outcome is guaranteed to be a probability density.

“For its neural density estimator, SNPE uses a Mixture Density Network, which is a feed-forward neural network that takes x as input and outputs the parameters of a Gaussian mixture over θ.”

In which theoretical sense would it improve upon classical or Bayesian density estimators? Where are the error evaluation, the optimal rates, the sensitivity to the dimension of the data? of the parameter?
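
For concreteness, here is a minimal sketch of such a mixture density network in PyTorch, with illustrative layer sizes and number of components (my own rendering, not the authors' architecture):

```python
import torch
import torch.nn as nn

class MDN(nn.Module):
    """Feed-forward network mapping data x to the parameters of a Gaussian
    mixture over the parameter theta (illustrative sizes throughout)."""

    def __init__(self, x_dim, theta_dim, n_components=5, hidden=50):
        super().__init__()
        self.n_components, self.theta_dim = n_components, theta_dim
        self.body = nn.Sequential(nn.Linear(x_dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, hidden), nn.Tanh())
        self.logits = nn.Linear(hidden, n_components)                  # mixture weights
        self.means = nn.Linear(hidden, n_components * theta_dim)       # component means
        self.log_scales = nn.Linear(hidden, n_components * theta_dim)  # diagonal scales

    def log_prob(self, theta, x):
        """Evaluate log q(theta | x, phi) for a diagonal Gaussian mixture."""
        h = self.body(x)
        log_w = torch.log_softmax(self.logits(h), dim=-1)
        mu = self.means(h).view(-1, self.n_components, self.theta_dim)
        scale = self.log_scales(h).view(-1, self.n_components, self.theta_dim).exp()
        components = torch.distributions.Independent(
            torch.distributions.Normal(mu, scale), 1)
        return torch.logsumexp(log_w + components.log_prob(theta.unsqueeze(1)), dim=-1)
```

Fitting it then amounts to maximising the average log q(θ|x) over simulated pairs (θ,x), which does not by itself answer the above questions on rates and dimensions.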

“Our new method, Sequential Neural Likelihood (SNL), avoids the bias introduced by the proposal, by opting to learn a model of the likelihood instead of the posterior.”

I do not get the argument, in that the final outcome (of using the approximation within an MCMC scheme) remains biased, since the likelihood is not the exact likelihood. Where is the error evaluation? Note that in the associated Algorithm 1, the learning set is enlarged on each round, as in AMIS, rather than set back to the empty set ∅ on each round.
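
To make the accumulation point explicit, here is a schematic rendering of that loop, with the simulator, density estimator and MCMC step passed in as abstract ingredients (hypothetical names, not the authors' code):

```python
def snl_rounds(prior, simulator, train_conditional_density, mcmc_sample,
               x_obs, n_rounds=10, n_sims_per_round=1000):
    """Schematic SNL structure: the training set grows across rounds, as in
    AMIS, and is never reset to the empty set."""
    data = []                                  # accumulated (theta, x) pairs
    proposal = prior                           # round-0 proposal is the prior
    for _ in range(n_rounds):
        thetas = [proposal.sample() for _ in range(n_sims_per_round)]
        data += [(theta, simulator(theta)) for theta in thetas]
        q = train_conditional_density(data)    # fit q(x | theta, phi) on *all* of data
        # MCMC targets the approximate posterior q(x_obs | theta) * prior(theta)
        proposal = mcmc_sample(lambda th: q.log_prob(x_obs, th) + prior.log_prob(th))
    return q, proposal
```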

"…given enough simulations, a sufficiently flexible conditional neural density estimator will eventually approximate the likelihood in the support of the proposal, regardless of the shape of the proposal. In other words, as long as we do not exclude parts of the parameter space, the way we propose parameters does not bias learning the likelihood asymptotically. Unlike when learning the posterior, no adjustment is necessary to account for our proposing strategy."

This is a rather vague statement, with the only support being that the Monte Carlo approximation to the Kullback-Leibler divergence does converge to its actual value, i.e. a direct application of the Law of Large Numbers! But an interesting point I informally made a (long) while ago is that all that matters is the estimate of the density at x⁰. Or at the value of the statistic at x⁰. The masked auto-encoder density estimator is based on a sequence of bijections with a lower-triangular Jacobian matrix, meaning the conditional density estimate is available in closed form. Which makes it sound like a form of neurotic variational Bayes solution.
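
In equations-as-code, the closed-form density of one such autoregressive bijection looks like the following sketch, where mu_fn and log_sigma_fn are hypothetical stand-ins for the masked network outputs:

```python
import numpy as np

def maf_layer_log_density(x, mu_fn, log_sigma_fn):
    """Closed-form log density of one autoregressive bijection with a standard
    normal base: coordinate i may only depend on x[:i], so the Jacobian of
    x -> u is lower triangular and its determinant is the product of the
    diagonal terms 1/sigma_i."""
    log_p = 0.0
    for i in range(len(x)):
        mu, log_sigma = mu_fn(i, x[:i]), log_sigma_fn(i, x[:i])
        u_i = (x[i] - mu) * np.exp(-log_sigma)            # u_i = (x_i - mu_i) / sigma_i
        log_p += -0.5 * (u_i ** 2 + np.log(2 * np.pi)) - log_sigma
    return log_p
```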

The paper also links with ABC (too costly?), other parametric approximations to the posterior (like Gaussian copulas and variational likelihood-free inference), synthetic likelihood, Gaussian processes, noise contrastive estimation… With experiments involving some of the above, albeit on rather smooth models with relatively few parameters.

“A general question is whether it is preferable to learn the posterior or the likelihood (…) Learning the likelihood can often be easier than learning the posterior, and it does not depend on the choice of proposal, which makes learning easier and more robust (…) On the other hand, methods such as SNPE return a parametric model of the posterior directly, whereas a further inference step (e.g. variational inference or MCMC) is needed on top of SNL to obtain a posterior estimate”

A fair point in the conclusion. Which also mentions the curse of dimensionality (both for parameters and observations) and the possibility to work directly with summaries.

Getting back to the earlier and connected Masked autoregressive flow for density estimation paper, by Papamakarios, Pavlakou and Murray:

“Viewing an autoregressive model as a normalizing flow opens the possibility of increasing its flexibility by stacking multiple models of the same type, by having each model provide the source of randomness for the next model in the stack. The resulting stack of models is a normalizing flow that is more flexible than the original model, and that remains tractable.”

Which makes it sound like a sort of neural network in the density space. Optimised by Kullback-Leibler minimisation to get asymptotically close to the likelihood. But a form of Bayesian indirect inference in the end, namely an MLE on a pseudo-model, using the estimated model as a proxy in Bayesian inference…
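
The stacking itself is nothing more mysterious than composing invertible maps and adding up the log-Jacobian terms, as in the following sketch (the layer interface is an assumption of mine):

```python
def flow_log_density(x, layers, base_log_density):
    """Log density of a stack of invertible layers, i.e. a normalizing flow.
    Each layer is assumed to expose inverse(x) -> (u, log_abs_det_jacobian);
    stacking composes the inverses and sums the log-Jacobian terms, so the
    overall density stays available in closed form."""
    total_log_det = 0.0
    for layer in layers:                        # run the stack backwards: data -> noise
        x, log_abs_det = layer.inverse(x)
        total_log_det += log_abs_det
    return base_log_density(x) + total_log_det  # change-of-variables formula
```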

Laplace’s Demon [coming home!]

Posted in Kids, Linux, pictures, Statistics, University life on May 11, 2020 by xi'an

A new online seminar is starting this week, called Laplace's Demon [after too much immersion in His Dark Materials lately, rather than Unix coding, I first wrote daemon!] and concerned with Bayesian Machine Learning at Scale. Run by Criteo in Paris (hence the Laplace filiation, I presume!). Here is the motivational blurb from their webpage:

Machine learning is changing the world we live in at a break neck pace. From image recognition and generation, to the deployment of recommender systems, it seems to be breaking new ground constantly and influencing almost every aspect of our lives. In this seminar series we ask distinguished speakers to comment on what role Bayesian statistics and Bayesian machine learning have in this rapidly changing landscape. Do we need to optimally process information or borrow strength in the big data era? Are philosophical concepts such as coherence and the likelihood principle relevant when you are running a large scale recommender system? Are variational approximations, MCMC or EP appropriate in a production environment? Can I use the propensity score and call myself a Bayesian? How can I elicit a prior over a massive dataset? Is Bayes a reasonable theory of how to be perfect but a hopeless theory of how to be good? Do we need Bayes when we can just A/B test? What combinations of pragmatism and idealism can be used to deploy Bayesian machine learning in a large scale live system? We ask Bayesian believers, Bayesian pragmatists and Bayesian skeptics to comment on all of these subjects and more.

The seminar takes place on the second Wednesday of the month, at 5pm (GMT+2), starting ill-fatedly with myself on ABC-Gibbs this very Wednesday (13 May 2020), followed by Aki Vehtari, John Ormerod, Nicolas Chopin, François Caron, Pierre Latouche, Victor Elvira, Sara Filippi, and Chris Oates. (I think my very first webinar was a presentation at the Deutsche Bank, New York, which I gave from the CREST videoconference room from 8pm till midnight after my trip was cancelled when the Twin Towers got destroyed, on 07 September 2001…)

One World webinars

Posted in Statistics on April 21, 2020 by xi'an

Just a notice that our ABC World seminar has joined the “franchise” of the One World seminars

and that, on Thursday, 23 April, at 12:30 (CEST), Ivis Kerama and Richard Everitt will talk on Rare event ABC-SMC², while, also on Thursday, at 16:00 (CEST), Michela Ottobre will talk on Fast non mean-field network: uniform in time averaging in the One World Probability Seminar.

ABC webinar, first!

Posted in Books, pictures, Statistics, University life on April 13, 2020 by xi'an


The première of the ABC World Seminar last Thursday was most successful! It took place at the scheduled time, with no technical interruption, and allowed 130+ participants from most of the World [sorry, West Coast friends!] to listen to the first speaker, Dennis Prangle, presenting normalising flows and distilled importance sampling. And to answer questions. As I had already commented on the earlier version of his paper, I will not reproduce those comments here. In short, I remain uncertain, albeit not skeptical, about the notions of normalising flows and variational encoders for estimating densities, when perceived as non-parametric estimators, due to the large number of parameters they involve, and wonder at the availability of convergence rates. Incidentally, I had forgotten about the remarkable link between KL distance & importance sampling variability. Adding to the to-read list Müller et al. (2018) on neural importance sampling.
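
As a reminder of that link, a tiny numerical illustration of my own (not from the talk): with a Gaussian target and a shifted Gaussian proposal, the average log importance weight under the target is exactly the Kullback-Leibler divergence, so the KL distance directly measures how spread out the weights are.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target p = N(0, 1), proposal q = N(mu, 1), for which KL(p || q) = mu² / 2.
# The average of log p(x) - log q(x) under p is exactly that KL divergence,
# and the weights w = p/q fluctuate more as the KL grows (illustrative numbers).
for mu in (0.5, 1.0, 2.0):
    x = rng.normal(0.0, 1.0, size=100_000)          # draws from the target p
    log_w = -0.5 * x**2 + 0.5 * (x - mu)**2         # log p(x) - log q(x)
    print(f"mu={mu}: KL={mu**2 / 2:.2f}, "
          f"mean log-weight={log_w.mean():.2f}, sd log-weight={log_w.std():.2f}")
```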


ABC World seminar

Posted in Books, pictures, Statistics, Travel, University life on April 4, 2020 by xi'an

With most of the World more or less confined at home and conferences cancelled one after the other, including ABC in Grenoble!, we are launching a fortnightly webinar on approximate Bayesian computation, methods, and inference. The idea is to gather members of the community and disseminate results and innovation during these coming weeks and months under lock-down. And hopefully after!

At this point, the interface will be Blackboard Collaborate, run from Edinburgh by Michael Gutmann, for which neither registration nor software is required. Before each talk, a guest link will be mailed to the mailing list. Please register here to join the list.

The seminar is planned for Thursdays, at either 9am or (more likely) 11:30am UK time (GMT+1), as we are still debating the best schedule to reach as many populated time zones as possible!, and the first speakers are:

09.04.2020 Dennis Prangle Distilling importance sampling
23.04.2020 Ivis Kerama and Richard Everitt Rare event SMC²
07.05.2020 Umberto Picchini Stratified sampling and bootstrapping for ABC

misspecified [but published!]

Posted in Statistics on April 1, 2020 by xi'an

ABC in Svalbard [news #1]

Posted in Mountains, pictures, Running, Statistics, Travel, University life on March 23, 2020 by xi'an

We [Julien and myself] are quite pleased to announce that

  • the scientific committee for the workshop has been gathered
  • the webpage for the workshop is now on-line (with a wonderful walrus picture whose author we alas cannot identify)
  • the workshop is now endorsed by both IMS and ISBA, which will handle registration (to open soon)
  • the reservation of hotel rooms will be handled by Hurtigruten Svalbard through the above webpage (this is important as we have already paid a deposit for a certain number of rooms)
  • we are definitely seeking both sponsors and organisers of mirror workshops in more populated locations

As an item of trivia, let me recall that Svalbard stands for the archipelago, while Spitsbergen is the name of the main island, where Longyearbyen is located. (In Icelandic, Svalbarði means cold rim or cold coast.)