**J**eremias Knoblauch, Jack Jewson and Theodoros Damoulas, all affiliated with Warwick (hence a potentially biased reading!), arXived a paper on loss-based Bayesian inference that Jack discussed with me on my last visit to Warwick. As I was somewhat scared by the 61 pages, of which the 8 first pages are in NeurIPS style. The authors argue for a decision-theoretic approach to Bayesian inference that involves a loss over distributions and a divergence from the prior. For instance, when using the log-score as the loss and the Kullback-Leibler divergence, the regular posterior emerges, as shown by Arnold Zellner. Variational inference also falls under this hat. The argument for this generalization is that any form of loss can be used and still returns a distribution that is used to assess uncertainty about the parameter (of interest). In the axioms they produce for justifying the derivation of the optimal procedure, including cases where the posterior is restricted to a certain class, one [Axiom 4] generalizes the likelihood principle. Given the freedom brought by this general framework, plenty of fringe Bayes methods like standard variational Bayes can be seen as solutions to such a decision problem. Others like EP do not. Of interest to me are the potentials for this formal framework to encompass misspecification and likelihood-free settings, as well as for assessing priors, which is always a fishy issue. (The authors mention in addition the capacity to build related specific design Bayesian deep networks, of which I know nothing.) The obvious reaction of mine is one of facing an abundance of wealth (!) but encompassing approximate Bayesian solutions within a Bayesian framework remains an exciting prospect.

## Archive for approximate Bayesian inference

## a generalized representation of Bayesian inference

Posted in Books with tags approximate Bayesian inference, Bayesian decision theory, Bayesian robustness, Kullback-Leibler divergence, Likelihood Principle, University of Warwick, variational inference on July 5, 2019 by xi'an## postdoc position still open

Posted in pictures, Statistics, University life with tags ABC, Agence Nationale de la Recherche, ANR, approximate Bayesian inference, bois de Boulogne, La Défense, misspecified model, Paris, Paris-Saclay campus, PhD thesis, postdoctoral position, PSL Research University, Université de Montpellier, Université Paris Dauphine, University of Oxford on May 30, 2019 by xi'an**T**he post-doctoral position supported by the ANR funding of our Paris-Saclay-Montpellier research conglomerate on approximate Bayesian inference and computation remains open for the time being. We are more particularly looking for candidates with a strong background in mathematical statistics, esp. Bayesian non-parametrics, towards the analysis of the limiting behaviour of approximate Bayesian inference. Candidates should email me (gmail address: bayesianstatistics) with a detailed vita (CV) and a motivation letter including a research plan. Letters of recommendation may also be emailed to the same address.

## did variational Bayes work?

Posted in Books, Statistics with tags approximate Bayesian inference, asymptotic Bayesian methods, ICML 2018, importance sampling, misspecified model, Pareto distribution, Pareto smoothed importance sampling, posterior predictive, variational Bayes methods, what you get is what you see on May 2, 2019 by xi'an**A**n interesting ICML 2018 paper by Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman I missed last summer on [the fairly important issue of] assessing the quality or lack thereof of a variational Bayes approximation. In the sense of being near enough from the true posterior. The criterion that they propose in this paper relates to the Pareto smoothed importance sampling technique discussed in an earlier post and which I remember discussing with Andrew when he visited CREST a few years ago. The truncation of the importance weights of prior x likelihood / VB approximation avoids infinite variance issues but induces an unknown amount of bias. The resulting diagnostic is based on the estimation of the Pareto order k. If the true value of k is less than ½, the variance of the associated Pareto distribution is finite. The paper suggests to conclude at the worth of the variational approximation when the estimate of k is less than 0.7, based on the empirical assessment of the earlier paper. The paper also contains a remark on the poor performances of the generalisation of this method to marginal settings, that is, when the importance weight is the ratio of the true and variational marginals for a sub-vector of interest. I find the counter-performances somewhat worrying in that Rao-Blackwellisation arguments make me prefer marginal ratios to joint ratios. It may however be due to a poor approximation of the marginal ratio that reflects on the approximation and not on the ratio itself. A second proposal in the paper focus on solely the point estimate returned by the variational Bayes approximation. Testing that the posterior predictive is well-calibrated. This is less appealing, especially when the authors point out the “dissadvantage is that this diagnostic does not cover the case where the observed data is not well represented by the model.” In other words, misspecified situations. This potential misspecification could presumably be tested by comparing the Pareto fit based on the actual data with a Pareto fit based on simulated data. Among other deficiencies, they point that this is “a local diagnostic that will not detect unseen modes”. In other words, *what you get is what you see*.

## asymptotics of synthetic likelihood [a reply from the authors]

Posted in Books, Statistics, University life with tags ABC, approximate Bayesian inference, Bayesian inference, Bayesian synthetic likelihood, central limit theorem, effective sample size, frequentist confidence, local regression, misspecification, pseudo-marginal MCMC, response, tolerance, uncertainty quantification on March 19, 2019 by xi'an*[Here is a reply from David, Chris, and Robert on my earlier comments, highlighting some points I had missed or misunderstood.]*

Dear Christian

Thanks for your interest in our synthetic likelihood paper and the thoughtful comments you wrote about it on your blog. We’d like to respond to the comments to avoid some misconceptions.

Your first claim is that we don’t account for the differing number of simulation draws required for each parameter proposal in ABC and synthetic likelihood. This doesn’t seem correct, see the discussion below Lemma 4 at the bottom of page 12. The comparison between methods is on the basis of effective sample size per model simulation.

As you say, in the comparison of ABC and synthetic likelihood, we consider the ABC tolerance \epsilon and the number of simulations per likelihood estimate M in synthetic likelihood as functions of n. Then for tuning parameter choices that result in the same uncertainty quantification asymptotically (and the same asymptotically as the true posterior given the summary statistic) we can look at the effective sample size per model simulation. Your objection here seems to be that even though uncertainty quantification is similar for large n, for a finite n the uncertainty quantification may differ. This is true, but similar arguments can be directed at almost any asymptotic analysis, so this doesn’t seem a serious objection to us at least. We don’t find it surprising that the strong synthetic likelihood assumptions, when accurate, give you something extra in terms of computational efficiency.

We think mixing up the synthetic likelihood/ABC comparison with the comparison between correctly specified and misspecified covariance in Bayesian synthetic likelihood is a bit unfortunate, since these situations are quite different. The first involves correct uncertainty quantification asymptotically for both methods. Only a very committed reader who looked at our paper in detail would understand what you say here. The question we are asking with the misspecified covariance is the following. If the usual Bayesian synthetic likelihood analysis is too much for our computational budget, can something still be done to quantify uncertainty? We think the answer is yes, and with the misspecified covariance we can reduce the computational requirements by an order of magnitude, but with an appropriate cost statistically speaking. The analyses with misspecified covariance give valid frequentist confidence regions asymptotically, so this may still be useful if it is all that can be done. The examples as you say show something of the nature of the trade-off involved.

We aren’t quite sure what you mean when you are puzzled about why we can avoid having M to be O(√n). Note that because of the way the summary statistics satisfy a central limit theorem, elements of the covariance matrix of S are already O(1/n), and so, for example, in estimating μ(θ) as an average of M simulations for S, the elements of the covariance matrix of the estimator of μ(θ) are O(1/(Mn)). Similar remarks apply to estimation of Σ(θ). I’m not sure whether that gets to the heart of what you are asking here or not.

In our email discussion you mention the fact that if M increases with n, then the computational burden of a single likelihood approximation and hence generating a single parameter sample also increases with n. This is true, but unavoidable if you want exact uncertainty quantification asymptotically, and M can be allowed to increase with n at any rate. With a fixed M there will be some approximation error, which is often small in practice. The situation with vanilla ABC methods will be even worse, in terms of the number of proposals required to generate a single accepted sample, in the case where exact uncertainty quantification is desired asymptotically. As shown in Li and Fearnhead (2018), if regression adjustment is used with ABC and you can find a good proposal in their sense, one can avoid this. For vanilla ABC, if the focus is on point estimation and exact uncertainty quantification is not required, the situation is better. Of course as you show in your nice ABC paper for misspecified models jointly with David Frazier and Juidth Rousseau recently the choice of whether to use regression adjustment can be subtle in the case of misspecification.

In our previous paper Price, Drovandi, Lee and Nott (2018) (which you also reviewed on this blog) we observed that if the summary statistics are exactly normal, then you can sample from the summary statistic posterior exactly with finite M in the synthetic likelihood by using pseudo-marginal ideas together with an unbiased estimate of a normal density due to Ghurye and Olkin (1962). When S satisfies a central limit theorem so that S is increasingly close to normal as n gets large, we conjecture that it is possible to get exact uncertainty quantification asymptotically with fixed M if we use the Ghurye and Olkin estimator, but we have no proof of that yet (if it is true at all).

Thanks again for being interested enough in the paper to comment, much appreciated.

David, Chris, Robert.

## absint[he] post-doc on approximate Bayesian inference in Paris, Montpellier and Oxford

Posted in Statistics with tags ABC, Agence Nationale de la Recherche, ANR, approximate Bayesian inference, bois de Boulogne, La Défense, misspecified model, Paris, Paris-Saclay campus, PhD thesis, postdoctoral position, Université de Montpellier, Université Paris Dauphine, University of Oxford on March 18, 2019 by xi'anAs a consequence of its funding by the Agence Nationale de la Recherche (ANR) in 2018, the ABSint research conglomerate is now actively recruiting a post-doctoral collaborator for up to 24 months. The accronym ** ABSint** stands for Approximate Bayesian solutions for inference on large datasets and complex models. The ABSint conglomerate involves researchers located in Paris, Saclay, Montpelliers, as well as Lyon, Marseille, Nice. This call seeks candidates with an excellent research record and who are interested to collaborate with local researchers on approximate Bayesian techniques like ABC, variational Bayes, PAC-Bayes, Bayesian non-parametrics, scalable MCMC, and related topics. A potential direction of research would be the derivation of new Bayesian tools for model checking in such complex environments. The post-doctoral collaborator will be primarily located in Université Paris-Dauphine, with supported periods in Oxford and visits to Montpellier. No teaching duty is attached to this research position.

Applications can be submitted in either English or French. Sufficient working fluency in English is required. While mastering some French does help with daily life in France (!), it is not a prerequisite. The candidate must hold a PhD degree by the date of application (not the date of employment). Position opens on July 01, with possible accommodation for a later start in September or October.

Deadline for application is April 30 or until position filled. Estimated gross salary is around 2500 EUR, depending on experience (years) since PhD. Candidates should contact Christian Robert (gmail address: bayesianstatistics) with a detailed vita (CV) and a motivation letter including a research plan. Letters of recommendation may also be emailed to the same address.

## a good start in Series B!

Posted in Books, pictures, Statistics, University life with tags ABC, approximate Bayesian inference, generative model, Journal of the Royal Statistical Society, Olympic National Park, peer review, Series B, sunrise, Wasserstein distance on January 5, 2019 by xi'an**J**ust received the great news for the turn of the year that our paper on ABC using Wasserstein distance was accepted in Series B! Inference in generative models using the Wasserstein distance, written by Espen Bernton, Pierre Jacob, Mathieu Gerber, and myself, bypasses the (nasty) selection of summary statistics in ABC by considering the Wasserstein distance between observed and simulated samples. It focuses in particular on non-iid cases like time series in what I find fairly innovative ways. I am thus very glad the paper is going to appear in JRSS B, as it has methodological consequences that should appeal to the community at large.

## calibrating approximate credible sets

Posted in Books, Statistics with tags ABC, approximate Bayesian inference, calibration, convergence diagnostics, credible intervals, exchangeability, harmonic mean estimator, simulation on October 26, 2018 by xi'an**E**arlier this week, Jeong Eun Lee, Geoff Nicholls, and Robin Ryder arXived a paper on the calibration of approximate Bayesian credible intervals. *(Warning: all three authors are good friends of mine!)* They start from the core observation that dates back to Monahan and Boos (1992) of exchangeability between θ being generated from the prior and φ being generated from the posterior associated with one observation generated from the prior predictive. (There is no name for this distribution, other than the prior, that is!) A setting amenable to ABC considerations! Actually, Prangle et al. (2014) relies on this property for assessing the ABC error, while pointing out that the test for exchangeability is not fool-proof since it works equally for two generations from the prior.

“The diagnostic tools we have described cannot be “fooled” in quite the same way checks based on the exchangeability can be.”

The paper thus proposes methods for computing the coverage [under the true posterior] of a credible set computed using an approximate posterior. (I had to fire up a few neurons to realise this was the right perspective, rather than the reverse!) A first solution to approximate the exact coverage of the approximate credible set is to use logistic regression, instead of the exact coverage, based on some summary statistics [not necessarily in an ABC framework]. And a simulation outcome that the parameter [simulated from the prior] at the source of the simulated data is within the credible set. Another approach is to use importance sampling when simulating from the pseudo-posterior. However this sounds dangerously close to resorting to an harmonic mean estimate, since the importance weight is the inverse of the approximate likelihood function. Not that anything unseemly transpires from the simulations.