Archive for misspecification

focused Bayesian prediction

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , on June 3, 2020 by xi'an

In this fourth session of our One World ABC Seminar, my friend and coauthor Gael Martin, gave an after-dinner talk on focused Bayesian prediction, more in the spirit of Bissiri et al. than following a traditional ABC approach.  because along with Ruben Loaiza-Maya and [my friend and coauthor] David Frazier, they consider the possibility of a (mild?) misspecification of the model. Using thus scoring rules à la Gneiting and Raftery. Gael had in fact presented an earlier version at our workshop in Oaxaca, in November 2018. As in other solutions of that kind, difficulty in weighting the score into a distribution. Although asymptotic irrelevance, direct impact on the current predictions, at least for the early dates in the time series… Further calibration of the set of interest A. Or the focus of the prediction. As a side note the talk perfectly fits the One World likelihood-free seminar as it does not use the likelihood function!

“The very premise of this paper is that, in reality, any choice of predictive class is such that the truth is not contained therein, at which point there is no reason to presume that the expectation of any particular scoring rule will be maximized at the truth or, indeed, maximized by the same predictive distribution that maximizes a different (expected) score.”

This approach requires the proxy class to be close enough to the true data generating model. Or in the word of the authors to be plausible predictive models. And to produce the true distribution via the score as it is proper. Or the closest to the true model in the misspecified family. I thus wonder at a possible extension with a non-parametric version, the prior being thus on functionals rather than parameters, if I understand properly the meaning of Π(Pθ). (Could the score function be misspecified itself?!) Since the score is replaced with its empirical version, the implementation is  resorting to off-the-shelf MCMC. (I wonder for a few seconds if the approach could be seen as a pseudo-marginal MCMC but the estimation is always based on the same observed sample, hence does not directly fit the pseudo-marginal MCMC framework.)

[Notice: Next talk in the series is tomorrow, 11:30am GMT+1.]


Posted in Statistics with tags , , , , , , , , , on October 10, 2019 by xi'an

Some explanation for all these acronyms! I am giving a Actuarial Mathematics & Statistics (AMS) seminar at Heriot-Watt (HW) University, in Edinburgh, tomorow. But in the (new) Bayes Centre, at the University of Edinburgh, rather than on the campus of Heriot-Watt, as this is also the launching day of the Centre for Doctoral Training (CDT) on Mathematical Modelling, Analysis, & Computation (MAG) shared between Heriot-Watt, and the University of Edinburgh, funded by the EPSRC and located in the Maxwell Institute Graduate School (MIGS) in its Bayes Centre. My talk will be on ABC convergence and misspecification.

asymptotics of synthetic likelihood [a reply from the authors]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , on March 19, 2019 by xi'an

[Here is a reply from David, Chris, and Robert on my earlier comments, highlighting some points I had missed or misunderstood.]

Dear Christian

Thanks for your interest in our synthetic likelihood paper and the thoughtful comments you wrote about it on your blog.  We’d like to respond to the comments to avoid some misconceptions.

Your first claim is that we don’t account for the differing number of simulation draws required for each parameter proposal in ABC and synthetic likelihood.  This doesn’t seem correct, see the discussion below Lemma 4 at the bottom of page 12.  The comparison between methods is on the basis of effective sample size per model simulation.

As you say, in the comparison of ABC and synthetic likelihood, we consider the ABC tolerance \epsilon and the number of simulations per likelihood estimate M in synthetic likelihood as functions of n.  Then for tuning parameter choices that result in the same uncertainty quantification asymptotically (and the same asymptotically as the true posterior given the summary statistic) we can look at the effective sample size per model simulation.  Your objection here seems to be that even though uncertainty quantification is similar for large n, for a finite n the uncertainty quantification may differ.  This is true, but similar arguments can be directed at almost any asymptotic analysis, so this doesn’t seem a serious objection to us at least.  We don’t find it surprising that the strong synthetic likelihood assumptions, when accurate, give you something extra in terms of computational efficiency.

We think mixing up the synthetic likelihood/ABC comparison with the comparison between correctly specified and misspecified covariance in Bayesian synthetic likelihood is a bit unfortunate, since these situations are quite different.  The first involves correct uncertainty quantification asymptotically for both methods.  Only a very committed reader who looked at our paper in detail would understand what you say here.  The question we are asking with the misspecified covariance is the following.  If the usual Bayesian synthetic likelihood analysis is too much for our computational budget, can something still be done to quantify uncertainty?  We think the answer is yes, and with the misspecified covariance we can reduce the computational requirements by an order of magnitude, but with an appropriate cost statistically speaking.  The analyses with misspecified covariance give valid frequentist confidence regions asymptotically, so this may still be useful if it is all that can be done.  The examples as you say show something of the nature of the trade-off involved.

We aren’t quite sure what you mean when you are puzzled about why we can avoid having M to be O(√n).  Note that because of the way the summary statistics satisfy a central limit theorem, elements of the covariance matrix of S are already O(1/n), and so, for example, in estimating μ(θ) as an average of M simulations for S, the elements of the covariance matrix of the estimator of μ(θ) are O(1/(Mn)).  Similar remarks apply to estimation of Σ(θ).  I’m not sure whether that gets to the heart of what you are asking here or not.

In our email discussion you mention the fact that if M increases with n, then the computational burden of a single likelihood approximation and hence generating a single parameter sample also increases with n.  This is true, but unavoidable if you want exact uncertainty quantification asymptotically, and M can be allowed to increase with n at any rate.  With a fixed M there will be some approximation error, which is often small in practice.  The situation with vanilla ABC methods will be even worse, in terms of the number of proposals required to generate a single accepted sample, in the case where exact uncertainty quantification is desired asymptotically.  As shown in Li and Fearnhead (2018), if regression adjustment is used with ABC and you can find a good proposal in their sense, one can avoid this.  For vanilla ABC, if the focus is on point estimation and exact uncertainty quantification is not required, the situation is better.  Of course as you show in your nice ABC paper for misspecified models jointly with David Frazier and Juidth Rousseau recently the choice of whether to use regression adjustment can be subtle in the case of misspecification.

In our previous paper Price, Drovandi, Lee and Nott (2018) (which you also reviewed on this blog) we observed that if the summary statistics are exactly normal, then you can sample from the summary statistic posterior exactly with finite M in the synthetic likelihood by using pseudo-marginal ideas together with an unbiased estimate of a normal density due to Ghurye and Olkin (1962).  When S satisfies a central limit theorem so that S is increasingly close to normal as n gets large, we conjecture that it is possible to get exact uncertainty quantification asymptotically with fixed M if we use the Ghurye and Olkin estimator, but we have no proof of that yet (if it is true at all).

Thanks again for being interested enough in the paper to comment, much appreciated.

David, Chris, Robert.

asymptotics of synthetic likelihood

Posted in pictures, Statistics, Travel with tags , , , , , , , , , , on March 11, 2019 by xi'an

David Nott, Chris Drovandi and Robert Kohn just arXived a paper on a comparison between ABC and synthetic likelihood, which is both interesting and timely given that synthetic likelihood seems to be lacking behind in terms of theoretical evaluation. I am however as puzzled by the results therein as I was by the earlier paper by Price et al. on the same topic. Maybe due to the Cambodia jetlag, which is where and when I read the paper.

My puzzlement, thus, comes from the difficulty in comparing both approaches on a strictly common ground. The paper first establishes convergence and asymptotic normality for synthetic likelihood, based on the 2003 MCMC paper of Chernozukov and Hong [which I never studied in details but that appears like the MCMC reference in the econometrics literature]. The results are similar to recent ABC convergence results, unsurprisingly when assuming a CLT on the summary statistic vector. One additional dimension of the paper is to consider convergence for a misspecified covariance matrix in the synthetic likelihood [and it will come back with a revenge]. And asymptotic normality of the synthetic score function. Which is obviously unavailable in intractable models.

The first point I have difficulty with is how the computing time required for approximating mean and variance in the synthetic likelihood, by Monte Carlo means, is not accounted for in the comparison between ABC and synthetic likelihood versions. Remember that ABC only requires one (or at most two) pseudo-samples per parameter simulation. The latter requires M, which is later constrained to increase to infinity with the sample size. Simulations that are usually the costliest in the algorithms. If ABC were to use M simulated samples as well, since it already relies on a kernel, it could as well construct [at least on principle] a similar estimator of the [summary statistic] density. Or else produce M times more pairs (parameter x pseudo-sample). The authors pointed out (once this post out) that they do account for the factor M when computing the effective sample size (before Lemma 4, page 12), but I still miss why the ESS converging to N=MN/M when M goes to infinity is such a positive feature.

Another point deals with the use of multiple approximate posteriors in the comparison. Since the approximations differ, it is unclear that convergence to a given approximation is all that should matter, if the approximation is less efficient [when compared with the original and out-of-reach posterior distribution]. Especially for a finite sample size n. This chasm in the targets becomes more evident when the authors discuss the use of a constrained synthetic likelihood covariance matrix towards requiring less pseudo-samples, i.e. lower values of M, because of a smaller number of parameters to estimate. This should be balanced against the loss in concentration of the synthetic approximation, as exemplified by the realistic examples in the paper. (It is also hard to see why M could be not of order √n for Monte Carlo reasons.)

The last section in the paper is revolving around diverse issues for misspecified models, from wrong covariance matrix to wrong generating model. As we just submitted a paper on ABC for misspecified models, I will not engage into a debate on this point but find the proposed strategy that goes through an approximation of the log-likelihood surface by a Gaussian process and a derivation of the covariance matrix of the score function apparently greedy in both calibration and computing. And not so clearly validated when the generating model is misspecified.

ISBA 18 tidbits

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , on July 2, 2018 by xi'an

Among a continuous sequence of appealing sessions at this ISBA 2018 meeting [says a member of the scientific committee!], I happened to attend two talks [with a wee bit of overlap] by Sid Chib in two consecutive sessions, because his co-author Ana Simoni (CREST) was unfortunately sick. Their work was about models defined by a collection of moment conditions, as often happens in econometrics, developed in a recent JASA paper by Chib, Shin, and Simoni (2017). With an extension about moving to defining conditional expectations by use of a functional basis. The main approach relies on exponentially tilted empirical likelihoods, which reminded me of the empirical likelihood [BCel] implementation we ran with Kerrie Mengersen and Pierre Pudlo a few years ago. As a substitute to ABC. This problematic made me wonder on how much Bayesian the estimating equation concept is, as it should somewhat involve a nonparametric prior under the moment constraints.

Note that Sid’s [talks and] papers are disconnected from ABC, as everything comes in closed form, apart from the empirical likelihood derivation, as we actually found in our own work!, but this could become a substitute model for ABC uses. For instance, identifying the parameter θ of the model by identifying equations. Would that impose too much input from the modeller? I figure I came with this notion mostly because of the emphasis on proxy models the previous day at ABC in ‘burgh! Another connected item of interest in the work is the possibility of accounting for misspecification of these moment conditions by introducing a vector of errors with a spike & slab distribution, although I am not sure this is 100% necessary without getting further into the paper(s) [blame conference pressure on my time].

Another highlight was attending a fantastic poster session Monday night on computational methods except I would have needed four more hours to get through every and all posters. This new version of ISBA has split the posters between two sites (great) and themes (not so great!), while I would have preferred more sites covering all themes over all nights, to lower the noise (still bearable this year) and to increase the possibility to check all posters of interest in a particular theme…

Mentioning as well a great talk by Dan Roy about assessing deep learning performances by what he calls non-vacuous error bounds. Namely, through PAC-Bayesian bounds. One major comment of his was about deep learning models being much more non-parametric (number of parameters rising with number of observations) than parametric models, meaning that generative adversarial constructs as the one I discussed a few days ago may face a fundamental difficulty as models are taken at face value there.

On closed-form solutions, a closed-form Bayes factor for component selection in mixture models by Fũqene, Steel and Rossell that resemble the Savage-Dickey version, without the measure theoretic difficulties. But with non-local priors. And closed-form conjugate priors for the probit regression model, using unified skew-normal priors, as exhibited by Daniele Durante. Which are product of Normal cdfs and pdfs, and which allow for closed form marginal likelihoods and marginal posteriors as well. (The approach is not exactly conjugate as the prior and the posterior are not in the same family.)

And on the final session I attended, there were two talks on scalable MCMC, one on coresets, which will require some time and effort to assimilate, by Trevor Campbell and Tamara Broderick, and another one using Poisson subsampling. By Matias Quiroz and co-authors. Which did not completely convinced me (but this was the end of a long day…)

All in all, this has been a great edition of the ISBA meetings, if quite intense due to a non-stop schedule, with a very efficient organisation that made parallel sessions manageable and poster sessions back to a reasonable scale [although I did not once manage to cross the street to the other session]. Being in unreasonably sunny Edinburgh helped a lot obviously! I am a wee bit disappointed that no one else follows my call to wear a kilt, but I had low expectations to start with… And too bad I missed the Ironman 70.3 Edinburgh by one day!