## asymptotics of synthetic likelihood [a reply from the authors]

Posted in Books, Statistics, University life on March 19, 2019 by xi'an

[Here is a reply from David, Chris, and Robert on my earlier comments, highlighting some points I had missed or misunderstood.]

Dear Christian

Thanks for your interest in our synthetic likelihood paper and the thoughtful comments you wrote about it on your blog.  We’d like to respond to the comments to avoid some misconceptions.

Your first claim is that we don’t account for the differing number of simulation draws required for each parameter proposal in ABC and synthetic likelihood.  This doesn’t seem correct: see the discussion below Lemma 4 at the bottom of page 12.  The comparison between methods is on the basis of effective sample size per model simulation.

As you say, in the comparison of ABC and synthetic likelihood, we consider the ABC tolerance ε and the number of simulations per likelihood estimate M in synthetic likelihood as functions of n.  Then for tuning parameter choices that result in the same uncertainty quantification asymptotically (and the same asymptotically as the true posterior given the summary statistic) we can look at the effective sample size per model simulation.  Your objection here seems to be that even though uncertainty quantification is similar for large n, for a finite n the uncertainty quantification may differ.  This is true, but similar arguments can be directed at almost any asymptotic analysis, so this doesn’t seem a serious objection to us at least.  We don’t find it surprising that the strong synthetic likelihood assumptions, when accurate, give you something extra in terms of computational efficiency.

We think mixing up the synthetic likelihood/ABC comparison with the comparison between correctly specified and misspecified covariance in Bayesian synthetic likelihood is a bit unfortunate, since these situations are quite different.  The first involves correct uncertainty quantification asymptotically for both methods.  Only a very committed reader who looked at our paper in detail would understand what you say here.  The question we are asking with the misspecified covariance is the following.  If the usual Bayesian synthetic likelihood analysis is too much for our computational budget, can something still be done to quantify uncertainty?  We think the answer is yes, and with the misspecified covariance we can reduce the computational requirements by an order of magnitude, but with an appropriate cost statistically speaking.  The analyses with misspecified covariance give valid frequentist confidence regions asymptotically, so this may still be useful if it is all that can be done.  The examples as you say show something of the nature of the trade-off involved.

We aren’t quite sure what you mean when you are puzzled about why we can avoid having M be O(√n).  Note that because of the way the summary statistics satisfy a central limit theorem, elements of the covariance matrix of S are already O(1/n), and so, for example, in estimating μ(θ) as an average of M simulations for S, the elements of the covariance matrix of the estimator of μ(θ) are O(1/(Mn)).  Similar remarks apply to estimation of Σ(θ).  We’re not sure whether that gets to the heart of what you are asking here or not.
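The O(1/(Mn)) rate can be checked numerically. The sketch below is an illustration rather than anything taken from the paper: it assumes a toy model in which S is the mean of n draws from N(θ, 1), so that Var(S) = 1/n by construction, and the function names (`simulate_summary`, `estimate_mu`, `variance_of_mu_hat`) are hypothetical.

```python
import random
import statistics


def simulate_summary(rng, theta, n):
    """Toy summary S (an assumption): mean of n draws from N(theta, 1), so Var(S) = 1/n."""
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n)])


def estimate_mu(rng, theta, n, M):
    """Estimate mu(theta) = E[S] by averaging M independently simulated summaries."""
    return statistics.fmean([simulate_summary(rng, theta, n) for _ in range(M)])


def variance_of_mu_hat(theta, n, M, reps=2000, seed=1):
    """Monte Carlo variance of the estimator of mu(theta) across `reps` replications."""
    rng = random.Random(seed)
    return statistics.pvariance([estimate_mu(rng, theta, n, M) for _ in range(reps)])


if __name__ == "__main__":
    M = 5
    for n in (20, 80):
        v = variance_of_mu_hat(theta=0.0, n=n, M=M)
        # If the variance is O(1/(Mn)), the rescaled value v * M * n stays near 1.
        print(n, v, v * M * n)
```

If the rate holds, the rescaled value in the last column should stay close to 1 as n changes, even though M is held fixed.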

In our email discussion you mention the fact that if M increases with n, then the computational burden of a single likelihood approximation, and hence of generating a single parameter sample, also increases with n.  This is true, but unavoidable if you want exact uncertainty quantification asymptotically, and M can be allowed to increase with n at any rate.  With a fixed M there will be some approximation error, which is often small in practice.  The situation with vanilla ABC methods will be even worse, in terms of the number of proposals required to generate a single accepted sample, in the case where exact uncertainty quantification is desired asymptotically.  As shown in Li and Fearnhead (2018), if regression adjustment is used with ABC and you can find a good proposal in their sense, one can avoid this.  For vanilla ABC, if the focus is on point estimation and exact uncertainty quantification is not required, the situation is better.  Of course, as you show in your nice recent ABC paper for misspecified models, written jointly with David Frazier and Judith Rousseau, the choice of whether to use regression adjustment can be subtle in the case of misspecification.
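To make the cost of a single likelihood approximation concrete, here is a minimal sketch of a synthetic log-likelihood evaluation, written for illustration and not taken from the paper. It assumes a scalar summary (the mean of n draws from N(θ, 1)) to stay dependency-free, plugs the simulated mean and variance into a normal density, and uses hypothetical function names. Each call consumes exactly M model simulations, so letting M grow with n makes every evaluation more expensive.

```python
import math
import random
import statistics


def simulate_summary(rng, theta, n):
    """Toy model (an assumption): the summary is the mean of n draws from N(theta, 1)."""
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n)])


def synthetic_loglik(rng, theta, s_obs, n, M):
    """Plug-in normal approximation to the log-likelihood of the observed summary.

    Uses M model simulations at theta to estimate the mean and variance of S.
    """
    sims = [simulate_summary(rng, theta, n) for _ in range(M)]  # M model simulations
    mu_hat = statistics.fmean(sims)
    var_hat = statistics.variance(sims)  # sample variance, divisor M - 1
    return -0.5 * (math.log(2.0 * math.pi * var_hat) + (s_obs - mu_hat) ** 2 / var_hat)


if __name__ == "__main__":
    rng = random.Random(0)
    s_obs = 0.0  # hypothetical observed summary
    for theta in (0.0, 1.0):
        print(theta, synthetic_loglik(rng, theta, s_obs, n=100, M=50))
```

With a vector of summaries, the same idea applies with the simulated mean vector and covariance matrix in a multivariate normal density.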

In our previous paper Price, Drovandi, Lee and Nott (2018) (which you also reviewed on this blog) we observed that if the summary statistics are exactly normal, then you can sample from the summary statistic posterior exactly with finite M in the synthetic likelihood by using pseudo-marginal ideas together with an unbiased estimate of a normal density due to Ghurye and Olkin (1962).  When S satisfies a central limit theorem so that S is increasingly close to normal as n gets large, we conjecture that it is possible to get exact uncertainty quantification asymptotically with fixed M if we use the Ghurye and Olkin estimator, but we have no proof of that yet (if it is true at all).

Thanks again for being interested enough in the paper to comment, much appreciated.

David, Chris, Robert.

## a response by Ly, Verhagen, and Wagenmakers

Posted in Statistics on March 9, 2017 by xi'an

Following my demise [of the Bayes factor], Alexander Ly, Josine Verhagen, and Eric-Jan Wagenmakers wrote a very detailed response. Which I just saw the other day while in Banff. (If not in Schiphol, which would have been more appropriate!)

“In this rejoinder we argue that Robert’s (2016) alternative view on testing has more in common with Jeffreys’s Bayes factor than he suggests, as they share the same ‘‘shortcomings’’.”

Rather unsurprisingly (!), the authors agree with my position on the dangers of ignoring decisional aspects when using the Bayes factor. A point of dissension is the resolution of the Jeffreys[-Lindley-Bartlett] paradox. One consequence derived by Alexander and co-authors is that priors should change between testing and estimating. Because the parameters have a different meaning under the null and under the alternative, a point I agree with in that these parameters are indexed by the model [index!]. But with which I disagree when arguing that the same parameter (e.g., a mean under model M¹) should have two priors when moving from testing to estimation. To state that the priors within the marginal likelihoods “are not designed to yield posteriors that are good for estimation” (p.45) amounts to wishful thinking. I also do not find a strong justification within the paper or the response about choosing an improper prior on the nuisance parameter, e.g. σ, with the same constant. Another a posteriori validation in my opinion. However, I agree with the conclusion that the Jeffreys paradox prohibits the use of an improper prior on the parameter being tested (or of the test itself). A second point made by the authors is that Jeffreys’ Bayes factor is information consistent, which is correct but does not solve my quandary with the lack of precise calibration of the object, namely that alternatives abound in a non-informative situation.

“…the work by Kamary et al. (2014) impressively introduces an alternative view on testing, an algorithmic resolution, and a theoretical justification.”

The second part of the comments is highly supportive of our mixture approach and I obviously appreciate very much this support! Especially if we ever manage to turn the paper into a discussion paper! The authors also draw a connection with Harold Jeffreys’ distinction between testing and estimation, based upon Laplace’s succession rule. Unbearably slow succession law. Which is well-taken if somewhat specious since this is a testing framework where a single observation can send the Bayes factor to zero or +∞. (I further enjoyed the connection of the Poisson-versus-Negative Binomial test with Jeffreys’ call for common parameters. And the supportive comments on our recent mixture reparameterisation paper with Kaniav Kamary and Kate Lee.) The other point that the Bayes factor is more sensitive to the choice of the prior (beware the tails!) can be viewed as a plus for mixture estimation, as acknowledged there. (The final paragraph about the faster convergence of the weight α is not strongly