Bayesian inference and the parametric bootstrap

This paper by Brad Efron came to my knowledge when I was looking for references on Bayesian bootstrap to answer a Cross Validated question. After reading it more thoroughly, “Bayesian inference and the parametric bootstrap” puzzles me, which most certainly means I have missed the main point. Indeed, the paper relies on parametric bootstrap—a frequentist approximation technique mostly based on simulation from a plug-in distribution and a robust inferential method estimating distributions from empirical cdfs—to assess (frequentist) coverage properties of Bayesian posteriors. The manuscript mixes a parametric bootstrap simulation output for posterior inference—even though bootstrap produces simulations of estimators while the posterior distribution operates on the parameter space, those  estimator simulations can nonetheless be recycled as parameter simulation by a genuine importance sampling argument—and the coverage properties of Jeffreys posteriors vs. the BCa [which stands for bias-corrected and accelerated, see Efron 1987] confidence density—which truly take place in different spaces. Efron however connects both spaces by taking advantage of the importance sampling connection and defines a corrected BCa prior to make the confidence intervals match. While in my opinion this does not define a prior in the Bayesian sense, since the correction seems to depend on the data. And I see no strong incentive to match the frequentist coverage, because this would furthermore define a new prior for each component of the parameter. This study about the frequentist properties of Bayesian credible intervals reminded me of the recent discussion paper by Don Fraser on the topic, which follows the same argument that Bayesian credible regions are not necessarily good frequentist confidence intervals.

The conclusion of the paper is made of several points, some of which may not be strongly supported by the previous analysis:

1. “The parametric bootstrap distribution is a favorable starting point for importance sampling computation of Bayes posterior distributions.” [I am not so certain about this point given that the bootstrap is based on a pluggin estimate, hence fails to account for the variability of this estimate, and may thus induce infinite variance behaviour, as in the harmonic mean estimator of Newton and Raftery (1994). Because the tails of the importance density are those of the likelihood, the heavier tails of the posterior induced by the convolution with the prior distribution are likely to lead to this fatal misbehaviour of the importance sampling estimator.]
2. “This computation is implemented by reweighting the bootstrap replications rather than by drawing observations directly from the posterior distribution as with MCMC.” [Computing the importance ratio requires the availability both of the likelihood function and of the likelihood estimator, which means a setting where Bayesian computations are not particularly hindered and do not necessarily call for advanced MCMC schemes.]
3. “The necessary weights are easily computed in exponential families for any prior, but are particularly simple starting from Jeffreys invariant prior, in which case they depend only on the deviance difference.” [Always from a computational perspective, the ease of computing the importance weights is mirrored by the ease in handling the posterior distributions.]
4. “The deviance difference depends asymptotically on the skewness of the family, having a cubic normal form.” [No relevant comment.]
5. “In our examples, Jeffreys prior yielded posterior distributions not much different than the unweighted bootstrap distribution. This may be unsatisfactory for single parameters of interest in multi-parameter families.” [The frequentist confidence properties of Jeffreys priors have already been examined in the past and be found to be lacking in multidimensional settings. This is an assessment finding Jeffreys priors lacking from a frequentist perspective. However, the use of Jeffreys prior is not justified on this particular ground.]
6. “Better uninformative priors, such as the Welch and Peers family or reference priors, are closely related to the frequentist BCa reweighting formula.” [The paper only finds proximities in two examples, but it does not assess this relation in a wider generality. Again, this is not particularly relevant from a Bayesian viewpoint.]
7. “Because of the i.i.d. nature of bootstrap resampling, simple formulas exist for the accuracy of posterior computations as a function of the number B of bootstrap replications. Even with excessive choices of B, computation time was measured in seconds for our examples.” [This is not very surprising. It however assesses Bayesian procedures from a frequentist viewpoint, so this may be lost on both Bayesian and frequentist users…]
8. “An efficient second-level bootstrap algorithm (“bootstrap-after-bootstrap”) provides estimates for the frequentist accuracy of Bayesian inferences.” [This is completely correct and why bootstrap is such an appealing technique for frequentist inference. I spent the past two weeks teaching non-parametric bootstrap to my R class and the students are now fluent with the concept, even though they are unsure about the meaning of estimation and testing!]
9. “This can be important in assessing inferences based on formulaic priors, such as those of Jeffreys, rather than on genuine prior experience.” [Again, this is neither very surprising nor particularly appealing to Bayesian users.]

In conclusion, I found the paper quite thought-provoking and stimulating, definitely opening new vistas in a very elegant way. I however remain unconvinced by the simulation aspects from a purely Monte Carlo perspective.

One Response to “Bayesian inference and the parametric bootstrap”

1. I can’t see the point in practice. If the system is easy to simulate from its estimators, then one can assess its behaviour by simulation (then why is this called parametric bootstrap?; perhaps I don’t quite understand). This is well illustrated in Figure 1.

But in real life, (many) models are so nested (hierarchical) and complex that pretending that we can use the plug-in estimator(s) to simulate the real situation is illusory.

That is, I join your criticism to point 1; and this is an important one.

This site uses Akismet to reduce spam. Learn how your comment data is processed.