Approximate Integrated Likelihood via ABC methods

My PhD student Clara Grazian just arXived this joint work with Brunero Liseo on using ABC for marginal density estimation. The idea in this paper is to produce an integrated likelihood approximation in intractable problems via the ratio

$$L(\psi|x)\propto \dfrac{\pi(\psi|x)}{\pi(\psi)}$$

both terms in the ratio being estimated from simulations,

$$\hat L(\psi|x) \propto \dfrac{\hat\pi^\text{ABC}(\psi|x)}{\hat\pi(\psi)}$$

(with possible closed form for the denominator). Although most of the examples processed in the paper (Poisson means ratio, the Neyman–Scott problem, the g-&-k quantile distribution, semi-parametric regression) rely on summary statistics, hence de facto replacing the numerator above with a pseudo-posterior conditional on those summaries, the approximation remains accurate (for those examples). In the g-&-k quantile example, Clara and Brunero compare our ABC-MCMC algorithm with that of Allingham et al. (2009, Statistics & Computing): the latter does better by not replicating values in the Markov chain but instead proposing new values until one is accepted by the usual Metropolis step. (Although I did not spend much time on this issue, I cannot see how both approaches could be simultaneously correct, even though the outcomes do not look very different.) As noted by the authors, “the main drawback of the present approach is that it requires the use of proper priors”, unless the marginalisation of the prior can be done analytically. (This is an interesting computational problem: how to provide an efficient approximation to the marginal density of a σ-finite measure, assuming this density exists.)
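
To fix ideas, here is a minimal sketch of the ratio estimator on a toy model of my own devising (not the paper's code): a normal mean ψ with a nuisance scale σ, both under proper priors, so the denominator is available in closed form. ABC acceptance on two summary statistics marginalises σ away, and a kernel estimate of the accepted ψ draws gives the numerator.

```python
import numpy as np
from scipy import stats

# Toy illustration of  hat L(psi|x) ∝ hat pi_ABC(psi|x) / pi(psi):
# psi is a normal mean, sigma a nuisance scale integrated out by ABC.
rng = np.random.default_rng(1)
n = 50
x_obs = rng.normal(2.0, 1.0, size=n)
m_obs, s_obs = x_obs.mean(), x_obs.std(ddof=1)   # summary statistics

n_sim, eps = 400_000, 0.1
psi = rng.normal(0.0, 5.0, size=n_sim)           # proper N(0, 5^2) prior on psi
sigma = rng.lognormal(0.0, 1.0, size=n_sim)      # proper log-normal prior on sigma
# shortcut: sample the summaries from their exact distributions under the
# normal model, rather than simulating and summarising full datasets
m_sim = rng.normal(psi, sigma / np.sqrt(n))
s_sim = sigma * np.sqrt(rng.chisquare(n - 1, size=n_sim) / (n - 1))
keep = np.hypot(m_sim - m_obs, np.log(s_sim) - np.log(s_obs)) < eps
accepted = psi[keep]                             # sigma is marginalised out here

grid = np.linspace(1.0, 3.0, 200)
post = stats.gaussian_kde(accepted)(grid)        # kernel estimate of the numerator
prior = stats.norm.pdf(grid, 0.0, 5.0)           # closed-form denominator
lik = post / prior
lik /= lik.max()                                 # normalised to 1 at its mode
```

Since the output is only defined up to a constant, normalising at the mode is enough for plotting or for a likelihood-ratio type analysis.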

Clara will give a talk at CREST-ENSAE today about this work, in the Bayes in Paris seminar: 2pm in room 18.

4 Responses to “Approximate Integrated Likelihood via ABC methods”

  1. Dan Simpson Says:

    I don’t see how this is a real challenge (this being approximate marginalisation). Numerical integration should solve your problems without too much trouble. For example, if you know the tails, it’s not that hard to cook up a Gauss quadrature scheme that will do what you want. Or, to be quite honest, compute the effective support of the posterior and just put an equal-weight quadrature rule over that. It’s not pretty but it will work fine, in the sense that the error in the marginalisation will be easily swamped by the Monte Carlo error. (A bare-bones version of this is sketched right after this comment.)

    Please feel free to add all of the usual comments about improper priors and how, if it’s easier to work with proper priors, it may be a thing worth considering…
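
A bare-bones version of that suggestion, as I read it (my own sketch, on a hypothetical toy model with a one-dimensional nuisance parameter λ = log σ): bracket the effective support of λ and apply an equal-weight rule, working on the log scale for stability.

```python
import numpy as np
from scipy import stats

# Equal-weight quadrature over the effective support of a 1-d nuisance
# parameter: psi is a normal mean, lam = log(sigma) the nuisance parameter.
rng = np.random.default_rng(2)
x = rng.normal(1.5, 2.0, size=40)

def log_joint(psi, lam):
    # log f(x | psi, lam) + log pi(lam), with a proper N(0, 2^2) prior on lam
    return (stats.norm.logpdf(x, psi, np.exp(lam)).sum()
            + stats.norm.logpdf(lam, 0.0, 2.0))

def integrated_loglik(psi, n_nodes=201):
    # effective support: a generous bracket around a crude estimate of lam
    lam_hat = np.log(x.std())
    nodes = np.linspace(lam_hat - 4.0, lam_hat + 4.0, n_nodes)
    logs = np.array([log_joint(psi, lam) for lam in nodes])
    # equal weights, summed on the log scale to avoid underflow
    delta = nodes[1] - nodes[0]
    return logs.max() + np.log(np.exp(logs - logs.max()).sum() * delta)

psi_grid = np.linspace(0.5, 2.5, 50)
loglik = np.array([integrated_loglik(p) for p in psi_grid])
```

As the comment says, nothing pretty: the quadrature error is crude but easily dominated by the Monte Carlo noise in any simulation-based alternative, at least while the nuisance parameter stays low-dimensional.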

    • Dan Simpson Says:

      That comment is predicated on the parameter in question being fairly low dimensional (<10), but improper priors in high dimensions are, I'm pretty sure, quite weird. It's definitely true as the dimension goes to infinity (think of putting a prior on a function space. Most functions are really weird, and we probably don't want prior mass on those ones), but I feel like this "weirdness" kicks in pretty quickly as the dimension grows.

    • Dan Simpson Says:

      (I seem to be leaving a lot of comments here)

      Interestingly, the hacky solution (compute the effective support and put an easy (or appropriate) integration rule over it) matches well with the version of reference priors in Berger et al.’s Annals of Statistics paper, in which they presented an algorithm for computing the reference prior at a discrete set of points.

      I also know that this is how everything is implemented in INLA, and I strongly suspect that everyone who’s ever written a “partially collapsed” Gibbs sampler (I love that terminology – it’s so evocative) does it this way (I think it’s in Chris Strickland’s PyMCMC package, but I may be wrong; possibly also Stan, but I’m less sure about that… I know they do this for discrete variables, but it’s a bit different there [unless you approximate the sum by an integral that you approximate by a different sum]).

    • As you wrote later, numerical integration does not work when the nuisance parameter is high-dimensional, so there may be cases where this is the “only” possible approach. A different question, from Nicolas during the seminar, was about the purpose of this likelihood derivation: if a Bayesian analysis is not the final goal, one simply needs the mode and the curvature at the mode to run a likelihood analysis, so simpler algorithms could make more sense (a short sketch of this mode-and-curvature step follows below).
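
To illustrate Nicolas’s point (my own sketch, assuming a smooth estimate of the integrated log-likelihood on a grid, e.g. the output of either snippet above): the mode and a finite-difference curvature at the mode deliver the point estimate and a Wald-type standard error directly, with no further Bayesian machinery.

```python
import numpy as np

# Mode and curvature from a gridded log-likelihood estimate (hypothetical
# inputs: psi_grid and loglik as produced by, e.g., the quadrature sketch).
def mode_and_se(psi_grid, loglik):
    i = int(np.argmax(loglik))      # assumes the mode is interior to the grid
    h = psi_grid[1] - psi_grid[0]
    # central second difference approximates the observed information at the mode
    curv = (loglik[i - 1] - 2.0 * loglik[i] + loglik[i + 1]) / h**2
    return psi_grid[i], 1.0 / np.sqrt(-curv)
```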
