IS² for Bayesian inference

“…the method of Approximate Bayesian Computation (ABC) may be used to estimate unbiasedly an approximation to the likelihood.”

Minh-Ngoc Tran, Marcel Scharth, Michael Pitt and Robert Kohn arXived a paper on using an unbiased estimate of the likelihood in lieu of the genuine thing and still getting convergence to the right thing. While the paper is in the same spirit as the fundamental paper of Andrieu and Roberts (2009, AoS, somewhat surprisingly missing from the references), comparing the asymptotic efficiency of using an estimate versus using the genuine likelihood, my attention was distracted by the above quote. This is the only sentence (besides the abstract) where ABC is mentioned and I was a bit confused: ABC is used to estimate an approximation to the likelihood, for sure, converging to

$\int_{d(x,x^\text{obs})\le\varepsilon} f(x|\theta)\,\text{d}x$

as the number of pseudo-datasets grows to infinity, and it is unbiased in this sense, but this is not the reason for using ABC, as the ABC pseudo-likelihood above is the (by)product of the methodology rather than the genuine quantity of interest. Reading the sentence too fast gave me the feeling that ABC did produce an unbiased approximation to the genuine likelihood! Distracted I was, since this is not at all the point of the paper! However, I would be curious to see how it applies to ABC.
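To make this sense of "unbiased" concrete, here is a minimal Python sketch on a toy model of my own choosing, not taken from the paper: x | θ ~ N(θ, 1) with d(x, x_obs) = |x - x_obs|. The function names and the values of `theta`, `x_obs` and `eps` are illustrative assumptions. The proportion of accepted pseudo-datasets is averaged over many replications and compared with the closed-form ABC pseudo-likelihood, which is the integral above.

```python
import math
import numpy as np

def abc_likelihood_estimate(theta, x_obs, eps, n_pseudo, rng):
    """Proportion of pseudo-datasets x ~ N(theta, 1) falling within eps of x_obs."""
    x = rng.normal(theta, 1.0, size=n_pseudo)
    return float(np.mean(np.abs(x - x_obs) <= eps))

def abc_likelihood_exact(theta, x_obs, eps):
    """Closed form of P(|x - x_obs| <= eps | theta) for the toy normal model."""
    def std_normal_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return std_normal_cdf(x_obs + eps - theta) - std_normal_cdf(x_obs - eps - theta)

rng = np.random.default_rng(0)
theta, x_obs, eps = 0.5, 0.0, 0.3  # arbitrary illustrative values

# Each estimate uses only 10 pseudo-datasets, so it is noisy,
# but its average over replications matches the ABC pseudo-likelihood.
estimates = [abc_likelihood_estimate(theta, x_obs, eps, 10, rng) for _ in range(20_000)]
mean_estimate = float(np.mean(estimates))
exact = abc_likelihood_exact(theta, x_obs, eps)
print(mean_estimate, exact)
```

The target of the average is the ε-dependent pseudo-likelihood, not the genuine density f(x_obs | θ), which is exactly the distinction made above.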

The core result is the convergence of an importance sampling estimator using a likelihood estimated by importance sampling (hence the IS², also inspired by SMC²). The trick in the proof is to turn the computation of the likelihood estimate into the production of an (unobserved or “implicitly generated”) auxiliary variable and then to rewrite the original estimator as a genuine importance estimator. (This seems to imply the derivation of an independent importance sampling estimator of the likelihood at each iteration, right?) Standard convergence results then follow, except that the asymptotic variance has an extra term. And except that the estimator of the likelihood does not have to converge, i.e. can keep a fixed number of terms and a positive variance. The second part of the paper establishes that using an estimate degrades the asymptotic variance.
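As a rough illustration of the nested structure, and not the authors' implementation, here is a Python sketch on a toy latent-variable model I made up for the purpose: θ ~ N(0, 1), z | θ ~ N(θ, 1), y | z ~ N(z, 1), so that the exact posterior mean is y/3. The outer proposal N(0, 2²), the inner proposal (the latent model itself) and all names are assumptions. The inner estimator is redrawn independently for every θ and keeps a fixed, small number of terms.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x (vectorised)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def is2_posterior_mean(y, n_outer, n_inner, rng):
    """Self-normalised importance sampling with an estimated likelihood."""
    # Outer proposal g(theta) = N(0, 2^2), an arbitrary choice wider than the prior.
    theta = rng.normal(0.0, 2.0, size=n_outer)
    # Inner importance sampling, independent for each theta_i:
    # z_ij ~ N(theta_i, 1), so the inner weight reduces to p(y | z_ij).
    z = rng.normal(theta[:, None], 1.0, size=(n_outer, n_inner))
    lik_hat = normal_pdf(y, z, 1.0).mean(axis=1)  # unbiased for p(y | theta_i)
    # Outer weight: prior times estimated likelihood over proposal density.
    w = normal_pdf(theta, 0.0, 1.0) * lik_hat / normal_pdf(theta, 0.0, 2.0)
    return float(np.sum(w * theta) / np.sum(w))

rng = np.random.default_rng(1)
y = 1.5  # arbitrary illustrative observation
posterior_mean_hat = is2_posterior_mean(y, n_outer=50_000, n_inner=5, rng=rng)
posterior_mean_exact = y / 3.0  # conjugate normal calculation for this toy model
print(posterior_mean_hat, posterior_mean_exact)
```

With only five inner draws the likelihood estimate stays noisy, yet the outer estimator still settles near the exact value, at the price of a larger variance than with the exact likelihood.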

5 Responses to “IS² for Bayesian inference”

1. Not always obvious when the obvious is obvious.

With a large sample from the (exact) prior and a large sample from the (approximate) posterior, it does seem obvious that you should be able to get a (good) approximation of the posterior/prior.

• K: I am still reluctant to talk of a real or exact prior. Your large sample from the prior means we are drifting towards frequentism…

• To me, a prior is just a statistician’s representation (model) of how nature determined an unknown and I can know exactly what my representation is and work with it for some purpose.

My favourite story in this regard is the angry husband and Picasso.

Husband: Picasso, that painting you did of my wife looks nothing like her!
Picasso: Really – what does she look like?
Husband: I have a picture of her in my wallet – see!
Picasso: My, she is awfully tiny!!

Also, I thought having anything from a prior kept one away from frequentism (unless one could prove it had absolutely no impact).

2. The logic is almost circular: as an approximation of the likelihood, take the expected value of your favourite ABC scheme. Then by definition, ABC provides an unbiased estimator of that approximation… The same could be said of any approximate method.

• This was my first impression but I was hoping for something deeper…
