pseudo-marginal MCMC with minimal replicas

A week ago, Chris Sherlock, Alexandre Thiery and Anthony Lee (Warwick) arXived a short paper where they show that it is most efficient to use only one random substitute to the likelihood when considering a pseudo-marginal algorithm. Thus generalising an earlier paper of Luke Bornn and co-authors I commented earlier. A neat side result in the paper is that the acceptance probability for m replicas [in the pseudo-marginal approximation] is at most m/s the acceptance probability for s replicas when s<m. The main result is as follows:

screenshot_20161114_140345There is a (superficial?) connection with the fact that when running Metropolis-within-Gibbs there is no advantage in doing more than one single Metropolis iteration, as the goal is to converge to the joint posterior, rather than approximating better the full conditional…

3 Responses to “pseudo-marginal MCMC with minimal replicas”

  1. Anthony Lee Says:

    Hi Dan,

    It is quite difficult to meaningfully compare pseudo-marginal Markov chains with their “noisy” counterparts. E.g., Medina–Aguayo et al. (2016) provide examples where one is “good” and the other is not and vice versa.

    By using more samples per iteration in the noisy algorithm, one can often reduce the asymptotic bias. If you want to control mean-squared error, e.g., then this suggests one should average more samples per iteration as one considers longer chains. Unfortunately, current theory is not developed enough to really tell you what the asymptotic bias or variance is, so it is not known how to precisely trade off samples per iteration and length of the chain when taking computational cost into account.

    In the pseudo-marginal case, only the asymptotic variance matters as the asymptotic bias is zero. The point of this paper is to demonstrate clearly what one can expect in the case where one averages more samples: there is a reduction in asymptotic variance but this reduction cannot be “too big”, so that one can consider the optimal number of samples in various computational settings.

  2. Thank you for highlighting our article. We would like to add that, to our minds, the most important result is the bound in Theorem 1. The result in the blog, Proposition 1, then shows that this bound is tight.

    Theorem 1, with s=1 and m=2 states that
    V(f,P_1) + \text{var}_{\pi}(f) \le 2\{V(f,P_2) + var_{\pi}(f)\};
    if an average of two estimators is used then the quantity {asymptotic variance + true variance} cannot reduce by more than a factor of 2.

    A more “natural-seeming” result would be that the asymptotic variance itself can never reduce by more than a factor of two. Proposition 1 shows that the bound in Theorem 1 is tight, and that even in the case of independent estimators, no such “natural-seeming” result is possible in general.

    • Dan Simpson Says:

      This seems a little surprising compared with the Noisy Metropolis Hastings paper of Alquier et al, which showed that averages of biased estimators produced much better chains (for the cost of a little bias).

      This looks like a bigger gap between asymptotically unbiased and asymptotically biased MCMC than I would (naively) expect.

      Any thoughts? (Or have I maybe just misunderstood…)

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s