The probit example is presented here mainly as a pedagogical (toy) example: we introduce it to illustrate our optimality result for exponential models. Regarding the handwritten digits, you write that "this example may be misleading in that 100 digits may be enough to find a tolerable approximation to the true MAP". Here we want to make the important point that, in this case, if one chooses a *fixed* sub-sample of 100 images throughout the algorithm, parameter estimation can be very poor. This is also illustrated by the time series inference example, where we show that keeping a fixed subset yields a biased inference (see page 22). In particular, both examples provide clear evidence that refreshing the sub-sample is a fundamental and novel aspect of the LWA MCMC methodology. Theoretically, the method indeed does not yield samples from the "pseudo data-augmented" distribution: as you mentioned, switching (or refreshing) the subset disturbs the stability of the chain. Although of great interest, this remains an important open question. The main appeal of LWA MCMC comes from the significant computational gain it allows (for a striking illustration of this, see the binary classification example!).
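To make the fixed-versus-refreshed contrast concrete, the following is a minimal self-contained sketch, not the paper's algorithm: a generic random-walk Metropolis on a Gaussian mean with a rescaled sub-sampled log-likelihood, where all names, settings, and the toy model are illustrative assumptions. With `refresh=False` the chain concentrates on the posterior implied by one particular subset (which may sit away from the full-data value), whereas refreshing the sub-sample lets the chain average over subsets.

```python
import math
import random

random.seed(0)

# Synthetic data: N draws from N(mu_true, 1); these settings are illustrative.
N, m = 1000, 100           # full data size, sub-sample size
mu_true = 2.0
data = [random.gauss(mu_true, 1.0) for _ in range(N)]

def log_like(mu, subset):
    # Sub-sampled Gaussian log-likelihood, rescaled by N/m to mimic full data.
    s = sum(-0.5 * (x - mu) ** 2 for x in subset)
    return (N / len(subset)) * s

def rw_metropolis(n_iter, refresh):
    """Random-walk Metropolis on mu; optionally refresh the sub-sample."""
    subset = random.sample(data, m)
    mu = 0.0
    ll = log_like(mu, subset)
    trace = []
    for _ in range(n_iter):
        if refresh:
            subset = random.sample(data, m)   # draw a fresh sub-sample
            ll = log_like(mu, subset)         # re-evaluate at current state
        prop = mu + random.gauss(0.0, 0.05)
        ll_prop = log_like(prop, subset)
        if math.log(random.random()) < ll_prop - ll:
            mu, ll = prop, ll_prop
        trace.append(mu)
    # Average the second half of the trace as a crude posterior-mean estimate.
    half = len(trace) // 2
    return sum(trace[half:]) / half

est_refresh = rw_metropolis(4000, refresh=True)
est_fixed = rw_metropolis(4000, refresh=False)
print("refreshed sub-sample estimate:", est_refresh)
print("fixed sub-sample estimate:    ", est_fixed)
```

The fixed-subset chain targets whatever the initial sub-sample happens to say about `mu`, while the refreshed chain's estimate tracks the full-data value; this is only a caricature of the bias the time series example documents, under the assumptions stated above.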

]]>