Yes, this marginalisation feature is something that took time for me to understand, if you remember our (most helpful) discussions, as I tend to reason as an importance (if unimportant) sampler. And I agree that the comparison with ABC is quite delicate. Ideally we would like to run them both for the same amount of computing time and receive a similar approximation to the posterior. If this happens, it is a good sign. If it does not, it is delicate to understand which one has erred away from the true target. Maybe using the ABC output as a basis for your ensemble proposal could work in an iterative scheme. But, as you say, if one is much, much better, we end up wasting our time. Unless we can run both methods in parallel (still requiring extra coding!).

In the experiments in the paper, we only use a “global” distribution for pool states, which I think will generally be preferred when the state is just one-dimensional, since not too many states are enough to pretty much cover the possibilities. The ensemble method then comes close to marginalizing over the hidden state sequence. When the state is higher-dimensional, though, I expect that keeping the pool states in the local vicinity of the current sequence may work better.
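To make the distinction concrete, here is a minimal sketch (my own illustration, not code from the paper) of the two ways of generating pool states: a “global” pool drawn from one fixed distribution covering the plausible range of a one-dimensional state, versus a “local” pool of perturbations around the current state, which may scale better in higher dimensions. The ranges and scales are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def global_pool(n_pool, lo=-3.0, hi=3.0):
    """Draw pool states from one 'global' distribution that
    covers the plausible range of a one-dimensional state."""
    return rng.uniform(lo, hi, size=n_pool)

def local_pool(current, n_pool, scale=0.5):
    """Draw pool states in the local vicinity of the current
    state, which may work better for higher-dimensional states."""
    current = np.atleast_1d(current)
    return current + scale * rng.standard_normal((n_pool, current.size))

# One-dimensional state: a modest global pool already covers it.
pool1d = global_pool(20)

# Five-dimensional state: local perturbations around the current value.
pool5d = local_pool(np.zeros(5), 20)
```

With a global pool the same candidate states can be reused across the whole sequence, which is what brings the ensemble update close to an exact marginalization over the hidden states; local pools trade that for coverage near the current trajectory.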

Comparing with ABC would be interesting. There is a conceptual difficulty, though. MCMC becomes exact in the limit as the number of iterations goes to infinity. ABC becomes exact in the limit as the data tolerance goes to zero (and the number of proposals to obtain an accepted value goes to infinity). To compare, you could set some accuracy requirement, and then see how much compute time is needed with each method. But accuracy of the sampled versus the true posterior distribution isn’t a one-dimensional quantity. So which method is best may depend on how you choose to measure accuracy.
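The tolerance/effort trade-off can be seen in a toy ABC rejection sampler (my own sketch, unrelated to the models in the paper): inferring the mean of a N(theta, 1) model from the sample mean, with a wide normal prior. Shrinking the tolerance sharpens the posterior approximation but inflates the number of proposals needed per accepted value.

```python
import numpy as np

rng = np.random.default_rng(1)

def abc_rejection(y_obs, tol, n_accept):
    """Toy ABC rejection sampler for the mean of a N(theta, 1) model,
    using the sample mean as the summary statistic. Smaller tol gives
    a better approximation but needs more proposals per acceptance."""
    accepted, proposals = [], 0
    while len(accepted) < n_accept:
        theta = rng.normal(0.0, 5.0)          # wide prior draw
        y_sim = rng.normal(theta, 1.0, size=len(y_obs))
        proposals += 1
        if abs(y_sim.mean() - y_obs.mean()) < tol:
            accepted.append(theta)
    return np.array(accepted), proposals

# Data simulated with true mean 2.0.
y_obs = rng.normal(2.0, 1.0, size=50)

post_loose, n_loose = abc_rejection(y_obs, tol=1.0, n_accept=200)
post_tight, n_tight = abc_rejection(y_obs, tol=0.1, n_accept=200)
# The tighter tolerance typically needs roughly ten times as many proposals.
```

Comparing `n_loose` and `n_tight` for a fixed accuracy target is one way to operationalize the compute-time comparison, though, as noted above, the result still depends on which discrepancy between sampled and true posterior you choose to measure.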

Of course, it often turns out that one method is much, much better than another, in which case this may be just a theoretical point.
