Poincaré recurrence in action!

On an unrelated note, I really wanted to sneak the phrase “slouching towards Bethlehem” (my love of Joni knows no bounds) into the discussion of convergence, but I chickened out. Perhaps it could be a talk title.

You just have to wait long enough. Soon they’ll be bringing back symplectic flows!

Everything old is new again.

Because there’s no obvious benchmark problem to compare on. Gibbs samplers are data-local, ADG isn’t, so any problem that a Gibbs sampler will work on is structurally different to the types of problems we’re targeting.

Comparing across parallel MCMC methods is also hard. They all solve particular problems well, but it would be easy to (accidentally, on purpose, or anything in between) bias the comparison through the choice of problems.

It’s also worth noting that the reason we didn’t compare is that it’s hard to imagine a meaningful comparison of a serial method against a parallel one. The outcome will depend on the test case. We don’t come to bury, but to extend (etc etc etc)…

I would also note that of course we have to deal with all of the data. The situation where this is not true (where you can independently send a subsample to a processor and ignore the rest of the data) is trivial in the sense that a) we know how to do that already and b) we could probably just subsample anyway. So any non-i.i.d. parallel algorithm has a “minimum communication cost”.

The two serious examples in this paper cover cases where the data is not i.i.d. and there is less over-specification. In particular, it’s hard to imagine a situation with GP regression or mixed-effects models where you don’t need all the data available at once. The advantage of this method is that the data don’t need to be available synchronously.

(Also note the caching in the mixed-effects example – it is not a mistake. Again, I think it’s hard to argue for any method for this problem that doesn’t cache here.)

Finally, you refer to “plain Gibbs samplers”. This is meaningless. In the case of GP regression, a “good” plain Gibbs sampler will have a spectral gap while a “bad” plain Gibbs sampler will (asymptotically) not. The (synchronous version of the) Gibbs sampler used in the second example isn’t “plain”, so much as “very carefully constructed to be sensible”. Once again, this comes down to the behaviour of serial (non-trivial) Gibbs samplers vs asynchronous ones. The winner of a battle like this will depend on the size of the problem, the underlying spectral gap, and the reliability of the cluster that it’s being computed on.
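To make the spectral-gap point concrete, here is a toy illustration (my own sketch, not anything from the paper): a two-block Gibbs sampler on a bivariate Gaussian with correlation rho. The marginal chain on the first coordinate is AR(1) with coefficient rho², so its spectral gap is 1 − rho², and mixing collapses as the blocks become more correlated – exactly the kind of parameterization-dependent behaviour that separates a “good” Gibbs sampler from a “bad” one.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Two-block Gibbs sampler targeting a zero-mean bivariate normal
    with unit variances and correlation rho.

    Each full sweep updates x | y then y | x from the exact conditionals.
    The induced chain on x is AR(1) with coefficient rho**2, so the
    spectral gap is 1 - rho**2 and mixing degrades as rho -> 1.
    """
    rng = np.random.default_rng(seed)
    cond_sd = np.sqrt(1.0 - rho**2)  # conditional standard deviation
    x, y = 0.0, 0.0
    xs = np.empty(n_samples)
    for i in range(n_samples):
        x = rho * y + cond_sd * rng.standard_normal()  # draw x | y
        y = rho * x + cond_sd * rng.standard_normal()  # draw y | x
        xs[i] = x
    return xs

def lag1_autocorr(xs):
    """Empirical lag-1 autocorrelation of a 1-d chain."""
    centred = xs - xs.mean()
    return float(np.dot(centred[:-1], centred[1:]) / np.dot(centred, centred))
```

For rho = 0.9 the lag-1 autocorrelation of the x-chain sits near rho² = 0.81; push rho towards 1 and the sampler grinds to a halt, even though every conditional update is exact.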

(And of course we do not compare parallel algorithms [or parallel and sequential algorithms] in general – how could we make a meaningful comparison?)
