variational consensus Monte Carlo
“Unfortunately, the factorization does not make it immediately clear how to aggregate on the level of samples without first having to obtain an estimate of the densities themselves.” (p.2)
The recently arXived variational consensus Monte Carlo is a paper by Maxim Rabinovich, Elaine Angelino, and Michael Jordan that approaches the consensus Monte Carlo principle from a variational perspective. As in the embarrassingly parallel version, the target is split into a product of K terms, each being interpreted as an unnormalised density and being fed to a different parallel processor. The most natural partition is to break the data into K subsamples and to raise the prior to the power 1/K in each term. While this decomposition makes sense from a storage perspective, since each bit corresponds to a different subsample of the data, it raises the question of the statistical pertinence of splitting the prior and my feelings about it are now more lukewarm than when I commented on the embarrassingly parallel version, mainly for the reason that it is not reparameterisation invariant—getting different targets if one does the reparameterisation before or after the partition—and hence does not treat the prior as the reference measure it should be. I therefore prefer the version where the same original prior is attached to each part of the partitioned likelihood (and even more the random subsampling approaches discussed in the recent paper of Bardenet, Doucet, and Holmes). Another difficulty with the decomposition is that a product of densities is not a density in most cases (it may even be of infinite mass) and does not offer a natural path to the analysis of samples generated from each term in the product. Nor an explanation as to why those samples should be relevant to construct a sample for the original target.
“The performance of our algorithm depends critically on the choice of aggregation function family.” (p.5)
Since the variational Bayes approach is a common answer to complex products models, Rabinovich et al. explore the use of variational Bayes techniques to build the consensus distribution out of the separate samples. As in Scott et al., and Neiswanger et al., the simulation from the consensus distribution is a transform of simulations from each of the terms in the product, e.g., a weighted average. Which determines the consensus distribution as a member of an aggregation family defined loosely by a Dirac mass. When the transform is a sum of individual terms, variational Bayes solutions get much easier to find and the authors work under this restriction… In the empirical evaluation of this variational Bayes approach as opposed to the uniform and Gaussian averaging options in Scott et al., it improves upon those, except in a mixture example with a large enough common variance.
In fine, despite the relevance of variational Bayes to improve the consensus approximation, I still remain unconvinced about the use of the product of (pseudo-)densities and the subsequent mix of simulations from those components, for the reason mentioned above and also because the tail behaviour of those components is not related with the tail behaviour of the target. Still, this is a working solution to a real problem and as such is a reference for future works.