Archive for pseudo-marginal

Bayesian synthetic likelihood [a reply from the authors]

Posted in Books, pictures, Statistics, University life on December 26, 2017 by xi'an

[Following my comments on the Bayesian synthetic likelihood paper in JCGS, the authors sent me the following reply by Leah South (previously Leah Price).]

Thanks Christian for your comments!

The pseudo-marginal idea is useful here because it tells us that, in the ideal case where the model summary statistic is normal, using the unbiased estimator of the normal density gives an MCMC algorithm that converges to the same target regardless of the value of n (the number of model simulations per MCMC iteration). It is true that the bias reappears under misspecification. We found that the target based on the simple plug-in Gaussian density was also remarkably insensitive to n. Given this insensitivity, we consider calling again on the pseudo-marginal literature for guidance in choosing n to minimise computational effort, and we recommend the plug-in Gaussian density in BSL because it is simpler to implement.
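[For readers wanting to see the mechanics, here is a minimal Python sketch of one Metropolis-Hastings step under the plug-in Gaussian synthetic likelihood; it is an illustration rather than the authors' code, and the simulator simulate_summaries, the proposal propose, and the prior log_prior are hypothetical user-supplied pieces.]

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_synthetic_likelihood(theta, s_obs, simulate_summaries, n, rng):
    """Plug-in Gaussian synthetic likelihood: fit a normal to n simulated
    summary statistics and evaluate the observed summary s_obs under it."""
    S = simulate_summaries(theta, n, rng)   # (n, d) array of simulated summaries
    mu = S.mean(axis=0)                     # plug-in mean
    Sigma = np.cov(S, rowvar=False)         # plug-in covariance
    return multivariate_normal.logpdf(s_obs, mean=mu, cov=Sigma)

def bsl_mh_step(theta, log_sl, s_obs, simulate_summaries, log_prior, propose,
                n=100, rng=None):
    """One Metropolis-Hastings step targeting the plug-in BSL posterior,
    assuming a symmetric proposal. As in pseudo-marginal MCMC, the stored
    estimate log_sl at the current theta is reused rather than recomputed."""
    rng = np.random.default_rng() if rng is None else rng
    theta_prop = propose(theta, rng)
    log_sl_prop = log_synthetic_likelihood(theta_prop, s_obs,
                                           simulate_summaries, n, rng)
    log_alpha = log_sl_prop + log_prior(theta_prop) - log_sl - log_prior(theta)
    if np.log(rng.uniform()) < log_alpha:
        return theta_prop, log_sl_prop
    return theta, log_sl
```

[Swapping the plug-in Gaussian log-density for the unbiased estimator of the normal density gives the pseudo-marginal variant mentioned above.]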

“I am also lost to the argument that the synthetic version is more efficient than ABC, in general”

Given the parametric approximation to the summary statistic likelihood, we expect BSL to be computationally more efficient than ABC. We show that this is the case theoretically in a toy example in the paper, and we find empirically on a number of examples that BSL is more computationally efficient, but we agree that further analysis would be of interest.

The concept of using random forests to handle additional summary statistics is interesting and useful. BSL was able to use all the information in the high-dimensional summary statistics that we considered, rather than resorting to dimension reduction (implying a loss of information), and we believe that is a benefit of BSL over standard ABC. Further, in high-dimensional parameter applications the summary statistic dimension will necessarily be large even if there is only one statistic per parameter. BSL can be very useful in such problems. In fact, we have done some work on exactly this, combining variational Bayes with synthetic likelihood.

Another benefit of BSL is that it is easier to tune (there are fewer tuning parameters and the BSL target is highly insensitive to n). Surprisingly, BSL performs reasonably well when the summary statistics are not normally distributed — as long as they aren’t highly irregular!

exact, unbiased, what else?!

Posted in Books, Statistics, University life on April 13, 2016 by xi'an

Last week, Matias Quiroz, Mattias Villani, and Robert Kohn arXived a paper on exact subsampling MCMC, which contributes to the current literature on approximating MCMC samplers for large datasets, in connection with an earlier paper of Quiroz et al. discussed here last week.

The "exact" in the title is to be understood in the Russian roulette sense. By using the Rhee and Glynn debiasing device, the authors achieve an unbiased estimator of the likelihood, as in Bardenet et al. (2015). The central tool for the derivation of an unbiased and positive estimator is to find a control variate for each component of the log-likelihood that is good enough for the difference between the component and the control to be lower bounded, by the constant a in the screen capture above. The individual terms d in the product are iid unbiased estimates of the log-likelihood difference, and q is the sum of the control variates, or maybe more accurately of the cheap substitutes to the exact log-likelihood components, hence still of complexity O(n), which makes the application to tall data more difficult to contemplate.
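[A hedged sketch of the construction: writing the log-likelihood as ℓ(θ) = q(θ) + Σ_i d_i(θ) with d_i = ℓ_i − q_i, the generic "Poisson estimator" below returns an unbiased estimate of exp{ℓ(θ)} from subsampled estimates of D = Σ_i d_i, and it stays nonnegative whenever those estimates remain above the lower bound a. The helper functions log_lik_term, control_variate and q_total are hypothetical, and this is an illustration of the debiasing idea rather than the exact estimator of Quiroz et al.]

```python
import numpy as np

def d_hat(theta, n, subsample_size, log_lik_term, control_variate, rng):
    """Unbiased subsample estimate of D(theta) = sum_i [l_i(theta) - q_i(theta)],
    from a uniform subsample of indices drawn with replacement."""
    idx = rng.integers(0, n, size=subsample_size)
    return n * np.mean(log_lik_term(theta, idx) - control_variate(theta, idx))

def poisson_likelihood_estimate(theta, n, subsample_size, log_lik_term,
                                control_variate, q_total, a, m, rng=None):
    """Generic Poisson estimator of L(theta) = exp{q(theta) + D(theta)}:
    with kappa ~ Poisson(m), the product
        exp(q + a + m) * prod_{j=1}^kappa (D_hat_j - a) / m
    is unbiased for L(theta); it is nonnegative as long as every D_hat_j >= a,
    which a merely 'soft' lower bound a cannot guarantee."""
    rng = np.random.default_rng() if rng is None else rng
    kappa = rng.poisson(m)                 # m tunes the expected number of factors
    log_est = q_total(theta) + a + m
    sign = 1.0
    for _ in range(kappa):
        factor = (d_hat(theta, n, subsample_size, log_lik_term,
                        control_variate, rng) - a) / m
        sign *= np.sign(factor)            # can flip when the bound is violated
        log_est += np.log(abs(factor))
    return sign * np.exp(log_est)
```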

The $64 question is obviously how to produce cheap and efficient control variates that kill the curse of the tall data. (It still irks me to resort to this term of control variate, really!) Section 3.2 in the paper suggests clustering the data and building an approximation for each cluster, which seems to imply manipulating the whole dataset at this early stage, at a cost of O(Knd). Furthermore, because finding a correct lower bound a is close to impossible in practice, the authors use a "soft lower bound", meaning that it is only an approximation and thus that (3.4) above can get negative from time to time, which cancels the validation of the method as a pseudo-marginal approach. The resolution of this difficulty is to resort to the same proxy as in the Russian roulette paper, replacing the unbiased estimator with its absolute value, an answer I already discussed for that paper. An additional step is proposed by Quiroz et al., namely correlating the random numbers between numerator and denominator in their final importance sampling estimator, via a Gaussian copula as in Deligiannidis et al.
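[For completeness, the sign trick works as follows: run the pseudo-marginal chain on the absolute value of the estimator, store the sign at each iteration, and correct posterior expectations by importance sampling on the signs. A minimal sketch, with the arrays assumed to be collected along the MCMC run:]

```python
import numpy as np

def sign_corrected_expectation(h_values, signs):
    """Sign-corrected estimate of a posterior expectation E_pi[h(theta)]
    from a chain targeting |L_hat|: E_pi[h] ~ sum_t h_t s_t / sum_t s_t,
    where s_t in {-1, +1} is the sign of the likelihood estimate at theta_t."""
    h_values = np.asarray(h_values, dtype=float)
    signs = np.asarray(signs, dtype=float)
    return np.sum(h_values * signs) / np.sum(signs)
```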

This paper made me wonder (idly wonder, mind!) anew how to get rid of the vexing unbiasedness requirement. From a statistical and especially from a Bayesian perspective, unbiasedness is a second-order property that cannot be achieved for most transforms of the parameter θ, and one that does not survive reparameterisation. It is thus vexing and perplexing that unbiasedness is so central to the validation of our Monte Carlo techniques and that any divergence from this canon leaves us wandering blindly, with no guarantee of ever reaching the target of the simulation experiment…
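A one-line illustration of the reparameterisation point: if θ̂ is unbiased for θ and g is strictly convex (with θ̂ non-degenerate), Jensen's inequality gives

```latex
\mathbb{E}\big[g(\hat\theta)\big] \;>\; g\big(\mathbb{E}[\hat\theta]\big) = g(\theta),
```

so g(θ̂) is biased for g(θ); for instance, with θ̂ ∼ N(θ, σ²), E[exp(θ̂)] = exp(θ + σ²/2) rather than exp(θ).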