MCqMC 2016 [#2]
In her plenary talk this morning, Christine Lemieux discussed connections between quasi-Monte Carlo and copulas, covering a question I have been considering for a while. Namely, when provided with a (multivariate) joint cdf F, is there a generic way to turn a vector of uniforms [or quasi-uniforms] into a simulation from F? For Archimedean copulas (as we can always get back to copulas), there is a resolution through the Marshall-Olkin representation, but this puts a restriction on the distributions F that can be considered.
The session on synthetic likelihoods [as introduced by Simon Wood in 2010], put together by Scott Sisson, was completely focussed on using normal approximations for the distribution of the vector of summary statistics, rather than the standard ABC non-parametric approximation. While there is a clear (?) advantage in using a normal pseudo-likelihood, since it stabilises with far fewer simulations than a non-parametric version, I find it difficult to compare the two approaches, as they lead to different posterior distributions. In particular, I wonder about the impact of the dimension of the summary statistics on the approximation, in the sense that it is less and less likely that the joint is normal as this dimension increases. Whether this is damaging for the resulting inference is another issue, possibly handled by a supplementary ABC step that would take the first-step estimate as summary statistic. (As a side remark, I am intrigued by everyone being so concerned with the unbiasedness of methods that are approximations with no assessment of the amount of approximation!)
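To make the normal pseudo-likelihood concrete, here is a minimal sketch in Python for a scalar summary statistic. The toy model (N(θ, 1) data summarised by the sample mean), the function names, and the value n = 200 are all assumptions made for illustration and are not taken from any of the talks. The estimate below is the plain plug-in version, not the unbiased variant.

```python
import math
import random
import statistics


def simulate_summary(theta, rng, size=50):
    # Hypothetical toy model: data are N(theta, 1), summary is the sample mean.
    data = [rng.gauss(theta, 1.0) for _ in range(size)]
    return statistics.fmean(data)


def synthetic_loglik(theta, s_obs, n, rng):
    # Simulate n summaries at theta and fit a normal distribution to them.
    sims = [simulate_summary(theta, rng) for _ in range(n)]
    mu = statistics.fmean(sims)
    var = statistics.variance(sims, mu)  # sample variance, n - 1 divisor
    # Log-density of the fitted normal evaluated at the observed summary.
    return -0.5 * math.log(2 * math.pi * var) - (s_obs - mu) ** 2 / (2 * var)


rng = random.Random(2016)
s_obs = 1.0  # assumed observed summary
for theta in (0.0, 1.0, 2.0):
    print(theta, synthetic_loglik(theta, s_obs, n=200, rng=rng))
```

With a vector of summaries, the scalar mean and variance would be replaced by a mean vector and a covariance matrix, which is where the dimension issue raised above comes into play.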
The last session of the day was about multimodality and MCMC solutions, with talks by Hyungsuk Tak, Pierre Jacob and Babak Shahbaba, plus mine. Hyungsuk presented the RAM algorithm I discussed earlier under the title of “love-hate” algorithm, which was a kind reference to my post! (I remain puzzled by the ability of the algorithm to jump to another mode, given that the intermediary step aims at a low or even zero probability region with an infinite mass target.) And Pierre talked about using SMC for Wang-Landau algorithms, with a twist to the classical stochastic optimisation schedule that preserves convergence. And a terrific illustration on a distribution inspired by the Golden Gate Bridge that reminded me of my recent crossing! The discussion around my folded Markov chain talk focussed on the extension of the partition to more than two sets, the difficulty being in generating automated projections, with comments about connections with computer graphics tools. (Too bad that the parallel session saw talks by Mark Huber and Rémi Bardenet that I missed! Enjoying a terrific Burmese dinner with Rémi, Pierre and other friends also meant I could not post this entry on time for the customary 00:16. Not that it matters in the least…)
August 18, 2016 at 4:07 am
Thanks for your comments, Xian. To add to Anthony’s comments, we are not really concerned with approximating the actual posterior (the posterior conditional on a summary will usually differ from it in practice). The unbiasedness of uBSL (http://eprints.qut.edu.au/92795/17/92795%28a%29.pdf) means we don’t have to worry (or can worry less) about the impact that the choice of n has on the target. Thus we can choose n to try to maximise computational efficiency. This is in contrast to ABC, where the choice of epsilon is known to have a large influence on the target.
In the context of Minh-Ngoc Tran’s talk, having an unbiased estimator of the log-likelihood (of the summary statistic) is useful in the optimisation algorithm for Variational Bayes (http://eprints.qut.edu.au/98023/8/98023.pdf).
So the quest for unbiasedness in these two papers has nothing to do with how the target of the synthetic likelihood methods approximates the actual posterior.
The hypothesis test comment is very interesting and something that we considered (and that is also commented on in Wood 2010). But to have sufficient power in a test for multivariate normality, the value of n needs to be so large that the computational gains of the synthetic likelihood are lost in the first place. I guess it is possible to at least look at normality on the marginals, although I’m not sure how useful that would be: presumably, with n large enough, this test would give a small p-value most of the time, as the summary statistic is very unlikely to be perfectly normally distributed. We have found that the BSL approach seems to be quite robust to some deviation from normality, but we have also considered examples with very heavy-tailed summaries where BSL fails completely.
You are correct that BSL has a hidden curse of dimensionality, in that the normality assumption is likely to get worse as summary statistics are added. One must be careful with the way summaries are chosen. In future research we plan to try to relax the normality assumption (and hopefully maintain some computational advantage).
Sorry for the long reply.
August 17, 2016 at 5:30 pm
Sounds like an interesting day at MCqMC! On your side remark about unbiasedness, I cannot speak for the other papers, but I can comment on it for the paper with Leah Price, Chris Drovandi and David Nott, and for the indirect inference approach using a parametric auxiliary model more generally.
There, the use of a non-negative, unbiased estimate of the auxiliary likelihood is important primarily because it ensures that the Markov chain is a pseudo-marginal Markov chain with a known invariant probability measure (assuming the product of the auxiliary likelihood and the prior is integrable). The adequacy of the approximation is then determined solely by the closeness of the well-defined auxiliary likelihood to the actual likelihood.
I completely agree that it would be interesting if there were rigorously-justified ways to assess this adequacy in practice, taking into consideration both the parametric auxiliary model as well as the choice of n, which together define the auxiliary likelihood.
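The pseudo-marginal argument in this comment can be illustrated with a minimal sketch in Python. Everything here is a toy assumption chosen for illustration: the target is an unnormalised N(0, 1) density, and the "estimate" is that density multiplied by an independent positive weight with mean one, so it is non-negative and unbiased. The function names and tuning values are hypothetical.

```python
import math
import random


def noisy_density(x, rng, sigma=0.5):
    # Unnormalised N(0, 1) density times a positive weight with mean one:
    # E[exp(sigma * Z - sigma^2 / 2)] = 1 for Z ~ N(0, 1).
    weight = math.exp(sigma * rng.gauss(0.0, 1.0) - 0.5 * sigma ** 2)
    return math.exp(-0.5 * x * x) * weight


def pseudo_marginal_mh(n_iter, rng, step=1.0):
    x = 0.0
    est = noisy_density(x, rng)
    chain = []
    for _ in range(n_iter):
        prop = x + step * rng.gauss(0.0, 1.0)
        est_prop = noisy_density(prop, rng)
        # Accept with probability min(1, est_prop / est). The estimate at the
        # current state is kept as is and never re-simulated.
        if rng.random() * est < est_prop:
            x, est = prop, est_prop
        chain.append(x)
    return chain


chain = pseudo_marginal_mh(20000, random.Random(0))
mean = sum(chain) / len(chain)
var = sum((x - mean) ** 2 for x in chain) / len(chain)
print(mean, var)  # should be roughly 0 and 1 despite the noisy density
```

Keeping the current estimate fixed, rather than refreshing it at each iteration, is what makes the chain target the exact N(0, 1) density in this toy setting.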
August 17, 2016 at 6:21 pm
Thanks, Anthony, for such a prompt reply! I understand and appreciate the connection with pseudo-marginal, as it produces a validation of the algorithm in that the Markov chain converges to a well-defined limit. In line with your last paragraph, I would suggest using a goodness-of-fit test on simulated data (since available) in order to assess how well the synthetic likelihood fits the distribution of the summary statistics. Or a subset of those.
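As one possible reading of this suggestion, here is a minimal sketch of a marginal normality check on simulated summaries, written in Python. The Jarque-Bera statistic is a choice made here for illustration and is not named anywhere in the discussion. Its p-value relies on an asymptotic chi-square approximation with two degrees of freedom, and the two toy summaries are assumptions.

```python
import math
import random


def jarque_bera(xs):
    # Moment-based test of normality for one marginal summary statistic.
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    stat = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # Asymptotic chi-square(2) p-value: the survival function is exp(-x / 2).
    return stat, math.exp(-0.5 * stat)


rng = random.Random(42)
# Hypothetical summaries simulated at a fixed parameter value.
near_normal = [sum(rng.gauss(0.0, 1.0) for _ in range(50)) / 50 for _ in range(500)]
skewed = [rng.expovariate(1.0) for _ in range(500)]
print(jarque_bera(near_normal))
print(jarque_bera(skewed))
```

Such a check only looks at one marginal at a time, and with a large number of simulations it can be expected to reject even for mild departures from normality.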