ABC for big data

“The results in this paper suggest that ABC can scale to large data, at least for models with a fixed number of parameters, under the assumption that the summary statistics obey a central limit theorem.”

In a week rich with arXiv submissions about MCMC and “big data”, like the Variational consensus Monte Carlo of Rabinovich et al., or scalable Bayesian inference via particle mirror descent by Dai et al., Wentao Li and Paul Fearnhead contributed an impressive paper entitled Behaviour of ABC for big data. However, a word of warning: the title is somewhat misleading in that the paper does not address the issue of big or tall data per se, e.g., the impossibility of handling the whole data at once and of reproducing it by simulation, but rather the asymptotics of ABC. The setting is not dissimilar to the earlier Fearnhead and Prangle (2012) Read Paper. The central theme of this theoretical paper [with 24 pages of proofs!] is to study the connection between the number N of Monte Carlo simulations and the tolerance value ε when the number of observations n goes to infinity. A main result in the paper is that the ABC posterior mean can have the same asymptotic distribution as the MLE when ε=o(n^(-1/4)). This is however of no direct use in practice, as the second main result is that the Monte Carlo variance is well-controlled only when ε=O(n^(-1/2)).

Something I have (slight) trouble with is the construction of an importance sampling function of the form f_ABC(s|θ)^α when, obviously, this function cannot be used for simulation purposes. The authors point out this fact, but still build an argument about the optimal choice of α, namely away from 0 and 1, like ½. Actually, any value different from 0 and 1 is sensible, meaning that the range of acceptable importance functions is wide. Most interestingly (!), the paper constructs an iterative importance sampling ABC in a spirit similar to Beaumont et al. (2009) ABC-PMC. Even more interestingly, the ½ factor amounts to updating the scale of the proposal as twice the scale of the target, just as in PMC.
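
To see where the factor of two comes from, here is a purely illustrative calculation. It assumes (for the sake of the example, not as a statement from the paper) that f_ABC(s|θ), viewed as a function of θ, has a roughly Gaussian shape with centre μ and variance σ²; the symbols μ and σ² are introduced here only for this sketch.

```latex
\left[\exp\left\{-\frac{(\theta-\mu)^2}{2\sigma^2}\right\}\right]^{\alpha}
= \exp\left\{-\frac{(\theta-\mu)^2}{2\,(\sigma^2/\alpha)}\right\},
\qquad\text{so that } \alpha=\tfrac{1}{2} \text{ gives variance } 2\sigma^2 .
```

Under that assumption, raising the function to the power α inflates the variance by a factor 1/α, and α=½ doubles it, which matches the ABC-PMC habit of taking a proposal variance equal to twice the empirical variance of the current sample.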

Another aspect of the analysis I do not quite grasp is the reason for keeping the Monte Carlo sample size to a fixed value N, while setting a sequence of acceptance probabilities (or of tolerances) along iterations. This is a very surprising result in that the Monte Carlo error does remain under control and does not dominate the overall error!

“Whilst our theoretical results suggest that point estimates based on the ABC posterior have good properties, they do not suggest that the ABC posterior is a good approximation to the true posterior, nor that the ABC posterior will accurately quantify the uncertainty in estimates.”

Overall, this is clearly a paper worth reading for understanding the convergence issues related to ABC, with more theoretical support than the earlier Fearnhead and Prangle (2012). However, it does not provide guidance on the construction of a sequence of Monte Carlo samples, nor does it discuss the selection of the summary statistic, which obviously has a major impact on the efficiency of the estimation. And to relate to the earlier warning, it does not cope with “big data” in that it reproduces the original simulation of the n-sized sample.

3 Responses to “ABC for big data”

1. A motivating discussion is definitely worth comment.
not be a taboo subject but generally folks don’t talk about such subjects.

To the next! Kind regards!!

2. Thank you for your interest in and detailed comments on our work, Professor Robert.

About the comparison between the MLE and the ABC estimator, I think their similarity comes from the convergence order of the Monte Carlo variance and the negligibility of the ABC bias. When $\varepsilon=O(n^{-1/2})$ and the importance function is sensible, the Monte Carlo variance is only $K/N$ times larger than the variance of the MLE, where $K$ is a constant, and the bias from $\varepsilon$ is negligible due to the weaker requirement of $\varepsilon=o(n^{-1/4})$. Therefore, as the data size gets larger, the difference between the mean square errors of ABC and the MLE can be made arbitrarily small by a large but fixed $N$.
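
Written as a back-of-the-envelope decomposition, the argument above reads roughly as follows. This is a paraphrase, not a formula taken from the paper; the notation $\hat\theta_{\mathrm{ABC}}$, $\hat\theta_{\mathrm{MLE}}$ and $\mathrm{bias}_\varepsilon$ is introduced here for illustration only.

```latex
\mathrm{MSE}(\hat\theta_{\mathrm{ABC}})
\approx \underbrace{\mathrm{var}(\hat\theta_{\mathrm{MLE}})}_{\text{sampling variability}}
+ \underbrace{\frac{K}{N}\,\mathrm{var}(\hat\theta_{\mathrm{MLE}})}_{\text{Monte Carlo variance}}
+ \underbrace{\mathrm{bias}_\varepsilon^{2}}_{\text{negligible when } \varepsilon=o(n^{-1/4})}
\approx \left(1+\frac{K}{N}\right)\mathrm{MSE}(\hat\theta_{\mathrm{MLE}})
```

The ratio of the two mean square errors is then about $1+K/N$, which does not depend on $n$ and can be brought close to one by taking $N$ large.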

I guess the reason behind the intuition that the Monte Carlo error would explode with a fixed N is that when the data size is large, either the acceptance probability is small, when sampling from the prior, or the importance weight is skewed, when sampling around the true value. The class of ‘sensible’ proposal distributions somehow balances these two, keeping the acceptance probability away from 0 and the variance of the importance weight under control.
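
The acceptance-probability half of this trade-off can be checked on a toy example. The sketch below is not the paper's setting: it assumes a normal-mean model whose summary is the sample mean, so that s given θ is N(θ, 1/n), a N(0, 10²) prior, a tolerance ε = n^(-1/2), and a hypothetical "local" proposal N(s_obs, 2/n). The function name and all default values are illustrative choices. It only estimates acceptance rates and ignores the importance weights that a full IS-ABC estimate would need.

```python
import math
import random


def acceptance_rate(n, proposal, n_sims=50_000, s_obs=0.5, prior_sd=10.0, seed=0):
    """Estimate the ABC acceptance rate in a toy normal-mean model.

    Illustrative assumptions (not from the paper):
      - the summary is the sample mean, so s | theta ~ N(theta, 1/n);
      - the tolerance is eps = n**-0.5;
      - "prior" proposes theta ~ N(0, prior_sd**2);
      - "local" proposes theta ~ N(s_obs, 2/n).
    """
    if proposal not in ("prior", "local"):
        raise ValueError("proposal must be 'prior' or 'local'")
    rng = random.Random(seed)
    eps = n ** -0.5
    summary_sd = n ** -0.5
    local_sd = math.sqrt(2.0 / n)
    accepted = 0
    for _ in range(n_sims):
        # Draw a parameter value from the chosen proposal.
        if proposal == "prior":
            theta = rng.gauss(0.0, prior_sd)
        else:
            theta = rng.gauss(s_obs, local_sd)
        # Simulate the summary directly instead of the full n-sized sample.
        s_sim = rng.gauss(theta, summary_sd)
        # Accept when the simulated summary falls within the tolerance.
        if abs(s_sim - s_obs) < eps:
            accepted += 1
    return accepted / n_sims


if __name__ == "__main__":
    for n in (100, 10_000):
        print(n, acceptance_rate(n, "prior"), acceptance_rate(n, "local"))
```

In this toy setting the prior-based acceptance rate shrinks like n^(-1/2), while the local proposal gives s_sim - s_obs a N(0, 3/n) distribution, so its acceptance probability is erf(1/√6), about 0.44, whatever the value of n.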

3. I have a general comment:

“Something I have (slight) trouble with is the construction of an importance sampling function of the form f_ABC(s|θ)^α when, obviously, this function cannot be used for simulation purposes. The authors point out this fact, but still build an argument about the optimal choice of α, namely away from 0 and 1, like ½”

I guess one key result for us is that the Monte Carlo error is well-behaved (in the sense that even if N is fixed, it does not dominate the sampling variability in the estimator) for IS-ABC if you choose a sensible proposal. This was surprising at first, as we initially expected that the acceptance probability would always get smaller if you had more data, a message that comes across in some of the recent work on ABC, such as optimisation Monte Carlo, which suggests that acceptance probabilities in ABC will be small for big data applications.

The key message from the above result is that the class of “sensible” proposal distributions is wide (if you work with f_ABC(s|θ)^α then any α in (0,1) is sensible).
