deep learning ABC summary statistics
“The main task of this article is to construct low-dimensional and informative summary statistics for ABC methods.”
The idea in the paper “Learning Summary Statistic for ABC via Deep Neural Network”, arXived a few days ago, is to start from the raw data and build a “deep neural network” (meaning a multiple-layer neural network) that provides a non-linear regression of the parameters on the data, and whose calibration never seems to be an issue. (There is a rather militant tone to the justification of the approach, not that unusual with proponents of deep learning approaches, I must add…) The network is called upon to produce an estimator (function) of θ, θ(x), which is then used as the summary statistic. Meaning, if Theorem 1 is to be taken as the proposal, that a different ABC needs to be run for every function of interest. Or, in other words, that the method is not reparameterisation invariant.
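To make the construction concrete, here is a minimal sketch in Python of the two steps: regress θ on the raw data over simulations from the prior predictive, then run rejection ABC with the fitted regression function as the single summary statistic. Everything specific in it is an assumption made for illustration and not taken from the paper: the toy Normal location model, the uniform prior, the function names, and above all the small one-hidden-layer network trained by plain gradient descent, which stands in for the deep network.

```python
import numpy as np

N_OBS = 10  # size of each dataset (hypothetical)

def simulate(theta, rng):
    """Toy model (an assumption): x_i ~ N(theta, 1), i = 1..N_OBS, one row per theta."""
    theta = np.atleast_1d(theta)
    return theta[:, None] + rng.standard_normal((theta.size, N_OBS))

def fit_mlp(x, theta, rng, hidden=16, lr=0.01, steps=2000):
    """Least-squares regression of theta on x with one tanh hidden layer."""
    n, d = x.shape
    w1 = rng.standard_normal((d, hidden)) / np.sqrt(d)
    b1 = np.zeros(hidden)
    w2 = np.zeros(hidden)
    b2 = 0.0
    losses = []
    for _ in range(steps):
        h = np.tanh(x @ w1 + b1)
        resid = h @ w2 + b2 - theta
        losses.append(np.mean(resid ** 2))
        g_out = 2 * resid / n                     # gradient of the loss in the predictions
        g_h = np.outer(g_out, w2) * (1 - h ** 2)  # back-propagated through tanh
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum()
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)

    def summary(x_new):
        """Fitted regression function, used as a one-dimensional summary statistic."""
        return np.tanh(np.atleast_2d(x_new) @ w1 + b1) @ w2 + b2

    return summary, losses

rng = np.random.default_rng(1)

# Step 1: train on (theta, x) pairs simulated from prior times model.
theta_train = rng.uniform(-3, 3, 2000)
summary, losses = fit_mlp(simulate(theta_train, rng), theta_train, rng)

# Step 2: rejection ABC on a fresh reference table, comparing summary(x) values.
x_obs = simulate(1.0, rng)
theta_ref = rng.uniform(-3, 3, 20000)
dist = np.abs(summary(simulate(theta_ref, rng)) - summary(x_obs))
eps = np.quantile(dist, 0.01)          # tolerance set as a quantile of the distances
abc_sample = theta_ref[dist <= eps]
print(len(abc_sample), abc_sample.mean(), abc_sample.std())
```

In this sketch the network targets θ itself, so inference on another function h(θ) would mean refitting with h(theta_train) as the regression target and rerunning step 2, which is the lack of reparameterisation invariance noted above.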
The paper claims to achieve the same optimality properties as in Fearnhead and Prangle (2012). These are however moderate optimalities in that they are obtained for the tolerance ε equal to zero, using the exact posterior expectation as a summary statistic instead of a non-parametric estimate, and with an infinite functional basis in Theorem 2. I thus see little added value in results like Theorem 2 and no real optimality: that the ABC distribution can be arbitrarily close to the exact posterior is not a helpful statement when implementing the method.
The first example in the paper is the posterior distribution associated with the Ising model, which enjoys a sufficient statistic of dimension one. The issue of generating pseudo-data from the Ising model is sidestepped by a call to a Gibbs sampler, but it remains an intrinsic problem, as the convergence of the Gibbs sampler depends on the value of the parameter θ and especially on its location with respect to the critical point. Both ABC posteriors are shown to be quite close.
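For readers unfamiliar with the setting, the following Python sketch shows what generating Ising pseudo-data by Gibbs sampling and computing the one-dimensional sufficient statistic can look like. It is an illustration under assumptions of mine, not the paper's implementation: the density is taken as proportional to exp(θ Σ x_i x_j) over neighbouring pairs with spins in {-1, +1}, the lattice is a 16×16 torus (periodic boundary), the update is a checkerboard Gibbs sweep, and the number of sweeps is arbitrary.

```python
import numpy as np

def neighbour_sum(x):
    """Sum of the four nearest neighbours of every site, with periodic boundary."""
    return (np.roll(x, 1, 0) + np.roll(x, -1, 0)
            + np.roll(x, 1, 1) + np.roll(x, -1, 1))

def gibbs_ising(theta, size=16, sweeps=200, rng=None):
    """Checkerboard Gibbs sampler; size must be even for the two colours to tile the torus."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.choice([-1, 1], size=(size, size))
    ii, jj = np.indices((size, size))
    colours = [(ii + jj) % 2 == c for c in (0, 1)]
    for _ in range(sweeps):
        for mask in colours:
            # Full conditional: P(x_i = +1 | rest) = 1 / (1 + exp(-2 * theta * s_i)),
            # where s_i is the sum of the neighbours of site i.
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * theta * neighbour_sum(x)))
            draws = np.where(rng.random((size, size)) < p_plus, 1, -1)
            x[mask] = draws[mask]   # sites of one colour are conditionally independent
    return x

def suff_stat(x):
    """Sufficient statistic: sum of x_i * x_j over neighbouring pairs, each pair counted once."""
    return int(np.sum(x * np.roll(x, 1, 0)) + np.sum(x * np.roll(x, 1, 1)))

rng = np.random.default_rng(0)
for theta in (0.1, 0.8):
    print(theta, suff_stat(gibbs_ising(theta, rng=rng)))
```

A fixed number of sweeps, as used here, is exactly where the difficulty hides: on the infinite square lattice the critical value under this parameterisation is log(1+√2)/2 ≈ 0.44, and the number of sweeps needed for the chain to forget its starting point can differ widely depending on which side of that value θ falls.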
The second example is the posterior distribution associated with an MA(2) model, apparently becoming a benchmark in the ABC literature. The comparison between an ABC based on the first two autocorrelations, an ABC based on the semi-automatic solution of Fearnhead and Prangle (2012) [for which collection of summaries?], and the neural network proposal, leads to the dismissal of the semi-automatic solution and to the neural net being closest to the exact posterior [with the same tolerance quantile ε for all approaches].
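As a reference point for this benchmark, here is a minimal Python sketch of the simplest of the three competitors: rejection ABC for the MA(2) model with the first two empirical autocovariances as summaries and a tolerance set as a quantile of the simulated distances. The choices below are assumptions for illustration and are not read off the paper: standard Normal noise, a uniform prior over the usual identifiability triangle (θ1 + θ2 > -1, θ1 - θ2 < 1, |θ2| < 1), a Euclidean distance, and the sample size, number of simulations, and 1% quantile.

```python
import numpy as np

def simulate_ma2(theta1, theta2, n, rng):
    """y_t = e_t + theta1 * e_{t-1} + theta2 * e_{t-2}, with e_t ~ N(0, 1)."""
    e = rng.standard_normal(n + 2)
    return e[2:] + theta1 * e[1:-1] + theta2 * e[:-2]

def autocov(y, max_lag=2):
    """Empirical autocovariances at lags 1..max_lag (the hand-picked summaries)."""
    n = len(y)
    return np.array([np.dot(y[k:], y[:n - k]) / n for k in range(1, max_lag + 1)])

def sample_prior(size, rng):
    """Uniform draws over the triangle, by rejection from the enclosing box."""
    out = np.empty((0, 2))
    while len(out) < size:
        t1 = rng.uniform(-2, 2, size)
        t2 = rng.uniform(-1, 1, size)
        keep = (t1 + t2 > -1) & (t1 - t2 < 1)
        out = np.vstack([out, np.column_stack([t1, t2])[keep]])
    return out[:size]

def abc_reject(y_obs, rng, n_sim=5000, quantile=0.01):
    """Keep the prior draws whose simulated summaries fall closest to the observed ones."""
    s_obs = autocov(y_obs)
    thetas = sample_prior(n_sim, rng)           # the reference table of parameters
    s_sim = np.array([autocov(simulate_ma2(t1, t2, len(y_obs), rng))
                      for t1, t2 in thetas])
    dist = np.linalg.norm(s_sim - s_obs, axis=1)
    eps = np.quantile(dist, quantile)           # tolerance as a distance quantile
    return thetas[dist <= eps], eps

rng = np.random.default_rng(0)
y_obs = simulate_ma2(0.6, 0.2, 200, rng)        # pseudo-observed series
accepted, eps = abc_reject(y_obs, rng)
print(len(accepted), eps, accepted.mean(axis=0))
```

Swapping `autocov` for another summary function while keeping the same reference table and the same quantile gives a like-for-like comparison between summaries, in the spirit of the common tolerance quantile mentioned above.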
A discussion crucially missing from the paper, from my perspective, is an accounting for size. First, what is the computing cost of fitting and calibrating and storing a neural network for the sole purpose of constructing a summary statistic? Once the neural net is constructed, I would assume most users would see little need in pursuing the experiment any further. (This was also why we stopped at our random forest output rather than using it as a summary statistic.) Second, how do cost and performance evolve as the dimension of the parameter θ grows? I would deem it necessary to understand when the method fails, as for instance in latent variable models such as HMMs. Third, how does the size of the sample impact cost and performance? In many realistic cases where ABC applies, it is not possible to use the raw data, given its size, and summary statistics are a given. For such examples, neural networks should be compared with other ABC solutions, using the same reference table.