dimension reduction in ABC [a review's review]

What is very apparent from this study is that there is no single ‘best’ method of dimension reduction for ABC.

Michael Blum, Matt Nunes, Dennis Prangle and Scott Sisson just posted on arXiv a rather long review of dimension reduction methods in ABC, along with a comparison on three specific models. Given that the choice of the vector of summary statistics is presumably the most important single step in an ABC algorithm, and that selecting too large a vector is bound to fall victim to the curse of dimensionality, this is a fairly relevant review! Therein, the authors compare regression adjustments à la Beaumont et al. (2002), subset selection methods, as in Joyce and Marjoram (2008), and projection techniques, as in Fearnhead and Prangle (2012). They add to this impressive battery of methods the potential use of AIC and BIC. (Last year after ABC in London I reported here on the use of the alternative DIC by François and Laval, but the paper is not in the bibliography; I wonder why.) An argument (page 22) for using AIC/BIC is that either provides indirect information about the approximation of p(θ|y) by p(θ|s); this does not seem obvious to me.
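
(To fix ideas, here is a minimal sketch of the baseline that all these dimension-reduction methods start from, namely plain ABC rejection on a chosen vector of summary statistics. The toy Gaussian model, the two summaries and all numerical settings below are my own choices, not taken from the paper.)

```python
import numpy as np

rng = np.random.default_rng(0)

# toy setup (my own choice, not from the paper): data from N(theta, 1),
# summarised by the sample mean and the sample variance
theta_true = 2.0
y_obs = rng.normal(theta_true, 1.0, size=50)

def summaries(y):
    # the choice of this vector s(y) is the step the whole review is about
    return np.array([y.mean(), y.var()])

s_obs = summaries(y_obs)

# plain ABC rejection: simulate from the prior, summarise, keep the closest draws
n_sim, accept_rate = 20_000, 0.01
theta = rng.uniform(-10, 10, size=n_sim)
s_sim = np.array([summaries(rng.normal(t, 1.0, size=50)) for t in theta])

dist = np.linalg.norm((s_sim - s_obs) / s_sim.std(axis=0), axis=1)
keep = dist <= np.quantile(dist, accept_rate)
print("ABC posterior mean of theta:", theta[keep].mean())
```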

The paper also suggests a further regularisation of Beaumont et al. (2002) by ridge regression, although an L1 penalty à la lasso would in my opinion be more appropriate for removing extraneous summary statistics. (I must acknowledge never being a big fan of ridge regression, especially in the ad hoc version à la Hoerl and Kennard, i.e. in a non-decision-theoretic approach where the hyperparameter λ is derived from the data by cross-validation, since it then sounds like a poor man’s Bayes/Stein estimate, just like BIC is a first-order approximation to regular Bayes factors… Why pay for the copy when you can afford the original?!) Unsurprisingly, ridge regression does better than plain regression in the comparison experiment when there are many almost collinear summary statistics, but an alternative conclusion could be that regression analysis is not that appropriate with many summary statistics. Indeed, summary statistics are not quantities of interest but data-summarising tools towards a better approximation of the posterior at a given computational cost… (I do not get the final comment, page 36, about the relevance of summary statistics for MCMC or SMC algorithms: the criterion should be the best approximation of p(θ|y), which does not depend on the type of algorithm.)
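
(Continuing the toy sketch above, a ridge-penalised version of the Beaumont et al. regression adjustment could look as follows, with the L1 variant I would favour obtained by swapping in a lasso fit. The penalty values are placeholders and scikit-learn is only one convenient way to fit these regressions; this is an illustration of the idea, not the authors' implementation.)

```python
from sklearn.linear_model import Lasso, Ridge

# local-linear regression adjustment within the acceptance region:
# regress the accepted parameter values on the centred summaries
X = s_sim[keep] - s_obs
t = theta[keep]

ridge = Ridge(alpha=1.0).fit(X, t)      # alpha is a placeholder, tuned by cross-validation in practice
theta_adjusted = t - X @ ridge.coef_    # shift the accepted draws towards s(y_obs)

# with an L1 penalty, exactly-zero coefficients flag summaries the adjustment
# effectively discards, i.e. a crude form of summary selection
lasso = Lasso(alpha=0.1).fit(X, t)
kept_summaries = np.flatnonzero(lasso.coef_)
```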

I find it quite exciting to see the development of a new range of ABC papers like this review, dedicated to a better derivation of summary statistics in ABC, each with different perspectives and desiderata, as it will help us understand where ABC works and where it fails, and how we could get beyond ABC…

2 Responses to “dimension reduction in ABC [a review's review]”



  1. Hi Christian, thanks for your comments on our paper. My take on the page 36 comment is the following.

    We say “The price of the benefit of using the more computationally practical, fixed large number of samples, is that decisions on the dimension reduction of the summary statistics will be made on potentially worse estimates of the posterior than those available under superior sampling algorithms.” The ABC rejection sampling estimate of the posterior is typically worse than that of MCMC/SMC because it is feasible to use a smaller ABC threshold with the latter (for the same computational cost). Many of the dimension reduction techniques require a rejection sampling framework, and hence are searching for good summaries under the associated threshold, not the lower threshold possible under MCMC/SMC. In practice this may often still be a good way to generate summaries for MCMC/SMC, but we don’t investigate this question here.

  2. Hi Christian,

    Glad that you had a look at our comparative analysis. Here are some comments that might answer some of your questions.

    First, you mentioned the DIC criterion proposed by François and Laval (2011) and asked why we do not mention it in our AIC/BIC chapter. François and Laval proposed the DIC to perform model selection in ABC as an alternative to the Bayes factor. The computation of the DIC is based on the deviance function computed in the model (à la Wilkinson) that includes the generative model and the ABC error term. The AIC/BIC criteria we propose are designed for a completely different objective, which is the selection of the informative summary statistics. The statistical model we use to compute the deviance is also different, because we consider the local-linear model of the regression of the parameters on the summary statistics. The rationale is that the informative summary statistics should be the ones that are good predictors of the parameter values in the acceptance region (a toy sketch of this selection idea is given after these responses). There are nonetheless counterexamples: we can imagine informative summary statistics that vary with the parameter values outside of the acceptance region but not within the acceptance region. Such summary statistics would not be retained by the AIC/BIC criteria although they should be. Hope this helps.

    Lasso regression would indeed be really nice for regression adjustment because it would additionally provide a subset of informative summary statistics. That is something we can include in the R abc package. If someone else wants to include it in the package, they are more than welcome to do so.

    About the claim that "regression analysis is not that appropriate with many summary statistics", I do not think it is true. Look at example 3, where we have more than 100 summary statistics and where the error criterion is reduced by almost 50% when performing regression adjustment instead of simple rejection.

    About the relevance of dimension reduction techniques for MCMC or SMC, we want to say that dimension reduction (choice of summary statistics or projection technique) can be applied in a preliminary run before the actual run of the MCMC or SMC algorithm (a rough sketch of this two-stage approach also follows below). That is what Wegmann et al. (2009) did by applying PLS before running an MCMC algorithm where the summary statistics were the components obtained with PLS.

    Cheers
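
Regarding the AIC/BIC criteria described in the response above, here is a toy sketch of the idea as I understand it, continuing the earlier example: fit a linear regression of the parameter on each candidate subset of summaries, using only the accepted simulations, and keep the subset with the lowest criterion. The exhaustive subset search and the specific BIC formula are my own illustration choices; the implementation in the paper may well differ.

```python
from itertools import combinations

def bic_linear(X, t):
    # BIC of an ordinary least-squares regression of t on X (Gaussian errors)
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, t, rcond=None)
    rss = np.sum((t - X1 @ beta) ** 2)
    return n * np.log(rss / n) + (p + 1) * np.log(n)

S_local, t_local = s_sim[keep], theta[keep]   # simulations falling in the acceptance region
n_stats = S_local.shape[1]

# exhaustive search over subsets (only feasible for a handful of candidate summaries)
best_bic, best_subset = min(
    (bic_linear(S_local[:, list(sub)], t_local), sub)
    for k in range(1, n_stats + 1)
    for sub in combinations(range(n_stats), k)
)
print("summaries selected by BIC:", best_subset)
```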
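
And on the PLS point above, a rough sketch of the two-stage approach, reusing the pilot simulations of the first example to build the projection; the actual pipeline of Wegmann et al. involves further steps (such as transforming the summaries first) that are omitted here.

```python
from sklearn.cross_decomposition import PLSRegression

# fit PLS components predicting the parameter from the raw summaries on pilot simulations;
# a subsequent ABC-MCMC run would compare observed and simulated data through these
# few components instead of the full summary vector
pls = PLSRegression(n_components=1).fit(s_sim, theta.reshape(-1, 1))

def projected_summaries(y):
    return pls.transform(summaries(y).reshape(1, -1)).ravel()

s_obs_proj = projected_summaries(y_obs)   # target for the distance inside ABC-MCMC
```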
