Archive for ABC

local kernel reduction for ABC

Posted in Books, pictures, Statistics, University life on September 14, 2016 by xi'an

“…construction of low dimensional summary statistics can be performed as in a black box…”

Today, Zhou and Fukumizu arXived a paper that proposes a gradient-based dimension reduction for ABC summary statistics, in the spirit of RKHS kernels as advocated, e.g., by Arthur Gretton. Here the projection is a mere linear projection Bs of the vector of summary statistics, s, where B is an estimated Hessian matrix associated with the posterior expectation E[θ|s]. (There is some connection with the latest version of Li’s and Fearnhead’s paper on ABC convergence, as they also define a linear projection of the summary statistics, based on asymptotic arguments, although their matrix does depend on the true value of the parameter.) The linearity sounds like a strong restriction [to me], especially when the summary statistics have no reason to belong to a vector space and thus to be open to changes of bases and linear projections. For instance, a specific value taken by a summary statistic, like 0 say, may be more relevant than the range of its values. On a larger scale, I am doubtful about always projecting a vector of summary statistics on a subspace with the smallest possible dimension, i.e., the dimension of θ. In practical settings, it seems impossible to derive the optimal projection, and a subvector is almost certain to lose information against a larger vector.
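
For what it is worth, here is a toy R sketch (not the authors’ code) of what such a linear reduction of the summary vector may look like, with B crudely estimated by a least-squares fit of simulated parameters on simulated summaries, as a stand-in for the gradient-based estimate of E[θ|s] used in the paper:

    # toy illustration, not the authors' code: reduce a d-dimensional summary s
    # to a p-dimensional projection Bs, with B estimated by a crude linear fit
    # of simulated parameters on simulated summaries
    N <- 1e4; p <- 2; d <- 10
    theta <- matrix(rnorm(N * p), N, p)                   # prior draws
    summ  <- theta %*% matrix(runif(p * d), p, d) +       # pseudo summary statistics
             matrix(rnorm(N * d, sd = .5), N, d)
    B     <- t(lm(theta ~ summ)$coefficients[-1, ])       # p x d projection matrix
    Bs    <- summ %*% t(B)                                # projected summaries, N x p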

“Another proposal is to use different summary statistics for different parameters.”

Which is exactly what we did in our random forest estimation paper: using a different forest for each parameter of interest (but no real tree was damaged in the experiment!).
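
In case this helps, a minimal sketch of the one-forest-per-parameter idea with the randomForest package, on a toy normal model of my own choosing rather than the examples of the paper:

    library(randomForest)
    # reference table: parameters simulated from the prior, summaries from the model
    N <- 1e3
    theta1 <- runif(N, 0, 2); theta2 <- rexp(N)
    summ <- t(sapply(1:N, function(i) {
      x <- rnorm(50, mean = theta1[i], sd = theta2[i])
      c(mean(x), sd(x), median(x), mad(x))
    }))
    colnames(summ) <- paste0("s", 1:4)
    # one forest per parameter of interest
    rf1 <- randomForest(x = summ, y = theta1)
    rf2 <- randomForest(x = summ, y = theta2)
    # predictions at the observed summaries serve as ABC point estimates
    sobs <- matrix(c(0.9, 1.1, 0.85, 1.0), nrow = 1,
                   dimnames = list(NULL, colnames(summ)))
    predict(rf1, sobs)  # estimate of theta1
    predict(rf2, sobs)  # estimate of theta2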

astroABC: ABC SMC sampler for cosmological parameter estimation

Posted in Books, R, Statistics, University life on September 6, 2016 by xi'an

“…the chosen statistic needs to be a so-called sufficient statistic in that any information about the parameter of interest which is contained in the data, is also contained in the summary statistic.”

Elise Jennings and Maeve Madigan arXived a paper on a new Python code they developed for implementing ABC-SMC, towards astronomy or rather cosmology applications. They stress the parallelisation abilities of their approach, which lead to a “crucial speed enhancement” over the available competitors, abcpmc and cosmoabc. The version of ABC implemented there is “our” ABC PMC, where particle clouds are shifted according to mixtures of random walks based on each and every point of the current cloud, with a scale equal to twice the estimated posterior variance. (The paper curiously refers to non-astronomy papers through their arXiv version, even when they have been published. Like our 2008 Biometrika paper.) A large part of the paper is dedicated to computing aspects that escape me, like the constant reference to MPI. The algorithm is partly automated, except for the choice of the summary statistics and of the distance. The tolerance is chosen as a (large) quantile of the previous set of simulated distances. Getting comments from the designers of abcpmc and cosmoabc would be great.
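
As a reminder of what these moves amount to, here is a bare-bones R version of one such ABC PMC scheme on a toy normal-mean model; this is my own sketch of the generic algorithm, not the astroABC code (which is in Python):

    # toy ABC PMC on a normal mean (my sketch, not the astroABC code)
    nobs <- 50; xobs <- rnorm(nobs, mean = 1); sobs <- mean(xobs)
    M <- 500
    theta <- rnorm(M, 0, 10)                          # initial cloud from the prior
    dist  <- sapply(theta, function(t) abs(mean(rnorm(nobs, t)) - sobs))
    w     <- rep(1 / M, M)
    for (it in 1:5) {
      eps  <- quantile(dist, .75)                     # tolerance = quantile of past distances
      tau2 <- 2 * sum(w * (theta - sum(w * theta))^2) # twice the (weighted) posterior variance
      newtheta <- newdist <- numeric(M)
      for (i in 1:M) {
        repeat {                                      # random-walk move from a resampled particle
          cand <- rnorm(1, sample(theta, 1, prob = w), sqrt(tau2))
          d    <- abs(mean(rnorm(nobs, cand)) - sobs)
          if (d < eps) break
        }
        newtheta[i] <- cand; newdist[i] <- d
      }
      # importance weights: prior over the mixture of random-walk kernels
      w <- dnorm(newtheta, 0, 10) /
           sapply(newtheta, function(t) sum(w * dnorm(t, theta, sqrt(tau2))))
      w <- w / sum(w)
      theta <- newtheta; dist <- newdist
    }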

“It is clear that the simple Gaussian Likelihood assumption in this case, which neglects the effects of systematics yields biased cosmological constraints.”

The last part of the paper compares ABC and MCMC on a simulated supernova dataset. Which is a somewhat dubious comparison since the model used for producing the data and running ABC is not the same as the Gaussian version used with MCMC. Unsurprisingly, MCMC then misses the true value of the cosmological parameters and, most likely and more importantly, the true posterior HPD region, while ABC SMC (or PMC) concentrates around the genuine parameter values. (There is no additional demonstration of how accelerated the approach is.)

Approximate Bayesian computation via sufficient dimension reduction

Posted in Statistics, University life on August 26, 2016 by xi'an

“One of our contribution comes from the mathematical analysis of the consequence of conditioning the parameters of interest on consistent statistics and intrinsically inconsistent statistics”

Xiaolong Zhong and Malay Ghosh have just arXived an ABC paper focussing on the convergence of the method, and on the use of sufficient dimension reduction techniques for the construction of summary statistics. I had not heard of this approach before, so I read the paper with interest. I however regret that the paper does not link with the recent consistency results of Li and Fearnhead and of Daniel Frazier, Gael Martin, Judith Rousseau and myself. When conditioning upon the MLE [or the posterior mean] as the summary statistic, Theorem 1 states that the Bernstein-von Mises theorem holds, missing a limit in the tolerance ε, and apparently missing conditions on the speed of convergence of this tolerance to zero, although the conditioning event involves the true value of the parameter. This makes me wonder at the relevance of the result. The part about partial posteriors and the characterisation of limiting posterior distributions starts with the natural remark that the mean of the summary statistic must identify the whole parameter θ to achieve consistency, a point central to our 2014 JRSS B paper. The authors suggest using a support vector machine to derive the summary statistics, an idea already exploited by Heiko Strathmann et al. There is no consistency result of relevance for ABC in that second and final part, which ends rather abruptly. Overall, while the paper contributes to the current reflection on the convergence properties of ABC, the lack of scaling of the tolerance ε calls for further investigations.
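
For illustration purposes only, here is a toy R rendering of the generic principle of learning a summary statistic by regressing simulated parameters on candidate statistics, with an SVM regression from e1071 standing in for the authors’ actual construction:

    library(e1071)
    # toy sketch: learn a one-dimensional summary for theta by SVM regression
    # on a vector of candidate statistics (stand-in for the paper's construction)
    N <- 2e3
    theta <- runif(N, -2, 2)
    stats <- t(sapply(theta, function(t) {
      x <- rnorm(30, mean = t)
      c(mean(x), median(x), sd(x), min(x), max(x))
    }))
    fit <- svm(x = stats, y = theta)   # eps-regression by default for a numeric response
    # the fitted regression defines the summary s(x) to be used within ABC
    xobs <- rnorm(30, mean = 0.5)
    sobs <- predict(fit, rbind(c(mean(xobs), median(xobs), sd(xobs), min(xobs), max(xobs))))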

ABC by subset simulation

Posted in Books, Statistics, Travel on August 25, 2016 by xi'an

Last week, Vakilzadeh, Beck and Abrahamsson arXived a paper entitled “Using Approximate Bayesian Computation by Subset Simulation for Efficient Posterior Assessment of Dynamic State-Space Model Classes”. It follows an earlier paper by Beck and co-authors on ABC by subset simulation, a paper that I did not read. The model of interest is a hidden Markov model with continuous components and covariates (input), e.g. a stochastic volatility model. There is however a catch in the definition of the model, namely that the observable part of the HMM includes an extra measurement error term linked with the tolerance level of the ABC algorithm, an error term that is dependent across time, the vector of errors being constrained to a ball of radius ε. This reminds me of noisy ABC, obviously (and as acknowledged by the authors), but also of some ABC developments of Ajay Jasra and co-authors. Indeed, as in those papers, Vakilzadeh et al. use the raw data sequence to compute their tolerance neighbourhoods, which obviously bypasses the selection of a summary statistic [vector] but also may drown the signal in noise for long enough series.

“In this study, we show that formulating a dynamical system as a general hierarchical state-space model enables us to independently estimate the model evidence for each model class.”

Subset simulation is a nested technique that produces a sequence of nested balls (and related tolerances) such that the conditional probability to be in the next ball given the previous one remains large enough, requiring a new round of simulation each time. This somewhat reminds me of nested sampling, even though the two methods differ. For subset simulation, estimating the level probabilities means that there also exists a converging (and even unbiased!) estimator of the evidence associated with different tolerance levels. Which is not a particularly natural object unless one wants to turn it into a tolerance selection principle, which would be quite a novel perspective. But seemingly not one adopted in the paper. Given that the application section truly compares models, I must have missed something there. (Blame the long flight from San Francisco to Sydney!) Interestingly, the different models as in Table 4 relate to different tolerance levels, which may be a hindrance for the overall validation of the method.
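
To fix ideas, here is a schematic R version of how subset simulation could estimate the ABC evidence P(distance < ε) under the prior as a product of conditional level probabilities, on a toy normal-mean example; this is my own sketch of the mechanism, not the authors’ implementation:

    # toy sketch (not the authors' code) of subset simulation for the ABC evidence,
    # i.e. P(distance < eps) under the prior, as a product of conditional level probabilities
    nobs <- 25; xobs <- rnorm(nobs, 2); sobs <- mean(xobs)
    M <- 1e3; p0 <- 0.1; eps <- 0.05
    prior   <- function(t) dnorm(t, 0, 5)
    simdist <- function(t) abs(mean(rnorm(nobs, t)) - sobs)
    theta <- rnorm(M, 0, 5)                        # prior draws
    dist  <- sapply(theta, simdist)
    evid  <- 1
    while (quantile(dist, p0) > eps) {
      lev  <- quantile(dist, p0)                   # next nested ball
      evid <- evid * p0                            # conditional level probability
      idx  <- sample(which(dist < lev), M, replace = TRUE)   # seeds for the new level
      theta <- theta[idx]; dist <- dist[idx]
      for (i in 1:M) {                             # one ABC-MCMC move within the ball
        cand <- rnorm(1, theta[i], sd(theta))
        d    <- simdist(cand)
        if (d < lev && runif(1) < prior(cand) / prior(theta[i])) {
          theta[i] <- cand; dist[i] <- d
        }
      }
    }
    evid * mean(dist < eps)                        # estimate of P(d < eps)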

I find the subsequent part on getting rid of uncertain prediction error model parameters of lesser [personal] interest as it essentially replaces the marginal posterior on the parameters of interest by a BIC approximation, with the unsurprising conclusion that “the prior distribution of the nuisance parameter cancels out”.

off to Australia

Posted in pictures, Statistics, Travel, University life, Wines on August 22, 2016 by xi'an

[photo: south bank of the Yarra river, Melbourne, July 21, 2012]

Taking advantage of being in San Francisco, I flew yesterday to Australia over the Pacific, crossing the international date line for the first time. The 15 hour Qantas flight to Sydney was remarkably smooth and quiet, with most passengers sleeping for most of the way, and it gave me a great opportunity to go over several papers I wanted to read and review. Over the next week or so, I will work with my friends and co-authors David Frazier and Gael Martin at Monash University (and undoubtedly enjoy the great food and wine scene!), before flying back to Paris (alas via San Francisco rather than direct).

asymptotic properties of Approximate Bayesian Computation

Posted in pictures, Statistics, Travel, University life on July 26, 2016 by xi'an

[photo: street light near the St Kilda Road bridge, Melbourne, July 21, 2012]

With David Frazier and Gael Martin from Monash University, and with Judith Rousseau (Paris-Dauphine), we have now completed and arXived a paper entitled Asymptotic Properties of Approximate Bayesian Computation. This paper undertakes a fairly complete study of the large sample properties of ABC under weak regularity conditions. We produce therein sufficient conditions for posterior concentration, asymptotic normality of the ABC posterior estimate, and asymptotic normality of the ABC posterior mean. Moreover, those (theoretical) results are of significant import for practitioners of ABC as they pertain to the choice of tolerance ε used within ABC for selecting parameter draws. In particular, they [the results] contradict the conventional ABC wisdom that this tolerance should always be taken as small as the computing budget allows.
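
As a purely illustrative aside (a toy example of mine, not taken from the paper), the role of the tolerance in how the ABC posterior concentrates can be eyeballed on a normal-mean model with the sample mean as summary:

    # toy illustration of the tolerance/sample-size interplay (not from the paper)
    abc <- function(n, eps, M = 1e5) {
      xobs  <- rnorm(n, mean = 1)
      theta <- rnorm(M, 0, 5)                 # prior draws
      smean <- rnorm(M, theta, 1 / sqrt(n))   # simulated sample means as summaries
      theta[abs(smean - mean(xobs)) < eps]    # accepted parameter draws
    }
    sd(abc(n = 1e2, eps = 2 / sqrt(1e2)))     # eps shrinking with n: spread of order 1/sqrt(n)
    sd(abc(n = 1e4, eps = 2 / sqrt(1e4)))
    sd(abc(n = 1e4, eps = 0.5))               # fixed eps: spread driven by eps, not n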

Now, this paper bears some similarities with our earlier paper on the consistency of ABC, written with David and Gael. As it happens, that paper was rejected after submission and I then discussed it in an internal seminar in Paris-Dauphine, with Judith taking part in the discussion and quickly suggesting an alternative approach that is now central to the current paper. The previous version analysed Bayesian consistency of ABC under specific uniformity conditions on the summary statistics used within ABC. But the conditions for consistency are now much weaker than earlier, thanks to Judith’s input!

There are also similarities with Li and Fearnhead (2015), previously discussed here. However, while similar in spirit, the results contained in the two papers strongly differ on several fronts:

  1. Li and Fearnhead (2015) considers an ABC algorithm based on kernel smoothing, whereas our interest is the original ABC accept-reject and its many derivatives;
  2. our theoretical approach permits a complete study of the asymptotic properties of ABC, posterior concentration, asymptotic normality of ABC posteriors, and asymptotic normality of the ABC posterior mean, whereas Li and Fearnhead (2015) is only concerned with asymptotic normality of the ABC posterior mean estimator (and various related point estimators);
  3. the results of Li and Fearnhead (2015) are derived under very strict uniformity and continuity/differentiability conditions, which bear a strong resemblance to those conditions in Yuan and Clark (2004) and Creel et al. (2015), while the results herein do not rely on such conditions and only assume very weak regularity conditions on the summary statistics themselves; this difference allows us to characterise the behaviour of ABC in situations not covered by the approach taken in Li and Fearnhead (2015).

automatic variational ABC

Posted in pictures, Statistics on July 8, 2016 by xi'an

“Stochastic Variational inference is an appealing alternative to the inefficient sampling approaches commonly used in ABC.”

Moreno et al. [including Ted Meeds and Max Welling] recently arXived a paper merging variational inference and ABC. The argument for turning variational is the computational speedup. The traditional (in variational inference) divergence decomposition of the log-marginal likelihood is replaced by an ABC version, parameterised in terms of intrinsic generators (i.e., generators that do not depend on cyber-parameters, like the U(0,1) or the N(0,1) generators), or “simulation code” in the authors’ terms. Which leads to the automatic aspect of the approach: in the paper, the derivation of the gradient is indeed automated.

“One issue is that even assuming that the ABC likelihood is an unbiased estimator of the true likelihood (which it is not), taking the log introduces a bias, so that we now have a biased estimate of the lower bound and thus biased gradients.”

I wonder how much of an issue this is, since we consider the variational lower bound, to be optimised in terms of the parameters of the variational posterior. Indeed, the endpoint of the analysis is to provide an optimal variational approximation, which remains an approximation whether or not the likelihood estimator is unbiased. A more “severe” limitation may be in the inversion constraint, since it seems to eliminate Beta or Gamma distributions. (Even though calling qbeta(runif(1),a,b) is definitely achievable… and not rejected by a Kolmogorov-Smirnov test.)
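
For the record, the inverse-cdf workaround alluded to above does pass the test (the shape parameters below are arbitrary choices of mine):

    # Beta draws as a deterministic transform of U(0,1) inputs, i.e. an "intrinsic generator"
    a <- 2.5; b <- 7                 # arbitrary shape parameters
    x <- qbeta(runif(1e4), a, b)
    ks.test(x, "pbeta", a, b)        # no reason to be rejected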

Incidentally, I discovered through the paper the existence of the Kumaraswamy distribution, whose main appeal seems to be its closed-form quantile function, while bearing some resemblance with the Beta distribution. (Another arXival by Baltasar Trancón y Widemann studies some connections between the two, but does not say how to select the parameters to optimise the similarity.)
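
Said quantile function is indeed a one-liner, turning simulation into a single transform of uniforms (the helper names below and the comparison with a Beta(2,5) are mine, just eyeballing the resemblance):

    # Kumaraswamy(a,b) cdf is F(x) = 1 - (1 - x^a)^b, hence a closed-form quantile function
    qkumar <- function(u, a, b) (1 - (1 - u)^(1 / b))^(1 / a)
    rkumar <- function(n, a, b) qkumar(runif(n), a, b)
    hist(rkumar(1e4, 2, 5), freq = FALSE, main = "Kumaraswamy(2,5) vs Beta(2,5)")
    curve(dbeta(x, 2, 5), add = TRUE)   # rough visual comparison with a Beta density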