Archive for ABC-PMC

adaptive ABC tolerance

Posted in Books, Statistics, University life on June 2, 2020 by xi'an

“There are three common approaches for selecting the tolerance sequence (…) [they] can lead to inefficient sampling”

Umberto Simola, Jessi Cisewski-Kehe, Michael Gutmann and Jukka Corander recently arXived a paper entitled Adaptive Approximate Bayesian Computation Tolerance Selection. I appreciate that they start from our ABC-PMC paper, i.e., Beaumont et al. (2009) [although the representation that the ABC tolerances are fixed in advance is somewhat incorrect, in that our codes used quantiles of the distances to set the tolerances]. This is also the approach advocated for the initialisation step by the current paper, although it remains a wee bit vague. Subsequent steps are based on the proximity between the resulting approximations to the ABC posteriors, more exactly on a quantile derived from the maximum of the ratio between two successive estimated ABC posteriors, mimicking the Accept-Reject step, albeit always one step too late. The iteration stops when the ratio is almost one, possibly missing the target due to Monte Carlo variability. (Recall that the “optimal” tolerance is not zero for a finite sample size.)
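For concreteness, here is a minimal sketch of the quantile-based initialisation mentioned above, where the first tolerance is set as a quantile of pilot distances; the Gaussian model, prior, and distance are toy placeholders rather than anything taken from the paper.

```python
# Hedged sketch of the quantile-based tolerance initialisation used in our
# ABC-PMC codes (Beaumont et al., 2009): the tolerance is not fixed in advance
# but taken as a quantile of simulated distances. Model, prior and distance
# below are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)
y_obs = 1.2                      # toy observed summary
N, quantile = 10_000, 0.1        # pilot sample size and distance quantile

theta = rng.normal(0, 10, N)                 # draws from a toy N(0,10²) prior
pseudo = rng.normal(theta, 1)                # toy pseudo-observations
dist = np.abs(pseudo - y_obs)                # toy distance between summaries

eps_0 = np.quantile(dist, quantile)          # first tolerance = distance quantile
keep = theta[dist <= eps_0]                  # accepted particles for iteration 0
print(f"eps_0 = {eps_0:.3f}, kept {keep.size} particles")
```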

“…the decrease in the acceptance rate is mitigated by the improvement in the proposed particles.”

A problem is that it depends on the form of the approximation and requires non-parametric, hence imprecise, steps. Maybe variational encoders could help. Interesting approach by Sugiyama et al. (2012), of which I knew nothing, the core idea being that the ratio of two densities is also the solution to minimising a distance between the numerator density and a variable function times the denominator density. However, since only the maximum of the ratio is needed, a more focused approach could be devised, rather than first approximating the ratio and then maximising the estimated ratio. Maybe the solution of Goffinet et al. (1992) on estimating an accept-reject constant could work.
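To make the stopping rule more concrete, the crude illustration below compares two successive particle approximations through the maximum of their density ratio and stops when it is close to one; it substitutes a plain Gaussian KDE for the density-ratio estimation of Sugiyama et al. (2012) and uses simulated placeholder particles.

```python
# Crude illustration of the stopping rule discussed above: compare two
# successive (particle-based) ABC posterior approximations through the maximum
# of their density ratio and stop when it is close to one. The paper relies on
# density-ratio estimation à la Sugiyama et al. (2012); a plain Gaussian KDE
# is substituted here purely for illustration.
import numpy as np
from scipy.stats import gaussian_kde

def ratio_max(particles_prev, particles_curr, grid_size=500):
    """Approximate, over a grid, the maximum ratio of two KDE estimates."""
    kde_prev = gaussian_kde(particles_prev)
    kde_curr = gaussian_kde(particles_curr)
    lo = min(particles_prev.min(), particles_curr.min())
    hi = max(particles_prev.max(), particles_curr.max())
    grid = np.linspace(lo, hi, grid_size)
    return np.max(kde_curr(grid) / kde_prev(grid))

rng = np.random.default_rng(1)
prev = rng.normal(0.0, 1.2, 1_000)   # toy particles at iteration t-1
curr = rng.normal(0.0, 1.0, 1_000)   # toy particles at iteration t
r = ratio_max(prev, curr)
print("stop" if abs(r - 1.0) < 0.05 else f"continue (ratio max = {r:.2f})")
```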

A further comment is that the estimated density is not properly normalised, which lessens the Accept-Reject analogy since the optimum may well stand above one, making the algorithm stop “too soon”. (Incidentally, the paper contains the mixture example of Sisson et al. (2007), for which our own graphs were strongly criticised during our Biometrika submission!)

ABC by QMC

Posted in Books, Kids, Statistics, University life on November 5, 2018 by xi'an

A paper by Alexander Buchholz (CREST) and Nicolas Chopin (CREST) on quasi-Monte Carlo methods for ABC is going to appear in the Journal of Computational and Graphical Statistics. I had missed the opportunity when it was posted on arXiv and only became aware of the paper’s contents when I reviewed Alexander’s thesis for the doctoral school. The fact that the parameters are simulated (in ABC) from a prior that is quite generally a standard distribution while the pseudo-observations are simulated from a complex distribution (associated with the intractability of the likelihood function) means that the use of quasi-Monte Carlo sequences is in general only possible for the first part.

The ABC context studied there is close to the original version of the ABC rejection scheme [as opposed to SMC and importance versions], the main difference being the use of M pseudo-observations instead of one (of the same size as the initial data). This repeated version has been discussed and abandoned in a strict Monte Carlo framework in favor of M=1, as it increases the overall variance, but the paper uses this version to show that the multiplication of pseudo-observations in a quasi-Monte Carlo framework does not increase the variance of the estimator. (Since the variance apparently remains constant when taking into account the generation time of the pseudo-data, we can however dispute the interest of this multiplication, except to produce a constant variance estimator, for some targets, or to be used for convergence assessment.) The article also covers the bias correction solution of Lee and Łatuszyński (2014).
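As a rough illustration of this construction, the following sketch drives the prior draws with a scrambled Sobol sequence through the inverse CDF, while the M pseudo-observations per parameter remain pseudo-random, producing a mixed-sequence estimator; the Gaussian model, distance, and tolerance are toy placeholders.

```python
# Minimal sketch of the idea discussed above: a quasi-Monte Carlo (Sobol)
# sequence drives the prior draws (through the inverse CDF), while the M
# pseudo-observations per parameter remain pseudo-random, giving a "mixed
# sequence" estimator. Toy Gaussian model and distance, for illustration only.
import numpy as np
from scipy.stats import norm, qmc

rng = np.random.default_rng(2)
y_obs, eps, M, N = 1.2, 0.3, 5, 2**12

u = qmc.Sobol(d=1, scramble=True, seed=3).random(N).ravel()  # QMC uniforms
theta = norm.ppf(u, loc=0, scale=10)                         # prior N(0,10²) via inverse CDF

pseudo = rng.normal(theta[:, None], 1.0, size=(N, M))        # M pseudo-data sets per θ
accept = np.abs(pseudo - y_obs) <= eps                       # toy distance within tolerance
weights = accept.mean(axis=1)                                # fraction of accepted pseudo-data

Z_hat = weights.mean()                                       # acceptance probability Z(ε)
post_mean = np.sum(weights * theta) / np.sum(weights)        # weighted ABC posterior mean
print(f"Z(eps) ≈ {Z_hat:.4f}, ABC posterior mean ≈ {post_mean:.3f}")
```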

Due to the simultaneous presence of pseudo-random and quasi-random sequences in the approximations, the authors use the notion of mixed sequences, for which they extend a one-dimensional central limit theorem. The focus of the paper on the estimation of Z(ε), the normalization constant of the ABC density, i.e., the predictive probability of accepting a simulation, which can be estimated at a speed of O(N⁻¹) where N is the number of QMC simulations, is a wee bit puzzling as I cannot figure the relevance of this constant (a function of ε), especially since the result does not seem to generalize directly to other ABC estimators.

A second half of the paper considers a sequential version of ABC, as in ABC-SMC and ABC-PMC, where the proposal distribution is based on a Normal mixture with a small number of components, estimated from the (particle) sample of the previous iteration. Even though efficient techniques for estimating this mixture are available, this innovative step requires a calculation time that should be taken into account in the comparisons. The construction of a decreasing sequence of tolerances ε also seems pushed beyond and below what a sequential approach like that of Del Moral, Doucet and Jasra (2012) would produce, apparently with the justification that lower tolerances are always preferable. This is not necessarily the case, as recent articles by Li and Fearnhead (2018a, 2018b) and ours (Frazier et al., 2018) have shown. Overall, since ABC methods are large consumers of simulation, it is interesting to see how the contribution of QMC sequences results in the reduction of variance, and to hope to see appropriate packages added for standard distributions. However, since the most time-consuming part of the algorithm is, in most cases, the simulation of the pseudo-data, it would seem that the most relevant focus should be on QMC add-ons for this part, which may be feasible for models with a huge number of standard auxiliary variables, as for instance in population evolution.
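As an illustration of this sequential proposal step, the sketch below fits a small Normal mixture to the particle sample of the previous iteration and samples the next proposal from it; scikit-learn's GaussianMixture is used as a convenient stand-in rather than the authors' own estimation procedure, and the particle cloud is simulated.

```python
# Illustrative sketch of the sequential proposal step discussed above: fit a
# small Gaussian mixture to the particle sample of the previous iteration and
# use it as the next proposal. scikit-learn's GaussianMixture is a stand-in,
# not the authors' own fitting procedure; the particle cloud is a toy sample.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
previous_particles = rng.normal([0.0, 2.0], 0.3, size=(1_000, 2))  # toy 2-D cloud

proposal = GaussianMixture(n_components=3, random_state=0).fit(previous_particles)
new_theta, _ = proposal.sample(1_000)            # draws from the mixture proposal
log_q = proposal.score_samples(new_theta)        # log proposal density, for importance weights
print(new_theta[:3], log_q[:3])
```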

optimal proposal for ABC

Posted in Statistics on October 8, 2018 by xi'an

As pointed out by Ewan Cameron in a recent c’Og’ment, Justin Alsing, Benjamin Wandelt, and Stephen Feeney arXived a paper last August where they discuss an optimal proposal density for ABC-SMC and ABC-PMC, optimality being understood as maximising the effective sample size.

“Previous studies have sought kernels that are optimal in the (…) Kullback-Leibler divergence between the proposal KDE and the target density.”

The effective sample size for ABC-SMC is actually the regular ESS multiplied by the fraction of accepted simulations, which surprisingly converges to the ratio

E[q(θ)/π(θ)|D]/E[π(θ)/q(θ)|D]

under the (true) posterior (where q(θ) is the importance density and π(θ) the prior density). When optimised in q, this usually produces an implicit equation which results in a form of geometric mean between posterior and prior. The paper looks at approximate ways to find this optimum, especially at an upper bound on q. Something I do not understand from the simulations is that the starting point seems to be the plain geometric mean between posterior and prior, in a setting where the posterior is supposedly unavailable… Actually the paper is silent on how the optimum can be approximated in practice, for the very reason I just mentioned, apart from using a non-parametric or mixture estimate of the posterior after each SMC iteration, which may prove extremely costly when processed through the optimisation steps. However, an interesting side outcome of these simulations is that the above geometric mean does much better than the posterior itself when considering the effective sample size.
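Assuming (unrealistically) that posterior draws are available, the ratio above can be evaluated for a candidate proposal q by plain Monte Carlo, as in the toy sketch below; the prior, the proposal, and the posterior sample are all placeholders.

```python
# Toy evaluation of the criterion quoted above, E[q(θ)/π(θ)|D] / E[π(θ)/q(θ)|D],
# by plain Monte Carlo over (assumed available) posterior draws. The prior π,
# proposals q, and posterior sample below are placeholders for illustration.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
post_draws = rng.normal(1.0, 0.5, 100_000)       # stand-in posterior sample

prior = norm(0, 10)                              # toy prior π(θ)

def criterion(q):
    """Ratio of posterior expectations for a candidate proposal density q."""
    r = q.pdf(post_draws) / prior.pdf(post_draws)
    return np.mean(r) / np.mean(1.0 / r)

print(criterion(norm(1.0, 0.5)))   # proposal equal to the (toy) posterior
print(criterion(norm(0.5, 2.0)))   # a wider proposal between prior and posterior
```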

astroABC: ABC SMC sampler for cosmological parameter estimation

Posted in Books, R, Statistics, University life on September 6, 2016 by xi'an

“…the chosen statistic needs to be a so-called sufficient statistic in that any information about the parameter of interest which is contained in the data, is also contained in the summary statistic.”

Elise Jennings and Maeve Madigan arXived a paper on a new Python code they developed for implementing ABC-SMC, towards astronomy or rather cosmology applications. They stress the parallelisation abilities of their approach, which leads to “crucial speed enhancement” against the available competitors, abcpmc and cosmoabc. The version of ABC implemented there is “our” ABC-PMC, where particle clouds are shifted according to mixtures of random walks, based on each and every point of the current cloud, with a scale equal to twice the estimated posterior variance. (The paper curiously refers to non-astronomy papers through their arXiv version, even when they have been published. Like our 2008 Biometrika paper.) A large part of the paper is dedicated to computing aspects that escape me, like the constant references to MPI. The algorithm is partly automated, except for the choice of the summary statistics and of the distance. The tolerance is chosen as a (large) quantile of the previous set of simulated distances. Getting comments from the designers of abcpmc and cosmoabc would be great.
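The particle move described above can be sketched as follows, in a generic version (not astroABC's actual code) of the Beaumont et al. (2009) kernel, with a Gaussian perturbation scaled by twice the weighted variance of the current cloud; the cloud and weights are simulated placeholders.

```python
# Generic sketch (not astroABC's actual code) of the ABC-PMC move described
# above: each new particle is proposed by perturbing a particle drawn from the
# weighted current cloud with a Gaussian kernel whose variance is twice the
# weighted empirical variance of that cloud, as in Beaumont et al. (2009).
import numpy as np

rng = np.random.default_rng(5)

def pmc_move(particles, weights, n_new):
    """Propose n_new particles from the weighted cloud (1-D case)."""
    w = weights / weights.sum()
    mean = np.sum(w * particles)
    var2 = 2.0 * np.sum(w * (particles - mean) ** 2)   # twice the weighted variance
    picks = rng.choice(particles, size=n_new, p=w)      # resample from the cloud
    return rng.normal(picks, np.sqrt(var2))             # Gaussian random-walk perturbation

cloud = rng.normal(0.8, 0.4, 500)    # toy particle cloud
wts = rng.random(500)                # toy importance weights
new_particles = pmc_move(cloud, wts, 500)
print(new_particles[:5])
```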

“It is clear that the simple Gaussian Likelihood assumption in this case, which neglects the effects of systematics yields biased cosmological constraints.”

The last part of the paper compares ABC and MCMC on a simulated supernova dataset, which is a somewhat dubious comparison since the model used for producing the data and running ABC is not the same as the Gaussian version used with MCMC. Unsurprisingly, MCMC then misses the true value of the cosmological parameters and, most likely and more importantly, the true posterior HPD region, while ABC-SMC (or PMC) concentrates around the genuine parameter values. (There is no additional demonstration of how accelerated the approach is.)

Bayesian Indirect Inference and the ABC of GMM

Posted in Books, Statistics, University life on February 17, 2016 by xi'an

“The practicality of estimation of a complex model using ABC is illustrated by the fact that we have been able to perform 2000 Monte Carlo replications of estimation of this simple DSGE model, using a single 32 core computer, in less than 72 hours.” (p.15)

Earlier this week, Michael Creel and his coauthors arXived a long paper with the above title, where ABC relates to approximate Bayesian computation. In short, this paper provides deeper theoretical foundations for the local regression post-processing of Mark Beaumont and his coauthors (2002), along with some natural extensions, but apparently considering one univariate transform η(θ) of interest at a time. The theoretical validation of the method is that the resulting estimators converge at speed √n under some regularity assumptions, including the identifiability of the parameter θ in the mean of the summary statistics T, which relates to our consistency result for ABC model choice, and a CLT on an available (?) preliminary estimator of η(θ).
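For reference, here is a hedged sketch of the Beaumont et al. (2002) local-linear regression adjustment the paper builds upon, where accepted parameters are corrected by a weighted regression of θ on the discrepancy between simulated and observed summaries; all data below are toy placeholders.

```python
# Hedged sketch of the Beaumont et al. (2002) local-linear regression
# adjustment that the paper builds on: accepted parameters are corrected by a
# weighted regression of θ on the difference between simulated and observed
# summaries. All data below are toy placeholders.
import numpy as np

rng = np.random.default_rng(6)

def regression_adjust(theta, summaries, s_obs, eps):
    """Local-linear ABC adjustment (1-D parameter, multivariate summary)."""
    d = np.linalg.norm(summaries - s_obs, axis=1)
    keep = d <= eps
    w = 1.0 - (d[keep] / eps) ** 2                       # Epanechnikov-type weights
    X = np.column_stack([np.ones(keep.sum()), summaries[keep] - s_obs])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * theta[keep]))
    return theta[keep] - (summaries[keep] - s_obs) @ beta[1:]   # adjusted draws

theta = rng.normal(0, 10, 20_000)                        # toy prior draws
summaries = np.column_stack([rng.normal(theta, 1), rng.normal(theta, 2)])
s_obs = np.array([1.0, 1.3])
adjusted = regression_adjust(theta, summaries, s_obs, eps=1.0)
print(adjusted.mean(), adjusted.std())
```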

The paper also includes a GMM version of ABC whose appeal is less clear to me, as it seems to rely on a preliminary estimator of the univariate transform of interest η(θ), which is then randomized by a normal random walk. While this sounds a wee bit like noisy ABC, it differs from this generic approach in that the model is not assumed to be known, but rather available through an asymptotic Gaussian approximation. (When the preliminary estimator is available in closed form, I do not see the appeal of adding this superfluous noise. When it is unavailable, it is unclear why a normal perturbation can be produced.)

“[In] the method we study, the estimator is consistent, asymptotically normal, and asymptotically as efficient as a limited information maximum likelihood estimator. It does not require either optimization, or MCMC, or the complex evaluation of the likelihood function.” (p.3)

Overall, I have trouble relating the paper to (my?) regular ABC in that the outcome of the supported procedures is an estimator rather than a posterior distribution. Those estimators are demonstrably endowed with convergence properties, including quantile estimates that can be exploited for credible intervals, but this does not produce a posterior distribution in the classical Bayesian sense. For instance, how can one run model comparison in this framework? Furthermore, each of those inferential steps requires solving another possibly costly optimisation problem.

“Posterior quantiles can also be used to form valid confidence intervals under correct model specification.” (p.4)

Nitpicking(ly), this statement is not correct in that posterior quantiles produce valid credible intervals and only asymptotically correct confidence intervals!

“A remedy is to choose the prior π(θ) iteratively or adaptively as functions of initial estimates of θ, so that the “prior” becomes dependent on the data, which can be denoted as π(θ|T).” (p.6)

This modification of the basic ABC scheme relying on simulation from the prior π(θ) can be found in many earlier references, and the iterative construction of a better-fitted importance function rather closely resembles ABC-PMC. Once again nitpicking(ly), the importance weights are defined therein (p.6) as the inverse of what they should be.