Archive for ABC

Approximate Bayesian computation via sufficient dimension reduction

Posted in Statistics, University life on August 26, 2016 by xi'an

“One of our contribution comes from the mathematical analysis of the consequence of conditioning the parameters of interest on consistent statistics and intrinsically inconsistent statistics”

Xiaolong Zhong and Malay Ghosh have just arXived an ABC paper focussing on the convergence of the method. And on the use of sufficient dimension reduction techniques for the construction of summary statistics. I had not heard of this approach before so read the paper with interest. I however regret that the paper does not link with the recent consistency results of Li and Fearnhead and of David Frazier, Gael Martin, Judith Rousseau and myself. When conditioning upon the MLE [or the posterior mean] as the summary statistic, Theorem 1 states that the Bernstein-von Mises theorem holds, missing a limit in the tolerance ε. And apparently missing conditions on the speed of convergence of this tolerance to zero although the conditioning event involves the true value of the parameter. This makes me wonder at the relevance of the result. The part about partial posteriors and the characterisation of limiting posterior distributions starts with the natural remark that the mean of the summary statistic must identify the whole parameter θ to achieve consistency, a point central to our 2014 JRSS B paper. The authors suggest using a support vector machine to derive the summary statistics, an idea already exploited by Heiko Strathmann et al. There is no consistency result of relevance for ABC in that second and final part, which ends up rather abruptly. Overall, while the paper contributes to the current reflection on the convergence properties of ABC, the lack of scaling of the tolerance ε calls for further investigations.

ABC by subset simulation

Posted in Books, Statistics, Travel on August 25, 2016 by xi'an

Last week, Vakilzadeh, Beck and Abrahamsson arXived a paper entitled “Using Approximate Bayesian Computation by Subset Simulation for Efficient Posterior Assessment of Dynamic State-Space Model Classes”. It follows an earlier paper by Beck and co-authors on ABC by subset simulation, paper that I did not read. The model of interest is a hidden Markov model with continuous components and covariates (input), e.g. a stochastic volatility model. There is however a catch in the definition of the model, namely that the observable part of the HMM includes an extra measurement error term linked with the tolerance level of the ABC algorithm. Error term that is dependent across time, the vector of errors being within a ball of radius ε. This reminds me of noisy ABC, obviously (and as acknowledged by the authors), but also of some ABC developments of Ajay Jasra and co-authors. Indeed, as in those papers, Vakilzadeh et al. use the raw data sequence to compute their tolerance neighbourhoods, which obviously bypasses the selection of a summary statistic [vector] but also may drown signal under noise for long enough series.

“In this study, we show that formulating a dynamical system as a general hierarchical state-space model enables us to independently estimate the model evidence for each model class.”

Subset simulation is a nested technique that produces a sequence of nested balls (and related tolerances) such that the conditional probability of being in the next ball given the previous one remains large enough. Requiring a new round of simulation each time. This somewhat reminds me of nested sampling, even though the two methods differ. For subset simulation, estimating the level probabilities means that there also exists a converging (and even unbiased!) estimator for the evidence associated with different tolerance levels. Which is not a particularly natural object unless one wants to turn it into a tolerance selection principle, which would be quite a novel perspective. But not one adopted in the paper, seemingly. Given that the application section truly compares models I must have missed something there. (Blame the long flight from San Francisco to Sydney!) Interestingly, the different models as in Table 4 relate to different tolerance levels, which may be a hindrance for the overall validation of the method.
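As a sketch of why the level probabilities multiply into an estimate for the smallest ball, here is the telescoping identity behind the nested construction. The notation (tolerances ε_j, distance ρ, events B_j) is assumed for illustration and is not taken from the paper.

```latex
\[
\varepsilon_0 = \infty > \varepsilon_1 > \cdots > \varepsilon_m, \qquad
B_j = \{(\theta, x) : \rho(x, y) \le \varepsilon_j\}, \qquad
B_m \subset B_{m-1} \subset \cdots \subset B_0 ,
\]
\[
\mathbb{P}(B_m) = \prod_{j=1}^{m} \mathbb{P}(B_j \mid B_{j-1}), \qquad \mathbb{P}(B_0) = 1 .
\]
```

Each factor can be kept away from zero by the choice of the next tolerance, so that no single term requires simulating a rare event directly.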

I find the subsequent part on getting rid of uncertain prediction error model parameters of lesser [personal] interest as it essentially replaces the marginal posterior on the parameters of interest by a BIC approximation, with the unsurprising conclusion that “the prior distribution of the nuisance parameter cancels out”.

off to Australia

Posted in pictures, Statistics, Travel, University life, Wines on August 22, 2016 by xi'an

[Picture: south bank of the Yarra river, Melbourne, July 21, 2012]

Taking advantage of being in San Francisco, I flew yesterday to Australia over the Pacific, crossing the date line for the first time. The 15 hour Qantas flight to Sydney was remarkably smooth and quiet, with most passengers sleeping for most of the way, and it gave me a great opportunity to go over several papers I wanted to read and review. Over the next week or so, I will work with my friends and co-authors David Frazier and Gael Martin at Monash University (and undoubtedly enjoy the great food and wine scene!). Before flying back to Paris (alas via San Francisco rather than direct).

asymptotic properties of Approximate Bayesian Computation

Posted in pictures, Statistics, Travel, University life on July 26, 2016 by xi'an

[Picture: street light near the St Kilda Road bridge, Melbourne, July 21, 2012]

With David Frazier and Gael Martin from Monash University, and with Judith Rousseau (Paris-Dauphine), we have now completed and arXived a paper entitled Asymptotic Properties of Approximate Bayesian Computation. This paper undertakes a fairly complete study of the large sample properties of ABC under weak regularity conditions. We produce therein sufficient conditions for posterior concentration, asymptotic normality of the ABC posterior estimate, and asymptotic normality of the ABC posterior mean. Moreover, those (theoretical) results are of significant import for practitioners of ABC as they pertain to the choice of tolerance ε used within ABC for selecting parameter draws. In particular, they [the results] contradict the conventional ABC wisdom that this tolerance should always be taken as small as the computing budget allows.

Now, this paper bears some similarities with our earlier paper on the consistency of ABC, written with David and Gael. As it happens, the paper was rejected after submission and I then discussed it in an internal seminar in Paris-Dauphine, with Judith taking part in the discussion and quickly suggesting some alternative approach that is now central to the current paper. The previous version analysed Bayesian consistency of ABC under specific uniformity conditions on the summary statistics used within ABC. But the conditions for consistency are now much weaker than earlier, thanks to Judith’s input!

There are also similarities with Li and Fearnhead (2015). Previously discussed here. However, while similar in spirit, the results contained in the two papers strongly differ on several fronts:

  1. Li and Fearnhead (2015) considers an ABC algorithm based on kernel smoothing, whereas our interest is the original ABC accept-reject and its many derivatives;
  2. our theoretical approach permits a complete study of the asymptotic properties of ABC, posterior concentration, asymptotic normality of ABC posteriors, and asymptotic normality of the ABC posterior mean, whereas Li and Fearnhead (2015) is only concerned with asymptotic normality of the ABC posterior mean estimator (and various related point estimators);
  3. the results of Li and Fearnhead (2015) are derived under very strict uniformity and continuity/differentiability conditions, which bear a strong resemblance to those conditions in Yuan and Clark (2004) and Creel et al. (2015), while the results herein do not rely on such conditions and only assume very weak regularity conditions on the summary statistics themselves; this difference allows us to characterise the behaviour of ABC in situations not covered by the approach taken in Li and Fearnhead (2015).
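For readers less familiar with the accept-reject version of ABC referred to in the list above, here is a minimal sketch of it, showing where the tolerance ε enters. Everything in it is an assumption made for illustration and not taken from either paper: the toy Normal model, the uniform prior, the sample mean as summary statistic, and the function names.

```python
import random
import statistics


def abc_rejection(observed, prior_draw, simulate, summary, tolerance, n_draws, rng):
    """Keep the prior draws whose simulated summary falls within
    `tolerance` of the observed summary (hypothetical helper)."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)                       # draw a parameter from the prior
        pseudo = simulate(theta, len(observed), rng)  # simulate pseudo-data given theta
        if abs(summary(pseudo) - s_obs) <= tolerance:
            accepted.append(theta)                    # accept: summaries are close enough
    return accepted


def prior_draw(rng):
    # Assumed prior: theta ~ U(-5, 5)
    return rng.uniform(-5.0, 5.0)


def simulate(theta, n, rng):
    # Assumed model: n i.i.d. N(theta, 1) observations
    return [rng.gauss(theta, 1.0) for _ in range(n)]


rng = random.Random(42)
observed = simulate(2.0, 50, rng)  # toy data generated with theta = 2

results = {}
for eps in (1.0, 0.1):
    results[eps] = abc_rejection(
        observed, prior_draw, simulate, statistics.fmean, eps, 5000, rng
    )
    print(eps, len(results[eps]), statistics.fmean(results[eps]))
```

Shrinking the tolerance lowers the acceptance rate for the same simulation budget, which is the trade-off behind any guidance on how small ε should be.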

automatic variational ABC

Posted in pictures, Statistics on July 8, 2016 by xi'an

“Stochastic Variational inference is an appealing alternative to the inefficient sampling approaches commonly used in ABC.”

Moreno et al. [including Ted Meeds and Max Welling] recently arXived a paper merging variational inference and ABC. The argument for turning variational is computational speedup. The traditional (in variational inference) divergence decomposition of the log-marginal likelihood is replaced by an ABC version, parameterised in terms of intrinsic generators (i.e., generators that do not depend on cyber-parameters, like the U(0,1) or the N(0,1) generators). Or simulation code in the authors’ terms. Which leads to the automatic aspect of the approach. In the paper the derivation of the gradient is indeed automated.

“One issue is that even assuming that the ABC likelihood is an unbiased estimator of the true likelihood (which it is not), taking the log introduces a bias, so that we now have a biased estimate of the lower bound and thus biased gradients.”

I wonder how much of an issue this is, since we consider the variational lower bound. To be optimised in terms of the parameters of the variational posterior. Indeed, the endpoint of the analysis is to provide an optimal variational approximation, which remains an approximation whether or not the likelihood estimator is unbiased. A more “severe” limitation may be in the inversion constraint, since it seems to eliminate Beta or Gamma distributions. (Even though calling qbeta(runif(1),a,b) definitely is achievable… And not rejected by a Kolmogorov-Smirnov test.)
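On the bias raised in the quote above: it follows from Jensen's inequality, since the expectation of the log of a positive unbiased estimator sits below the log of its expectation. The following toy check is only a sketch. The "likelihood estimator" is a hypothetical one (a true value times an average of Exp(1) variates) and has nothing to do with the estimator used in the paper.

```python
import math
import random
import statistics


def noisy_likelihood(true_value, n_sims, rng):
    """Hypothetical unbiased estimator: true_value times the average of
    n_sims Exp(1) variates, which has expectation true_value."""
    noise = [rng.expovariate(1.0) for _ in range(n_sims)]
    return true_value * statistics.fmean(noise)


rng = random.Random(1)
true_value = 0.3
estimates = [noisy_likelihood(true_value, 5, rng) for _ in range(20000)]

# Averaging the estimates and then taking the log recovers log(true_value) ...
log_of_mean = math.log(statistics.fmean(estimates))
# ... while averaging the logs of the estimates lands systematically below it.
mean_of_logs = statistics.fmean([math.log(e) for e in estimates])

print(math.log(true_value), log_of_mean, mean_of_logs)
```

The gap shrinks as the number of simulations per estimate grows, which is one reason the bias may matter less in practice than the quote suggests.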

Incidentally, I discovered through the paper the existence of the Kumaraswamy distribution, whose main appeal seems to be the ability to produce a closed-form quantile function, while bearing some resemblance with the Beta distribution. (Another arXival by Baltasar Trancón y Widemann studies some connections between those, but does not tell how to select the parameters to optimise the similarity.)
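To make the closed-form quantile concrete, here is a short sketch of inverse-transform sampling from a Kumaraswamy(a, b) distribution, whose cdf on (0,1) is F(x) = 1 − (1 − x^a)^b. The function names and the parameter values are illustrative assumptions.

```python
import random


def kumaraswamy_cdf(x, a, b):
    """F(x) = 1 - (1 - x^a)^b for 0 <= x <= 1."""
    return 1.0 - (1.0 - x ** a) ** b


def kumaraswamy_quantile(u, a, b):
    """Closed-form inverse of the cdf: (1 - (1 - u)^(1/b))^(1/a)."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def kumaraswamy_draw(a, b, rng):
    """Inverse-transform sampling: push a U(0,1) variate through the quantile."""
    return kumaraswamy_quantile(rng.random(), a, b)


rng = random.Random(0)
a, b = 2.0, 3.0  # arbitrary shape parameters chosen for the illustration
draws = [kumaraswamy_draw(a, b, rng) for _ in range(10000)]

# Fraction of draws below the theoretical median, expected to be close to 1/2
median = kumaraswamy_quantile(0.5, a, b)
below_median = sum(d <= median for d in draws) / len(draws)
print(median, below_median)
```

This is the same mechanism as calling qbeta(runif(1),a,b) in R, except that the quantile function here needs no numerical inversion.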

ISBA 2016 [#6]

Posted in Kids, Mountains, pictures, Statistics, Travel, University life, Wines on June 19, 2016 by xi'an

Fifth and final day of ISBA 2016, which was as full and intense as the previous ones. (Or even more if taking into account the late evening social activities pursued by most participants.) First thing in the morning, I managed to get very close to a hill top, thanks to the hints provided by Jeff Miller!, and with no further scratches from the nasty local thorn bushes. And I was back with plenty of time for a Bayesian robustness session with great talks. (Session organised by Judith Rousseau whom I crossed while running, rushing to the airport thanks to an Air France last-minute cancellation.) First talk by James Watson (on his paper with Chris Holmes on Kullback neighbourhoods on priors that Judith and I discussed recently in Statistical Science). Then as a contrapunto Peter Grünwald gave a neat geometric motivation for possible misbehaviour of Bayesian inference in non-convex misspecified environments and discussed his SafeBayes resolution that weights down the likelihood. In a sort of PAC-Bayesian way. And Erlis Ruli presented the ABC-R approach he developed with Laura Ventura and Nicola Sartori based on M-estimators and score functions. Making me wonder [idly, as usual] whether cumulating different M-estimators would make a difference in the performances of the ABC algorithm.

David Dunson delivered one of the plenary lectures on high-dimensional discrete parameter estimation, including for instance categorical data. This wide-range talk covered many aspects and papers of David’s work, including a use of tensors I had neither seen nor heard of before. With sparse modelling to resist the combinatoric explosion of contingency tables. However, and you may blame my Gallic pessimistic daemon for this remark, I have trouble picturing the meaning and relevance of a joint distribution on a space of hundreds and hundreds of dimensions and similarly the ability to check the adequacy of any modelling in terms of goodness of fit. For instance, to borrow a non-military example from David’s talk, handling genetic data on ACGT sequences to infer its distribution sounds unreasonable unless most of the bases are mono-allelic. And the only way I see to test the realism of a model in this framework would be to engineer realisations of this distribution to observe the outcome, a test that seems neither feasible nor desirable. Prediction based on such models may obviously operate satisfactorily without such realism requirements.

My first afternoon session (after the ISBA assembly that announced the location of ISBA 2020 in Yunnan, China!, home of Pu’ Ehr tea) was about accelerated MCMC schemes with talks by Sanvesh Srivastava on divide-and-conquer MCMC using Wasserstein barycentres, already discussed here, Minsuk Shin on a faster stochastic search variable selection which I could not understand, and Alex Beskos on the extension of Giles’ multilevel Monte Carlo to MCMC settings, which sounded worth investigating further even though I did not follow the notion all the way through. After listening to Luke Bornn explaining how to recalibrate grid data for climate science by accounting for correlation (with the fun title of `lost moments’), I rushed to my rental to [help] cook dinner for friends and… the ISBA 2016 conference was over!

ISBA 2016 [#4]

Posted in pictures, Running, Statistics, Travel on June 17, 2016 by xi'an

As an organiser of the ABC session (along with Paul Fearnhead), I was already aware of most results behind the talks, but nonetheless got some new perspectives from the presentations. Ewan Cameron discussed a two-stage ABC where the first step is actually an indirect inference, which leads to a more efficient ABC step. With applications to epidemiology. Li presented extensions of his work with Paul Fearnhead, incorporating regression correction à la Beaumont to demonstrate consistency and using defensive sampling to control importance sampling variance. (While we are working on a similar approach, I do not want to comment on the consistency part, but I missed how defensive sampling can operate in complex ABC settings, as it requires advanced knowledge on the target to be effective.) And Ted Meeds spoke about two directions for automatising ABC (as in the ABcruise), from incorporating the pseudo-random generator into the representation of the ABC target, to calling for deep learning advances. The inclusion of random generators in the transform is great, provided they can remain black boxes as otherwise they require recoding. (This differs from quasi-Monte Carlo ABC, which aims at reducing the variability due to sheer noise.) It took me a little while, but I eventually understood why Jan Hannig saw this inclusion as a return to fiducial inference!

Merlise Clyde gave a wide-ranging plenary talk on (linear) model selection that looked at a large range of priors under the hat of generalised confluent hypergeometric priors over the mixing scale in Zellner’s g-prior. Some were consistent under one or both models, maybe even for misspecified models. Some parts paralleled my own talk on the foundations of Bayesian tests, no wonder since I mostly give a review before launching into a criticism of the Bayes factor. Since I think this may be a more productive perspective than trying to overcome the shortcomings of Bayes factors in weakly informative settings. Some comments at the end of Merlise’s talk were loosely connected to this view in that they called for a unitarian perspective [rather than adapting a prior to a specific inference problem] with decision-theoretic backup. Conveniently the next session was about priors and testing, obviously connected!, with Leo Knorr-Held considering g-priors for the Cox model, Kerrie Mengersen discussing priors for over-fitted mixtures and HMMs, and Dan Simpson entertaining us with his quest of a prior for a point process, eventually reaching PC priors.
