Archive for summary statistics

ABC for repulsive point processes

Posted in Books, pictures, Statistics, University life on May 5, 2016 by xi'an

Shinichiro Shirota and Alan Gelfand arXived a paper on the use of ABC for analysing some repulsive point processes, more exactly Gibbs point processes, for which ABC requires a perfect sampler to operate unless one is okay with stopping an MCMC chain before it converges, and the determinantal point processes studied by Lavancier et al. (2015) [a paper I wanted to review and could not find time to!]. Determinantal point processes have product intensity functions given by determinants of a covariance kernel, hence are repulsive. Simulating a determinantal process is itself not straightforward and involves approximations. But the likelihood is also unavailable, and Lavancier et al. (2015) use approximate versions based on fast Fourier transforms, which means MCMC remains challenging even with those approximate steps.
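For readers who, like me, need the reminder, the repulsion comes from the determinant itself: a determinantal point process with kernel C has its n-th order product density given by a determinant, a standard statement (as in Lavancier et al., 2015) rather than anything specific to the paper under review:

```latex
% n-th order product density of a determinantal point process with kernel C
\rho^{(n)}(x_1,\ldots,x_n) \;=\; \det\!\big[\,C(x_i,x_j)\,\big]_{1\le i,j\le n}
```

When two points x_i and x_j get close, two rows of the matrix become nearly identical and the determinant, hence the joint intensity, drops towards zero.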

“The main computational cost of our algorithm is simulation of x for each iteration of the ABC-MCMC.”

The authors propose here to use ABC instead, with an extra approximation step for simulating the determinantal process itself. Interestingly, the Gibbs point process allows for a sufficient statistic, the number of R-close points, although I fail to see how the radius R is determined by the model, while the determinantal process does not. The summary statistics end up being a collection of frequencies of points within spheres of various radii. These statistics are then processed through Fearnhead's and Prangle's proposal, namely to use an approximation of E[θ|y], obtained by regression over the original summaries, as the natural summary. Another layer of complexity stems from using an ABC-MCMC approach, and from including a Lasso step in the regression towards excluding less relevant radii. The paper also considers Bayesian model validation for such point processes, implementing prior predictive tests with a ranked probability score rather than a Bayes factor.
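To make the regression step concrete, here is a minimal sketch of the Fearnhead-Prangle construction as I read it, assuming a scalar parameter for simplicity (one such regression per component otherwise): regress pilot simulations of θ on the sphere-count summaries, let a Lasso zero out the less relevant radii, and use the fitted regression as the summary in ABC-MCMC. The simulator and prior below are placeholders, not the authors' actual model or code.

```python
# A hedged sketch of the Fearnhead-Prangle step with a Lasso over radii.
import numpy as np
from sklearn.linear_model import LassoCV

def pilot_table(n_sim, simulate_process, prior_draw, radii):
    """Build a pilot reference table of (theta, ring-count summaries);
    simulate_process(theta) is assumed to return an (n, 2) array of points."""
    thetas, summaries = [], []
    for _ in range(n_sim):
        theta = prior_draw()
        points = simulate_process(theta)
        dists = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
        # count ordered pairs closer than each radius (diagonal removed)
        counts = [(dists < r).sum() - len(points) for r in radii]
        thetas.append(theta)
        summaries.append(counts)
    return np.array(thetas), np.array(summaries)

def fit_fp_summary(thetas, summaries):
    """Lasso regression of theta on the raw summaries; the fit approximates E[theta|s]."""
    reg = LassoCV(cv=5).fit(summaries, thetas)
    return reg  # reg.predict(s) then serves as the scalar summary inside ABC-MCMC
```

Within the ABC-MCMC run, reg.predict(s) for a simulated pattern is compared against reg.predict(s_obs) under the chosen tolerance.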

As point processes have always been somewhat mysterious to me, I do not have any intuition about the strength of the distributional assumptions there and the relevance of picking a determinantal process against, say, a Strauss process. The model comparisons operated in the paper do not strongly support one repulsive model over the others, with the authors concluding that many points are needed to discriminate between models. I also wonder at the possibility of including other summaries than Ripley's K-functions, which somewhat imply a discretisation of the space by concentric rings. Maybe deriving summary statistics from other point processes, such as MLEs or Bayes estimators for those models, would help. (Or maybe not.)

auxiliary likelihood-based approximate Bayesian computation in state-space models

Posted in Books, pictures, Statistics, University life on May 2, 2016 by xi'an

With Gael Martin, Brendan McCabe, David T. Frazier, and Worapree Maneesoonthorn, we arXived (and submitted) a strongly revised version of our earlier paper. We begin by demonstrating that reduction to a set of sufficient statistics of dimension smaller than the sample size is infeasible for most state-space models, hence calling for the use of partial posteriors in such settings. We then give conditions [like parameter identification] under which ABC methods are Bayesian consistent when using an auxiliary model to produce summaries, either as MLEs or [more efficiently] scores. Indeed, for the order of accuracy required by the ABC perspective, scores are equivalent to MLEs but are computed much faster. Those conditions happen to be weaker than those found in the recent papers of Li and Fearnhead (2016) and Creel et al. (2015), in particular because we make no assumption about the limiting distributions of the summary statistics. We also tackle the dimensionality curse that plagues ABC techniques by numerically exhibiting the improved accuracy brought by looking at marginal rather than joint modes, that is, by matching individual parameters via the corresponding scalar score of the integrated auxiliary likelihood rather than matching on the multi-dimensional score statistic. The approach is illustrated on realistically complex models, namely a (latent) Ornstein-Uhlenbeck process, for which a discrete-time linear Gaussian approximation and a Kalman filter auxiliary likelihood are adopted, and a square-root volatility process, with an auxiliary likelihood associated with an Euler discretisation and the augmented unscented Kalman filter. In our experiments, we compared our auxiliary-based technique to the two-step approach of Fearnhead and Prangle (in the Read Paper of 2012), exhibiting improvement for the examples analysed therein. Somewhat predictably, an important challenge in this approach, common with the related techniques of indirect inference and the efficient method of moments, is the choice of a computationally efficient and accurate auxiliary model. But most of the current ABC literature discusses the role and choice of the summary statistics, which amounts to the same challenge, while missing the regularity provided by the score functions of our auxiliary models.
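As an illustration of the score-matching idea (and only that, not our actual implementation), one can fit the auxiliary model once to the observed data and then compare the auxiliary score of each simulated dataset, evaluated at that fit, with the observed score, which is zero by construction. The auxiliary log-likelihood, simulator and prior below are generic placeholders.

```python
# A hedged sketch of score-based ABC with a generic auxiliary model.
import numpy as np
from scipy.optimize import minimize

def aux_mle(y, aux_loglik, beta0):
    """Auxiliary MLE on the observed series (computed once)."""
    return minimize(lambda b: -aux_loglik(y, b), beta0).x

def aux_score(y, beta, aux_loglik, eps=1e-5):
    """Numerical score of the auxiliary log-likelihood at beta."""
    g = np.zeros_like(beta)
    for i in range(len(beta)):
        e = np.zeros_like(beta)
        e[i] = eps
        g[i] = (aux_loglik(y, beta + e) - aux_loglik(y, beta - e)) / (2 * eps)
    return g

def abc_scores(y_obs, simulate, prior_draw, aux_loglik, beta0, n_sim, tol):
    """Accept prior draws whose simulated auxiliary score (at the observed fit)
    is close to zero; marginal matching would compare one score component per
    parameter instead of the joint norm used here."""
    beta_hat = aux_mle(y_obs, aux_loglik, beta0)   # observed score at beta_hat is ~0
    accepted = []
    for _ in range(n_sim):
        theta = prior_draw()
        z = simulate(theta)
        if np.linalg.norm(aux_score(z, beta_hat, aux_loglik)) < tol:
            accepted.append(theta)
    return np.array(accepted)
```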

Goodness-of-fit statistics for ABC

Posted in Books, Statistics, University life on February 1, 2016 by xi'an

“Posterior predictive checks are well-suited to Approximate Bayesian Computation”

Louisiane Lemaire and her coauthors from Grenoble have just arXived a new paper on designing a goodness-of-fit statistic from ABC outputs. The statistic is constructed from a comparison between the observed (summary) statistics and replicated summary statistics generated from the posterior predictive distribution. This is a major difference with the standard ABC distance, for which the replicated summary statistics are generated from the prior predictive distribution. The core of the paper is about calibrating a posterior predictive p-value derived from this distance, since it is not properly calibrated in the frequentist sense that it is not uniformly distributed "under the null", a point I discussed in an 'Og entry about Andrews' book a few years ago.

The paper opposes the average distance between ABC acceptable summary statistics and the observed realisation to the average distance between ABC posterior predictive simulations of summary statistics and the observed realisation. In the simplest case (e.g., without post-processing of the summary statistics), the main difference between both average distances is that the summary statistics are used twice in the first version, first to select the acceptable values of the parameters and a second time for the average distance, which makes it biased downwards. The second version is more computationally demanding, especially when deriving the associated p-value, but produces higher power under the alternative, obviously depending on how the alternative is defined, since goodness-of-fit only relates to the null, i.e., to a specific model.
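In code, the contrast between the two average distances reads roughly as follows; this is a rough sketch of the quantities being compared, not the authors' calibration procedure, and `simulate` and `summarise` are placeholders.

```python
# Sketch of the two average distances contrasted above.
import numpy as np

def abc_average_distance(s_obs, accepted_summaries):
    """Mean distance between the observed summary and the summaries already
    used to accept the parameters (hence biased downwards)."""
    return np.mean([np.linalg.norm(s - s_obs) for s in accepted_summaries])

def predictive_average_distance(s_obs, accepted_thetas, simulate, summarise):
    """Mean distance between the observed summary and posterior predictive
    replicates, generated anew from each accepted parameter value."""
    dists = []
    for theta in accepted_thetas:
        s_rep = summarise(simulate(theta))
        dists.append(np.linalg.norm(s_rep - s_obs))
    return np.mean(dists)
```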

From a general perspective, I do not completely agree with the conclusions of the paper in that (a) this is a frequentist assessment and partakes in the shortcomings of p-values and (b) the choice of summary statistics has a huge impact on the decision about the fit since hardly varying statistics are more likely to lead to a good fit than appropriately varying ones.

weak convergence (…) in ABC

Posted in Books, Statistics, University life on January 18, 2016 by xi'an

Samuel Soubeyrand and Eric Haon-Lasportes recently published a paper in Statistics and Probability Letters that has some common features with the ABC consistency paper we wrote a few months ago with David Frazier and Gael Martin, and with the recent Li and Fearnhead paper on the asymptotic normality of the ABC distribution. Their approach is however based on a Bernstein-von Mises [CLT] theorem for the MLE or a pseudo-MLE. They assume that the density of this estimator is asymptotically equivalent to a Normal density, in which case the true posterior conditional on the estimator is also asymptotically equivalent to a Normal density centred at the (p)MLE, which in turn makes the ABC distribution normal when the sample size grows to infinity and the tolerance decreases to zero. This is not completely unexpected. However, in complex settings, establishing the asymptotic normality of the (p)MLE may prove a formidable or even impossible task.
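Informally, and at the risk of over-simplifying their assumptions, the argument can be summarised as

```latex
% informal version of the Bernstein-von Mises argument sketched above
\sqrt{n}\,\big(\hat\theta_n-\theta_0\big) \;\xrightarrow{d}\; \mathcal{N}(0,\Sigma),
\qquad
\pi\big(\theta \mid \hat\theta_n\big) \;\approx\; \mathcal{N}\!\big(\hat\theta_n,\,\Sigma/n\big)
```

so that, as the sample size grows and the ABC tolerance shrinks to zero, the ABC distribution built on the (p)MLE inherits the same Gaussian limit.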

intractable likelihoods (even) for Alan

Posted in Kids, pictures, Statistics on November 19, 2015 by xi'an

In connection with the official launch of the Alan Turing Institute (or ATI, of which Warwick is a partner), the Institute funded an ATI scoping workshop a week ago yesterday in Warwick around the notion(s) of intractable likelihood(s) and how this could/should fit within the themes of the Institute [hence the scoping]. This is one among many such scoping workshops taking place at all partners, as reported on the ATI website. The workshop was quite relaxed and great fun, if only for getting together with most people (and friends) in the UK interested in the topic, but also for pointing out some new themes I had not previously thought of as related to ilike: questioning the relevance of likelihood for inference and putting forward decision theory under model misspecification, connecting with privacy and ethics [hence making intractable "good"!], introducing uncertain likelihood, getting more into network models, RKHS as a natural summary statistic, swarms of solutions for consensus inference… (And thanks to Mark Girolami for the homage to the iconic LP of the Sex Pistols, which I played maniacally all over 1978…) My own two cents in the discussion were mostly variations on other discussions, borrowing from ABC (and ABC slides) to call for a novel approach to approximate inference.

bootstrap(ed) likelihood for ABC

Posted in pictures, Statistics on November 6, 2015 by xi'an

This recently arXived paper by Weixuan Zhu, Juan Miguel Marín, and Fabrizio Leisen proposes an alternative to our empirical likelihood ABC paper of 2013, or BCel. Besides the mostly personal appeal for me to report on a Juan Miguel Marín working [in Madrid] on ABC topics, alongside my friend Jean-Michel Marin!, this paper is another entry on ABC that connects with yet another statistical perspective, namely the bootstrap. The proposal, called BCbl, is based on a reference paper by Davison, Hinkley and Worton (1992) which defines a bootstrap likelihood, a notion that relies on a double-bootstrap step to produce a non-parametric estimate of the distribution of a given estimator of the parameter θ. This estimate includes a smooth curve-fitting algorithm step, for which little description is available in the current paper. The bootstrap non-parametric substitute then plays the role of the actual likelihood, with no correction for the substitution, just as in our BCel. Both approaches are convergent, with Monte Carlo simulations exhibiting similar or even identical convergence speeds, although [unsurprisingly!] no deep theory is available on their comparative advantages.
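For the record, here is a minimal sketch of the double-bootstrap construction of Davison, Hinkley and Worton (1992) as I understand it, applied to an i.i.d. sample; the smooth curve-fitting step, about which the paper says little, is replaced here by an off-the-shelf spline, and `estimator` stands for any preliminary point estimator of θ.

```python
# A hedged sketch of the bootstrap likelihood via a double bootstrap.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.interpolate import UnivariateSpline

def bootstrap_likelihood(y, estimator, n_outer=200, n_inner=500, rng=None):
    rng = np.random.default_rng(rng)
    theta_hat = estimator(y)
    thetas, logliks = [], []
    for _ in range(n_outer):
        y_star = rng.choice(y, size=len(y), replace=True)      # outer resample
        theta_star = estimator(y_star)
        inner = [estimator(rng.choice(y_star, size=len(y), replace=True))
                 for _ in range(n_inner)]                       # inner resamples
        dens = gaussian_kde(inner)            # estimator's density given theta_star
        thetas.append(theta_star)
        logliks.append(np.log(dens(theta_hat)[0] + 1e-300))
    order = np.argsort(thetas)
    spline = UnivariateSpline(np.array(thetas)[order],
                              np.array(logliks)[order])         # curve-fitting step
    return spline   # spline(theta) then plays the role of the log-likelihood
```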

An important issue from my perspective is that, while the empirical likelihood approach relies on a choice of identifying constraints that strongly impact the numerical value of the likelihood approximation, the bootstrap version starts directly from a subjectively chosen estimator of θ, which may also impact the numerical value of the likelihood approximation. In some ABC settings, finding a primary estimator of θ may be a real issue or a computational burden. Except when using a preliminary ABC step as in semi-automatic ABC. This would be an interesting crash-test for the BCbl proposal! (This would not necessarily increase the computational cost by a large amount.) In addition, I am not sure the method easily extends to larger collections of summary statistics as those used in ABC, in particular because it necessarily relies on non-parametric estimates, only operating in small enough dimensions where smooth curve-fitting algorithms can be used. Critically, the paper only processes examples with a few parameters.

The comparisons between BCel and BCbl produced in the paper show some gain for BCbl. Obviously, this depends on the respective calibrations of the non-parametric methods and of regular ABC, as well as on the available computing time. I find the population genetic example somewhat puzzling: the paper refers to our composite likelihood to set the moment equations and, since this is a pseudo-likelihood, I wonder how the authors select their parameter estimates in the double-bootstrap experiment. And for the Ising model, it is not straightforward to conceive of a bootstrap algorithm on an Ising model: (a) how does one subsample pixels? and (b) what are the validity guarantees for the estimation procedure?

deep learning ABC summary statistics

Posted in Books, Statistics, University life on October 19, 2015 by xi'an

“The main task of this article is to construct low-dimensional and informative summary statistics for ABC methods.”

The idea in the paper "Learning Summary Statistic for ABC via Deep Neural Network", arXived a few days ago, is to start from the raw data and build a "deep neural network" (meaning a multiple-layer neural network) to provide a non-linear regression of the parameters over the data, whose calibration never seems to be an issue. (There is a rather militant tone to the justification of the approach, not that unusual with proponents of deep learning approaches, I must add…) The neural construct is called to produce an estimator (function) of θ, θ(x), which is then used as the summary statistic. Meaning, if Theorem 1 is to be taken as the proposal, that a different ABC needs to be run for every function of interest, or, in other words, that the method is not reparameterisation invariant.
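In sklearn terms, the construction amounts to fitting a non-linear regression of θ on the raw (fixed-length) data over a pilot reference table and recycling the fitted network as the summary statistic; the sketch below uses a small multilayer perceptron and a scalar parameter for simplicity, not the authors' architecture or code, and `simulate` and `prior_draw` are placeholders.

```python
# A hedged sketch of a neural-network summary statistic plugged into rejection ABC.
import numpy as np
from sklearn.neural_network import MLPRegressor

def learn_summary(n_pilot, simulate, prior_draw):
    thetas = np.array([prior_draw() for _ in range(n_pilot)])
    data = np.array([simulate(t) for t in thetas])          # rows = raw datasets
    net = MLPRegressor(hidden_layer_sizes=(100, 100, 100), max_iter=2000)
    net.fit(data, thetas)                # non-linear regression of theta on x
    return net                           # net.predict(x) is the summary theta(x)

def rejection_abc(y_obs, net, simulate, prior_draw, n_sim, tol):
    s_obs = net.predict(y_obs.reshape(1, -1))[0]
    accepted = []
    for _ in range(n_sim):
        theta = prior_draw()
        s = net.predict(simulate(theta).reshape(1, -1))[0]
        if abs(s - s_obs) < tol:
            accepted.append(theta)
    return np.array(accepted)
```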

The paper claims to achieve the same optimality properties as in Fearnhead and Prangle (2012). These are however moderate optimalities, in that they are obtained for a tolerance ε equal to zero, using the exact posterior expectation as a summary statistic instead of a non-parametric estimate, and assuming an infinite functional basis in Theorem 2. I thus see little added value in results like Theorem 2 and no real optimality: that the ABC distribution can be arbitrarily close to the exact posterior is not a helpful statement when implementing the method.

The first example in the paper is the posterior distribution associated with the Ising model, which enjoys a sufficient statistic of dimension one. The issue of generating pseudo-data from the Ising model is evacuated by a call to a Gibbs sampler, but remains an intrinsic problem as the convergence of the Gibbs sampler depends on the value of the parameter θ and especially its location wrt the critical point. Both ABC posteriors are shown to be quite close.
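For completeness, a basic single-site Gibbs sweep for producing Ising pseudo-data looks as follows; this is a generic sketch, not the authors' sampler, and its mixing is precisely what degrades as θ nears the critical value.

```python
# Generic single-site Gibbs sampler for an Ising model on a torus.
import numpy as np

def ising_gibbs(theta, size=32, n_sweeps=500, rng=None):
    rng = np.random.default_rng(rng)
    x = rng.choice([-1, 1], size=(size, size))
    for _ in range(n_sweeps):
        for i in range(size):
            for j in range(size):
                nb = (x[(i - 1) % size, j] + x[(i + 1) % size, j] +
                      x[i, (j - 1) % size] + x[i, (j + 1) % size])
                p = 1.0 / (1.0 + np.exp(-2.0 * theta * nb))   # P(x_ij = +1 | neighbours)
                x[i, j] = 1 if rng.random() < p else -1
    return x   # the one-dimensional sufficient statistic is the sum of neighbouring-pair products
```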

The second example is the posterior distribution associated with an MA(2) model, apparently becoming a benchmark in the ABC literature. The comparison between an ABC based on the first two autocorrelations, an ABC based on the semi-automatic solution of Fearnhead and Prangle (2012) [for which collection of summaries?], and the neural network proposal leads to the dismissal of the semi-automatic solution, with the neural net being closest to the exact posterior [with the same tolerance quantile ε for all approaches].
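For reference, a bare-bones version of this MA(2) benchmark, with the first two autocovariances as summaries, a uniform prior over the usual triangular identifiability region, and a quantile-based tolerance, could look like the following sketch (a standard setup, not the authors' code).

```python
# A hedged sketch of rejection ABC for the MA(2) benchmark.
import numpy as np

def ma2_simulate(theta, n=200, rng=None):
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(n + 2)
    return eps[2:] + theta[0] * eps[1:-1] + theta[1] * eps[:-2]

def acov_summaries(x):
    x = x - x.mean()
    return np.array([np.mean(x[k:] * x[:-k]) for k in (1, 2)])

def abc_ma2(y_obs, n_sim=50_000, keep=0.001, rng=None):
    rng = np.random.default_rng(rng)
    s_obs = acov_summaries(y_obs)
    thetas, dists = [], []
    while len(thetas) < n_sim:
        t1, t2 = rng.uniform(-2, 2), rng.uniform(-1, 1)
        if t1 + t2 > -1 and t2 - t1 > -1:           # invertibility triangle
            s = acov_summaries(ma2_simulate((t1, t2), n=len(y_obs), rng=rng))
            thetas.append((t1, t2))
            dists.append(np.linalg.norm(s - s_obs))
    thetas, dists = np.array(thetas), np.array(dists)
    tol = np.quantile(dists, keep)                   # tolerance as an empirical quantile
    return thetas[dists <= tol]
```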

A discussion crucially missing from the paper, from my perspective, is an accounting for size. First, what is the computing cost of fitting, calibrating and storing a neural network for the sole purpose of constructing a summary statistic? Once the neural net is constructed, I would assume most users would see little need in pursuing the experiment any further. (This was also why we stopped at our random forest output rather than using it as a summary statistic.) Second, how do cost and performances evolve as the dimension of the parameter θ grows? I would deem it necessary to understand when the method fails, as for instance in latent variable models such as HMMs. Third, how does the size of the sample impact cost and performances? In many realistic cases where ABC applies, it is not possible to use the raw data, given its size, and summary statistics are a given. For such examples, neural networks should be compared with other ABC solutions, using the same reference table.
