## ABC of simulation estimation with auxiliary statistics

Posted in Statistics, University life with tags , , , , on March 10, 2015 by xi'an

“In the ABC literature, an estimator that uses a general kernel is known as a noisy ABC estimator.”

Another arXival relating M-estimation econometrics techniques with ABC. Written by Jean-Jacques Forneron and Serena Ng from the Department of Economics at Columbia University, the paper tries to draw links between indirect inference and ABC, following the tracks of Drovandi and Pettitt [not quoted there] and proposes a reverse ABC sampler by

1. given a randomness realisation, ε, creating a one-to-one transform of the parameter θ that corresponds to a realisation of a summary statistics;
2. determine the value of the parameter θ that minimises the distance between this summary statistics and the observed summary statistics;
3. weight the above value of the parameter θ by π(θ) J(θ) where J is the Jacobian of the one-to-one transform.

I have difficulties to see why this sequence produces a weighted sample associated with the posterior. Unless perhaps when the minimum of the distance is zero, in which case this amounts to some inversion of the summary statistic (function). And even then, the role of the random bit  ε is unclear. Since there is no rejection. The inversion of the summary statistics seems hard to promote in practice since the transform of the parameter θ into a (random) summary is most likely highly complex.

“The posterior mean of θ constructed from the reverse sampler is the same as the posterior mean of θ computed under the original ABC sampler.”

The authors also state (p.16) that the estimators derived by their reverse method are the same as the original ABC approach but this only happens to hold asymptotically in the sample size. And I am not even sure of this weaker statement as the tolerance does not seem to play a role then. And also because the authors later oppose ABC to their reverse sampler as the latter produces iid draws from the posterior (p.25).

“The prior can be potentially used to further reduce bias, which is a feature of the ABC.”

As an aside, while the paper reviews extensively the literature on minimum distance estimators (called M-estimators in the statistics literature) and on ABC, the first quote is missing the meaning of noisy ABC, which consists in a randomised version of ABC where the observed summary statistic is randomised at the same level as the simulated statistics. And the last quote does not sound right either, as it should be seen as a feature of the Bayesian approach rather than of the ABC algorithm. The paper also attributes the paternity of ABC to Don Rubin’s 1984 paper, “who suggested that computational methods can be used to estimate the posterior distribution of interest even when a model is analytically intractable” (pp.7-8). This is incorrect in that Rubin uses ABC to explain the nature of the Bayesian reasoning, but does not in the least address computational issues.

## Approximate Bayesian Computation in state space models

Posted in Statistics, Travel, University life with tags , , , , , , , on October 2, 2014 by xi'an

While it took quite a while (!), with several visits by three of us to our respective antipodes, incl. my exciting trip to Melbourne and Monash University two years ago, our paper on ABC for state space models was arXived yesterday! Thanks to my coauthors, Gael Martin, Brendan McCabe, and  Worapree Maneesoonthorn,  I am very glad of this outcome and of the new perspective on ABC it produces.  For one thing, it concentrates on the selection of summary statistics from a more econometrics than usual point of view, defining asymptotic sufficiency in this context and demonstrated that both asymptotic sufficiency and Bayes consistency can be achieved when using maximum likelihood estimators of the parameters of an auxiliary model as summary statistics. In addition, the proximity to (asymptotic) sufficiency yielded by the MLE is replicated by the score vector. Using the score instead of the MLE as a summary statistics allows for huge gains in terms of speed. The method is then applied to a continuous time state space model, using as auxiliary model an augmented unscented Kalman filter. We also found in the various state space models tested therein that the ABC approach based on the marginal [likelihood] score was performing quite well, including wrt Fearnhead’s and Prangle’s (2012) approach… I like the idea of using such a generic object as the unscented Kalman filter for state space models, even when it is not a particularly accurate representation of the true model. Another appealing feature of the paper is in the connections made with indirect inference.

## statistical challenges in neuroscience

Posted in Books, pictures, Statistics, Travel with tags , , , , , , on September 4, 2014 by xi'an

Yet another workshop around! Still at Warwick, organised by Simon Barthelmé, Nicolas Chopin and Adam Johansen  on the theme of statistical aspects of neuroscience. Being nearby I attended a few lectures today but most talks are more topical than my current interest in the matter, plus workshop fatigue starts to appear!, and hence I will keep a low attendance for the rest of the week to take advantage of my visit here to make some progress in my research and in the preparation of the teaching semester. (Maybe paradoxically I attended a non-neuroscience talk by listening to Richard Wilkinson’s coverage of ABC methods, with an interesting stress on meta-models and the link with computer experiments. Given that we are currently re-revising our paper with Matt Moore and Kerrie Mengersen (and now Chris Drovandi), I find interesting to see a sort of convergence in our community towards a re-re-interpretation of ABC as producing an approximation of the distribution of the summary statistic itself, rather than of the original data, using auxiliary or indirect or pseudo-models like Gaussian processes. (Making the link with Mark Girolami’s talk this morning.)

## Bayesian indirect inference [a response]

Posted in Books, Statistics, Travel, University life with tags , , , , , , on February 18, 2014 by xi'an

This Bayesian indirect inference paper by Chris Drovandi and Tony Pettitt was discussed on the ‘Og two weeks ago and Chris sent me the following comments.

unsurprisingly, the performances of ABC comparing true data of size n with synthetic data of size m>n are not great. However, there exists another way of reducing the variance in the synthetic data, namely by repeating simulations of samples of size n and averaging the indicators for proximity, resulting in a frequency rather than a 0-1 estimator. See e.g. Del Moral et al. (2009). In this sense, increasing the computing power reduces the variability of the ABC approximation. (And I thus fail to see the full relevance of Result 1.)

Taking the average of the indicators from multiple simulations will reduce the variability of the estimated ABC likelihood but because it is only still an unbiased estimate it will not alter the target and will not improve the ABC approximation (Andrieu and Roberts 2009).  It will only have the effect of improving the mixing of MCMC ABC.  Result 1 is used to contrast ABC II and BIL as they behave quite differently as n is increased.

The authors make several assumptions of unicity that I somewhat find unclear. While assuming that the MLE for the auxiliary model is unique could make sense (Assumption 2), I do not understand the corresponding indexing of this estimator (of the auxiliary parameter) on the generating (model) parameter θ. It should only depend on the generated/simulated data x. The notion of a noisy mapping is just confusing to me.

The dependence on θ is a little confusing I agree (especially in the context of ABC II methods).  It starts to become more clear in the context of BIL.  As n goes to infinity, the effect of the simulated data is removed and then we obtain the function φ(θ) (so we need to remember which θ simulated the data), which is referred to as the mapping or binding function in the II literature.  If we somehow knew the binding function, BIL would proceed straightforwardly.  But of course we don’t in practice, so we try to estimate it via simulated data (which, for computational reasons, needs to be a finite sample) from the true model based on theta.  Thus we obtain a noisy estimate of the mapping.  One way forward might be to fit some (non-parametric?) regression model to smooth out the noise and try to recover the true mapping (without ever taking n to infinity) and run a second BIL with this estimated mapping.  I plan to investigate this in future work.

The assumption that the auxiliary score function at the auxiliary MLE for the observed data and for a simulated dataset (Assumption 3) is unique proceeds from the same spirit. I however fail to see why it matters so much. If the auxiliary MLE is the result of a numerical optimisation algorithm, the numerical algorithm may return local modes. This only adds to the approximative effect of the ABC-I schemes.

The optimiser failing to find the MLE (local mode) is certainly an issue shared by all BII methods, apart from ABC IS (which only requires 1 optimisation, so more effort to find the MLE can be applied here).  Assuming the optimiser can obtain the MLE, I think the uniqueness assumptions makes sense.  It basically says that, for a particular simulated dataset we would like a unique value for the ABC discrepancy function.

Given that the paper does not produce convergence results for those schemes, unless the auxiliary model contains the genuine model, such theoretical assumptions do not feel that necessary.

Actually, the ABC II methods will never converge to the true posterior (in general) due to lack of sufficiency.  This is even the case if the true model is a special case of the auxiliary model! (in which case BIL can converge to the true posterior)

The paper uses normal mixtures as an auxiliary model: the multimodality of this model should not be such an hindrance (and reordering is transparent, i.e. does not “reduce the flexibility of the auxiliary model”, and does not “increase the difficulty of implementation”, as stated p.16).

Thanks for your comment.  I need to think about this more as I am not an expert on mixture modelling.  The standard EM algorithm in Matlab does not apply any ordering to the parameters of the components and uses a random start.  Thus it can return any of the multiple MLEs on offer, so the ABC IP will not work here.  So from my point of view, any alternative will increase the difficulty of implementation as it means I cannot use the standard software.  Especially considering I can apply any other BII method without worrying about the non-unique MLE.

The paper concludes from a numerical study to the superiority of the Bayesian indirect inference of Gallant and McCulloch (2009). Which simply replaces the true likelihood with the maximal auxiliary model likelihood estimated from a simulated dataset. (This is somehow similar to our use of the empirical likelihood in the PNAS paper.) It is however moderated by the cautionary provision that “the auxiliary model [should] describe the data well”. As for empirical likelihood, I would suggest resorting to this Bayesian indirect inference as a benchmark, providing a quick if possibly dirty reference against which to test more elaborate ABC schemes. Or other approximations, like empirical likelihood or Wood’s synthetic likelihood.

Unfortunately the methods are not quick (apart from ABC IS when the scores are analytic), but good approximations can be obtained.  The majority of Bayesian methods that deal with intractable likelihoods do not target the true posterior (there are a couple of exceptions in special cases) and thus also suffer from some dirtiness, and BII does not escape from that.  But, if a reasonable auxiliary model can be found, then I would suggest that (at least one of the) BII methods will be competitive.

On reflection for BIL it is not necessary for the auxiliary model to fit the data, since the generative model being proposed may be mis-specified and also not fit the data well.  BIL needs an auxiliary model that mimics well the likelihood of the generative model for values of theta in non-negligible posterior regions.  For ABC II, we are simply looking for a good summarisation of the data.  Therefore it would seem useful if the auxiliary model did fit the data well.  Note this process is independent of the generative model being proposed.  Therefore the auxiliary model would be the same regardless of the chosen generative model.  Very different considerations indeed.

Inspired by a discussion with Anthony Lee, it appears that the (Bayesian version) of synthetic likelihood you mentioned is actually also a BIL method but where the auxiliary model is applied to the summary statistic likelihood rather than the full data likelihood.  The synthetic likelihood is nice from a numerical/computational point of view as the MLE of the auxiliary model is analytic.

## ABC with indirect summary statistics

Posted in Statistics, University life with tags , , , , , , , on February 3, 2014 by xi'an

After reading Drovandi’s and Pettitt’s Bayesian Indirect Inference, I checked (in the plane to Birmingham) the earlier Gleim’s and Pigorsch’s Approximate Bayesian Computation with indirect summary statistics. The setting is indeed quite similar to the above, with a description of three ways of connecting indirect inference with ABC, albeit with a different range of illustrations. This preprint states most clearly its assumption that the generating model is a particular case of the auxiliary model, which sounds anticlimactic since the auxiliary model is precisely used because the original one is mostly out of reach! This certainly was the original motivation for using indirect inference.

The part of the paper that I find the most intriguing is the argument that the indirect approach leads to sufficient summary statistics, in the sense that they “are sufficient for the parameters of the auxiliary model and (…) sufficiency carries over to the model of interest” (p.31). Looking at the details in the Appendix, I found that the argument is lacking, because the likelihood as a functional is shown to be a (sufficient) statistic, which seems both a tautology and irrelevant because this is different from the likelihood considered at the (auxiliary) MLE, which is the summary statistic used in fine.

“…we expand the square root of an innovation density h in a Hermite expansion and truncate the in finite polynomial at some integer K which, together with other tuning parameters of the SNP density, has to be determined through a model selection criterion (such as BIC). Now we take the leading term of the Hermite expansion to follow a Gaussian GARCH model.”

As in Drovandi and Pettitt, the performances of the ABC-I schemes are tested on a toy example, which is a very basic exponential iid sample with a conjugate prior. With a gamma model as auxiliary. The authors use a standard ABC based on the first two moments as their benchmark, however they do not calibrate those moments in the distance and end up with poor performances of ABC (in a setting where there is a sufficient statistic!). The best choice in this experiment appears as the solution based on the score, but the variances of the distances are not included in the comparison tables. The second implementation considered in the paper is a rather daunting continuous-time non-Gaussian Ornstein-Uhlenbeck stochastic volatility model à la Barndorf -Nielsen and Shephard (2001). The construction of the semi-nonparametric (why not semi-parametric?) auxiliary model is quite involved as well, as illustrated by the quote above. The approach provides an answer, with posterior ABC-IS distributions on all parameters of the original model, which kindles the question of the validation of this answer in terms of the original posterior. Handling simultaneously several approximation processes would help in this regard.