## Bayesian Indirect Inference and the ABC of GMM

Posted in Books, Statistics, University life on February 17, 2016 by xi'an

“The practicality of estimation of a complex model using ABC is illustrated by the fact that we have been able to perform 2000 Monte Carlo replications of estimation of this simple DSGE model, using a single 32 core computer, in less than 72 hours.” (p.15)

Earlier this week, Michael Creel and his coauthors arXived a long paper with the above title, where ABC stands for approximate Bayesian computation. In short, this paper provides deeper theoretical foundations for the local regression post-processing of Mark Beaumont and his coauthors (2002). And some natural extensions. But apparently considering one univariate transform η(θ) of interest at a time. The theoretical validation of the method is that the resulting estimators converge at speed √n under some regularity assumptions. Including the identifiability of the parameter θ in the mean of the summary statistics T, which relates to our consistency result for ABC model choice. And a CLT on an available (?) preliminary estimator of η(θ).
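For readers who have not met the Beaumont et al. (2002) post-processing the paper builds upon, here is a minimal sketch of the local linear regression adjustment for a scalar transform η(θ). The function name, the Epanechnikov kernel, and the toy test below are my own illustrative assumptions, not taken from the Creel et al. paper.

```python
import numpy as np

def abc_local_regression(eta_sim, s_sim, s_obs, bandwidth):
    """Beaumont et al. (2002) post-processing: weight simulated draws by an
    Epanechnikov kernel on the distance ||s - s_obs|| and adjust eta by a
    local linear regression on the discrepancy s - s_obs."""
    d = np.linalg.norm(s_sim - s_obs, axis=1)  # distances to the observed summary
    w = np.where(d < bandwidth, 1.0 - (d / bandwidth) ** 2, 0.0)
    keep = w > 0
    X = np.column_stack([np.ones(keep.sum()), s_sim[keep] - s_obs])
    W = np.diag(w[keep])
    # weighted least squares fit: eta ~ alpha + beta' (s - s_obs)
    coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ eta_sim[keep])
    # shift the retained draws to the value they would take at s = s_obs
    eta_adj = eta_sim[keep] - (s_sim[keep] - s_obs) @ coef[1:]
    return eta_adj, w[keep]
```

When η depends exactly linearly on the summary, the adjustment is exact: every retained draw is moved onto the regression line evaluated at the observed summary.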

The paper also includes a GMM version of ABC whose appeal is less clear to me as it seems to rely on a preliminary estimator of the univariate transform of interest η(θ). Which is then randomized by a normal random walk. While this sounds a wee bit like noisy ABC, it differs from this generic approach as the model is not assumed to be known, but rather available through an asymptotic Gaussian approximation. (When the preliminary estimator is available in closed form, I do not see the appeal of adding this superfluous noise. When it is unavailable, it is unclear why a normal perturbation can be produced.)

“[In] the method we study, the estimator is consistent, asymptotically normal, and asymptotically as efficient as a limited information maximum likelihood estimator. It does not require either optimization, or MCMC, or the complex evaluation of the likelihood function.” (p.3)

Overall, I have trouble relating the paper to (my?) regular ABC in that the outcome of the supported procedures is an estimator rather than a posterior distribution. Those estimators are demonstrably endowed with convergence properties, including quantile estimates that can be exploited for credible intervals, but this does not produce a posterior distribution in the classical Bayesian sense. For instance, how can one run model comparison in this framework? Furthermore, each of those inferential steps requires solving another possibly costly optimisation problem.

“Posterior quantiles can also be used to form valid confidence intervals under correct model specification.” (p.4)

Nitpicking(ly), this statement is not correct in that posterior quantiles produce valid credible intervals and only asymptotically correct confidence intervals!

“A remedy is to choose the prior π(θ) iteratively or adaptively as functions of initial estimates of θ, so that the “prior” becomes dependent on the data, which can be denoted as π(θ|T).” (p.6)

This modification of the basic ABC scheme relying on simulation from the prior π(θ) can be found in many earlier references and the iterative construction of a better fitted importance function rather closely resembles ABC-PMC. Once again nitpicking(ly), the importance weights are defined therein (p.6) as the inverse of what they should be.
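For the record, the ABC-PMC weights of Beaumont et al. (2009) put the prior in the numerator and the mixture transition kernel in the denominator, the reverse of the definition on p.6. A minimal sketch for a scalar parameter with a Gaussian random-walk kernel (the helper names are mine):

```python
import numpy as np

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def pmc_weights(theta_new, theta_old, w_old, tau, prior_pdf):
    """ABC-PMC importance weights: prior density over the mixture density of
    the Gaussian kernel centred at the previous weighted population, i.e.
    w_i proportional to pi(theta_i) / sum_j w_j N(theta_i; theta_j, tau^2)."""
    denom = np.array([np.sum(w_old * normal_pdf(t, theta_old, tau))
                      for t in theta_new])
    w = prior_pdf(np.asarray(theta_new)) / denom
    return w / w.sum()
```

A particle landing far from the previous population (low kernel density) gets a larger weight, compensating for its low chance of being proposed; inverting the ratio, as on p.6 of the paper, would instead penalise such particles.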

## efficient approximate Bayesian inference for models with intractable likelihood

Posted in Books, pictures, Statistics, University life on July 6, 2015 by xi'an

Dahlin, Villani [Mattias, not Cédric] and Schön arXived a paper this week with the above title. The type of intractable likelihood they consider is a non-linear state-space (HMM) model and the SMC-ABC they propose is based on an optimised Laplace approximation. That is, replacing the posterior distribution on the parameter θ with a normal distribution obtained by a Taylor expansion of the log-likelihood. There is no obvious solution for deriving this approximation in the case of intractable likelihood functions and the authors make use of a Bayesian optimisation technique called Gaussian process optimisation (GPO). Meaning that the Laplace approximation is the Laplace approximation of a surrogate log-posterior. GPO is a Bayesian numerical method in the spirit of the probabilistic numerics discussed on the 'Og a few weeks ago. In the current setting, this means iterating three steps:

1. derive an approximation of the log-posterior ξ at the current θ using SMC-ABC
2. construct a surrogate log-posterior by a Gaussian process using the past (ξ,θ)’s
3. determine the next value of θ
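The three steps above can be sketched as a bare-bones loop for a scalar θ, with a cheap deterministic log-posterior standing in for the noisy SMC-ABC estimate and a UCB-type acquisition standing in for the regret-based rule of the paper; kernel choices, hyperparameter values, and function names are all my own assumptions.

```python
import numpy as np

def matern32(x1, x2, ell=0.5, sigma2=1.0):
    """Matern 3/2 covariance, as used for the surrogate log-posterior."""
    r = np.abs(x1[:, None] - x2[None, :]) / ell
    return sigma2 * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def gp_posterior(theta_train, xi_train, theta_grid, noise=0.1):
    """Zero-mean GP regression on noisy log-posterior evaluations xi."""
    K = matern32(theta_train, theta_train) + noise * np.eye(len(theta_train))
    Ks = matern32(theta_grid, theta_train)
    mean = Ks @ np.linalg.solve(K, xi_train)
    # prior variance is sigma2 = 1 on the diagonal
    var = 1.0 - np.einsum('ij,ij->i', Ks, np.linalg.solve(K, Ks.T).T)
    return mean, np.maximum(var, 1e-12)

def gpo_step(log_post_est, theta_train, xi_train, theta_grid, kappa=2.0):
    """One GPO iteration: fit the surrogate, maximise an upper-confidence-bound
    acquisition over the grid, evaluate the log-posterior there."""
    mean, var = gp_posterior(theta_train, xi_train, theta_grid)
    theta_next = theta_grid[np.argmax(mean + kappa * np.sqrt(var))]
    xi_next = log_post_est(theta_next)  # a noisy SMC-ABC estimate in the paper
    return np.append(theta_train, theta_next), np.append(xi_train, xi_next)
```

Each iteration adds one design point, trading off high surrogate mean (exploitation) against high surrogate variance (exploration); the Laplace approximation of the paper would then be computed from the fitted surrogate rather than from the intractable log-posterior itself.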

In the first step, a standard particle filter cannot be used to approximate the observed log-posterior at θ because the conditional density of observed given latent is intractable. The solution is to use ABC for the HMM model, in the spirit of many papers by Ajay Jasra and co-authors. However, I find the construction of the substitute model allowing for a particle filter very obscure… (A side effect of the heat wave?!) I can spot a noisy ABC feature in equation (7), but am at a loss as to how the reparameterisation by the transform τ is compatible with the observed-given-latent conditional being unavailable: if the pair (x,v) at time t has a closed form expression, so does (x,y), at least in principle, since y is a deterministic transform of (x,v). Another thing I do not catch is why having a particle filter available prevents the use of a pMCMC approximation.

The second step constructs a Gaussian process posterior on the log-likelihood, with Gaussian errors on the ξ's. The Gaussian process mean is chosen as zero, while the covariance function is a Matérn function. With hyperparameters estimated by maximum likelihood (based on the argument that the marginal likelihood is available in closed form). Turning the approach into an empirical Bayes version.

The next design point in the sequence of θ's is the argument of the maximum of a certain acquisition function, which is chosen here as a sort of maximum regret based on the posterior predictive of the Gaussian process. With possible jittering. At this stage, it reminded me of the Gaussian process approach proposed by Michael Gutmann in his NIPS poster last year.

Overall, the method is just too convoluted for me to assess its worth and efficiency without a practical implementation to… practice upon, for which I do not have time! Hence I would welcome any comment from readers having attempted such implementations. I also wonder at the lack of link with Simon Wood‘s Gaussian approximation that appeared in Nature (2010) and was well-discussed in the Read Paper of Fearnhead and Prangle (2012).

## ABC of simulation estimation with auxiliary statistics

Posted in Statistics, University life on March 10, 2015 by xi'an

“In the ABC literature, an estimator that uses a general kernel is known as a noisy ABC estimator.”

Another arXival relating M-estimation econometrics techniques with ABC. Written by Jean-Jacques Forneron and Serena Ng from the Department of Economics at Columbia University, the paper tries to draw links between indirect inference and ABC, following the tracks of Drovandi and Pettitt [not quoted there] and proposes a reverse ABC sampler by

1. given a realisation of the randomness ε, create a one-to-one transform of the parameter θ that corresponds to a realisation of a summary statistics;
2. determine the value of the parameter θ that minimises the distance between this summary statistics and the observed summary statistics;
3. weight the above value of the parameter θ by π(θ) J(θ), where J is the Jacobian of the one-to-one transform.
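On a toy location model the three steps collapse to something very simple, which may help see where the weighting enters. The model and the function names are my own illustration; the minimisation step is available in closed form here, with zero distance and unit Jacobian, which is precisely the uninteresting special case.

```python
import numpy as np

def reverse_sampler(t_obs, prior_pdf, n_draws, n_obs, rng):
    """Sketch of the Forneron & Ng reverse sampler on a toy location model
    y_i = theta + eps_i, with the sample mean as summary statistic.
    For a fixed noise draw eps, theta -> theta + mean(eps) is one-to-one, so
    the distance-minimising theta is t_obs - mean(eps) (zero distance) and
    the Jacobian of the transform equals one."""
    thetas, weights = [], []
    for _ in range(n_draws):
        eps = rng.normal(size=n_obs)          # step 1: fix the randomness
        theta_hat = t_obs - eps.mean()        # step 2: closed-form minimiser
        thetas.append(theta_hat)
        weights.append(prior_pdf(theta_hat))  # step 3: prior x |Jacobian| (= 1 here)
    w = np.asarray(weights)
    return np.asarray(thetas), w / w.sum()
```

Under a flat prior the weighted draws concentrate around the observed summary, as one would hope; the non-trivial case, where the minimum distance is not zero, is exactly where the justification eludes me.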

I have difficulties seeing why this sequence produces a weighted sample associated with the posterior. Unless perhaps when the minimum of the distance is zero, in which case this amounts to some inversion of the summary statistic (function). And even then, the role of the random bit ε is unclear, since there is no rejection. The inversion of the summary statistics seems hard to promote in practice since the transform of the parameter θ into a (random) summary is most likely highly complex.

“The posterior mean of θ constructed from the reverse sampler is the same as the posterior mean of θ computed under the original ABC sampler.”

The authors also state (p.16) that the estimators derived by their reverse method are the same as those of the original ABC approach, but this only holds asymptotically in the sample size. And I am not even sure of this weaker statement, as the tolerance does not seem to play a role then, and because the authors later oppose ABC to their reverse sampler as the latter produces iid draws from the posterior (p.25).

“The prior can be potentially used to further reduce bias, which is a feature of the ABC.”

As an aside, while the paper reviews extensively the literature on minimum distance estimators (called M-estimators in the statistics literature) and on ABC, the first quote is missing the meaning of noisy ABC, which consists in a randomised version of ABC where the observed summary statistic is randomised at the same level as the simulated statistics. And the last quote does not sound right either, as it should be seen as a feature of the Bayesian approach rather than of the ABC algorithm. The paper also attributes the paternity of ABC to Don Rubin’s 1984 paper, “who suggested that computational methods can be used to estimate the posterior distribution of interest even when a model is analytically intractable” (pp.7-8). This is incorrect in that Rubin uses ABC to explain the nature of the Bayesian reasoning, but does not in the least address computational issues.
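Since the quote above misses it, here is what noisy ABC actually does, in a minimal rejection sketch: the observed summary is perturbed once, with the same uniform kernel that defines acceptance, before the usual accept-reject loop. The toy model and names are illustrative assumptions of mine.

```python
import numpy as np

def noisy_abc(s_obs, simulate_s, prior_sample, eps, n_iter, rng):
    """Noisy ABC: jitter the observed summary with the same uniform kernel
    used for acceptance, then run plain rejection ABC against the jittered
    value; accepted draws are exact posterior draws under the perturbed model."""
    s_noisy = s_obs + eps * rng.uniform(-1.0, 1.0)  # randomise the observation
    accepted = []
    for _ in range(n_iter):
        theta = prior_sample(rng)
        if abs(simulate_s(theta, rng) - s_noisy) <= eps:
            accepted.append(theta)
    return np.asarray(accepted), s_noisy
```

The observed and simulated summaries are thus treated symmetrically, at the same noise level, which is what turns the approximate algorithm into exact inference for a perturbed model.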

## ABC convergence for HMMs

Posted in Statistics, University life on April 19, 2011 by xi'an

Following my previous post on Paul Fearnhead's and Dennis Prangle's Semi-automatic ABC, Ajay Jasra pointed me to the paper he arXived with Thomas Dean, Sumeetpal Singh and Gareth Peters twenty days ago. I read it today. It is entitled Parameter Estimation for Hidden Markov Models with Intractable Likelihoods and it relates to Fearnhead's and Prangle's paper in that those authors also establish ABC consistency for the noisy ABC. The paper focuses on the HMM case and the authors construct an ABC scheme such that the ABC simulated sequence remains an HMM, the conditional distribution of the observables given the latent Markov chain being modified by the ABC acceptance ball. This means that conducting maximum likelihood (or Bayesian) estimation based on the ABC sample is equivalent to exact inference under the perturbed HMM scheme. In this sense, this equivalence brings the paper close to Wilkinson's (2008) and Fearnhead's and Prangle's. While this also establishes asymptotic bias for a fixed value of the tolerance ε, it also proves that an arbitrary accuracy can be attained with enough data and a small enough ε. The authors of the paper show in addition (as in Fearnhead's and Prangle's) that an ABC inference based on noisy observations

$\hat y_1+\epsilon z_1,\ldots,\hat y_n+\epsilon z_n$

is equivalent to a regular inference based on the original data

$\hat y_1,\ldots,\hat y_n$

hence the asymptotic consistency of noisy ABC! Furthermore, the authors show that the asymptotic variance of the ABC version is always greater than the asymptotic variance of the standard MLE, but that it decreases as ε². The paper also contains an illustration on an HMM with α-stable observables. (Of course, the restriction to summary statistics that preserve the HMM structure is paramount for the results in the paper to apply, hence preventing the use of truly summarising statistics that would not grow in dimension with the size of the HMM series.)
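The ε² rate is easy to sanity-check for the uniform acceptance kernel: jittering each observation by ε z_t with z_t uniform on [-1,1] adds ε²/3 of variance, which vanishes quadratically as ε shrinks. This is a numerical illustration of mine, not code from the paper.

```python
import numpy as np

def perturb_observations(y, eps, rng):
    """Noisy-ABC jitter of an HMM observation sequence: y_t + eps * z_t with
    z_t ~ U(-1, 1). The latent chain is untouched, so the perturbed process
    remains an HMM whose observation density is the original one convolved
    with the uniform kernel."""
    return y + eps * rng.uniform(-1.0, 1.0, size=len(y))
```

Since the jitter is independent of everything else, the observation variance under the perturbed model is the original variance plus ε²/3, consistent with the excess asymptotic variance of the noisy-ABC estimator decreasing as ε².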

In conclusion, here comes a second paper that validates [noisy] ABC without non-parametric arguments. Both those recent papers make me appreciate even further the idea of noisy ABC: at first, I liked the concept but found the randomisation it involved rather counter-intuitive from a Bayesian perspective. Now, I rather perceive it as a duplication of the randomness in the data that brings the simulated model closer to the observed model.