Archive for generalised method of moments

easy-to-use empirical likelihood ABC

Posted in Statistics, University life on October 23, 2018 by xi'an

A newly arXived paper from a group of researchers at NUS that I wish we had discussed when I was there last month, as we wrote this empirical likelihood ABC paper in PNAS with Kerrie Mengersen and Pierre Pudlo in 2012. Plus the SAME paper with Arnaud Doucet and Simon Godsill ten years earlier, which the authors prefer to call data cloning, in continuation of the more recent Lele et al. (2007). They could actually have used my original denomination of prior feedback (1992? I remember presenting the idea at Camp Casella in Cornell that summer) as well! Actually, I am not certain that invoking prior feedback is quite necessary, since this is a form of simulated method of moments as well.

Now, did we really assume that some moments of the distribution were analytically available, even though the likelihood was not?! Even before going through the paper, it dawned on me that these theoretical moments could have been simulated instead, since the model is a generative one: for a given parameter value, a direct Monte Carlo approximation to the exact moment can be produced and can serve as a constraint in the empirical likelihood definition. I am surprised and aggrieved that we did not think of this empirical likelihood version of a method of moments, which is central to the current paper. In the sense that, were the parameter exact, the differences between the moments based on the actual data x⁰ and the moments based on m replicas of the simulated data x¹,x²,… would have mean zero, meaning the moment constraint is immediately available. Meaning an empirical likelihood is easily constructed, replacing the actual likelihood in an MCMC scheme, albeit at a rather high computing cost. Congratulations to the authors for uncovering this possibility that we missed!
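To make the construction concrete, here is a minimal sketch of how the simulated moment differences could feed an empirical likelihood inside a Metropolis-Hastings loop. The functions simulate(theta, rng) and summaries(x) are hypothetical placeholders for the generative model and the chosen moments, and the empirical likelihood is computed through Owen's dual formulation; this is only an illustration of the idea, not the authors' implementation.

```python
# A sketch of empirical likelihood ABC as described above, assuming a
# generative model `simulate(theta, rng)` and a summary map `summaries(x)`
# (hypothetical placeholders). For a proposed theta, the differences
# g_j = s(x_j) - s(x0) over m simulated replicas should have mean zero at
# the true parameter, providing the empirical likelihood moment constraint.
import numpy as np
from scipy.optimize import minimize

def log_el_ratio(G):
    """Profile log empirical likelihood ratio for E[g]=0 (Owen's dual):
    minimise -sum_j log(1 + lambda'g_j) over lambda, with 1 + lambda'g_j > 0."""
    d = G.shape[1]
    def dual(lam):
        t = 1.0 + G @ lam
        return np.inf if np.any(t <= 1e-10) else -np.sum(np.log(t))
    res = minimize(dual, np.zeros(d), method="Nelder-Mead")
    return res.fun  # very negative when 0 lies outside the hull of the g_j's

def el_abc_mcmc(x0, log_prior, theta0, n_iter=5000, m=50, step=0.2, seed=0):
    rng = np.random.default_rng(seed)
    s0 = summaries(x0)
    def log_el(theta):
        G = np.stack([summaries(simulate(theta, rng)) - s0 for _ in range(m)])
        return log_el_ratio(G)
    theta = np.asarray(theta0, dtype=float)
    cur = log_el(theta) + log_prior(theta)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        cand = log_el(prop) + log_prior(prop)  # EL replaces the true likelihood
        if np.log(rng.uniform()) < cand - cur:
            theta, cur = prop, cand
        chain.append(theta.copy())
    return np.asarray(chain)
```

Note the rather high computing cost mentioned above: each Metropolis-Hastings step requires m fresh simulations from the model plus an inner optimisation.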

“The summary statistics in this example were judiciously chosen.”

One point in the paper on which I disagree with the authors is the argument that MCMC sampling based on an empirical likelihood can be seen as an implementation of the pseudo-marginal Metropolis-Hastings method. The major difference, in my opinion, is that there is no unbiasedness here (and no generic result indicating convergence to the exact posterior as the number of simulations grows to infinity). The other point unclear to me is the selection of the summaries [or moments] used to implement the method, which seems to be based on their performance in the subsequent estimation, a performance that is hard to assess properly in intractable likelihood cases. In the last example of stereological extremes (not covered in our paper), for instance, the output is compared with the parallel synthetic likelihood result.
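For contrast, here is what a bare-bones pseudo-marginal Metropolis-Hastings loop looks like (a sketch; `lik_hat` is a hypothetical non-negative estimator of the likelihood). The chain targets the exact posterior precisely because the carried-along estimate is unbiased, which is the property the empirical likelihood surrogate lacks.

```python
# Pseudo-marginal Metropolis-Hastings sketch: the likelihood estimate at the
# current value is recycled, not refreshed, and unbiasedness of `lik_hat`
# (a hypothetical placeholder) is what guarantees the exact target.
import numpy as np

def pseudo_marginal_mh(lik_hat, log_prior, theta0, n_iter=5000, step=0.3, seed=0):
    rng = np.random.default_rng(seed)
    theta, lik = float(theta0), lik_hat(theta0)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal()
        lik_prop = lik_hat(prop)                   # fresh estimate at proposal
        log_alpha = (np.log(lik_prop) + log_prior(prop)
                     - np.log(lik) - log_prior(theta))
        if np.log(rng.uniform()) < log_alpha:
            theta, lik = prop, lik_prop            # estimate carried along
        chain.append(theta)
    return np.asarray(chain)
```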

approximate Bayesian inference under informative sampling

Posted in Books, Statistics, Travel, University life on March 30, 2018 by xi'an

In the first issue of this year's Biometrika, I spotted a paper with the above title, written by Wang, Kim, and Yang, and thought it was a particular case of ABC. However, when I read it on a rare metro ride to Dauphine, thanks to my hurting knee!, I got increasingly disappointed as the contents had nothing to do with ABC. The purpose of the paper is to derive a consistent and convergent posterior distribution based on an estimator of the parameter θ that is… consistent and convergent under informative sampling. Using for instance a Normal approximation to the sampling distribution of this estimator, or to the sampling distribution of the pseudo-score function S(θ) [whose pseudo-normality reminded me of Ron Gallant's approximations and of my comments on them]. The paper then considers a generalisation to the case of estimating equations, U(θ), which may again enjoy a Normal asymptotic distribution. Involving an object that does not make direct Bayesian sense, namely the posterior of the parameter θ given U(θ)… (The algorithm proposed to generate from this posterior (8) is also a mystery.) Since the approach requires consistent estimators to start with and aims at reproducing frequentist coverage properties, I am thus at a loss as to why this pseudo-Bayesian framework is adopted.
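To spell out the construction being criticised, here is a minimal sketch of such a Normal-approximation pseudo-posterior (all names, hat_theta, se, log_prior, are hypothetical placeholders, not the paper's notation): the estimator's approximate sampling density plays the role of a likelihood.

```python
# Pseudo-posterior sketch: hat_theta ~ N(theta, se^2) is the Normal
# approximation to the sampling distribution of a design-consistent
# estimator, and that density is treated as a likelihood in theta.
import numpy as np
from scipy.stats import norm

def log_pseudo_posterior(theta, hat_theta, se, log_prior):
    # Normal "likelihood" of theta given the observed estimate, plus the prior
    return norm.logpdf(hat_theta, loc=theta, scale=se) + log_prior(theta)

# e.g. normalise on a grid to obtain the univariate pseudo-posterior
grid = np.linspace(-5, 5, 2001)
logp = np.array([log_pseudo_posterior(t, 1.2, 0.3, lambda th: 0.0)
                 for t in grid])
dens = np.exp(logp - logp.max())
dens /= dens.sum() * (grid[1] - grid[0])
```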

commentaries in financial econometrics

Posted in Books, Statistics, University life on April 27, 2016 by xi'an

My comment(arie)s on the moment approach to Bayesian inference by Ron Gallant have appeared, along with other comment(arie)s:

Invited Article

Reflections on the Probability Space Induced by Moment Conditions with Implications for Bayesian Inference, by A. Ronald Gallant (p. 229)

Commentaries

Dante Amengual and Enrique Sentana (p. 248)
John Geweke (p. 253)
Jae-Young Kim (p. 258)
Oliver Linton and Ruochen Wu (p. 261)
Christian P. Robert (p. 265)
Christopher A. Sims (p. 272)
Wei Wei and Asger Lunde (p. 278)

Author Response

A. Ronald Gallant (p. 284)

[formula (4) in Gallant's paper]

While commenting on commentaries is formally bound to induce an infinite loop [or l∞p], I remain puzzled by the main point of the paper, which is that setting a structural distribution on a moment function Z(x,θ), plus a prior p(θ), induces a distribution on the pair (x,θ) in a possibly weaker σ-algebra. (The two distributions may actually be incompatible.) Handling this framework requires checking that a posterior exists, which sounds rather unnatural (even though we also have to check the properness of posteriors in standard settings). And the meaning of such a posterior remains unclear, as for instance in the assertion that (4) above is a likelihood, when it does not define a density in x but rather in the object inside the exponential.

“…it is typically difficult to determine whether there exists a p(x|θ) such that the implied distribution of m(x,θ) is the one stated, and if not, what damage is done thereby” J. Geweke (p.254)


estimating mixtures by polynomials

Posted in Books, Statistics, University life on April 7, 2016 by xi'an

[mixture with unknown means]

Sida Wang, Arun Tejasvi Chaganty, and Percy Liang have just arXived a paper about using the method of moments to estimate mixtures of distributions. A method that was introduced (?) by Pearson in 1894 for a Gaussian mixture and crab data. And studied in fair detail by Bruce Lindsay and his co-authors, including in his book, which makes it the more surprising that Bruce's work is not mentioned at all in the paper. In particular the 1989 Annals of Statistics paper, which connects the number of components with the rank of a moment matrix in exponential families and which made a strong impression on me at the time, just when I was starting to work on mixtures. The current paper addresses more specifically the combinatorial difficulty of solving the moment equations. The solution proceeds via a relaxed convex optimisation problem involving a moment matrix, the relaxation removing the rank condition that identifies the parameters of the mixture. While I am no expert in the resolution of the associated eigenvalue problem (Algorithm 1), I wonder at (i) the existence and convergence of a solution when using empirical moments, and (ii) the impact of the choice of the moment equations on both existence and efficiency of the moment method. It is clearly not invariant under reparameterisation, hence parameterisation matters. It is even unclear to me how many terms should be used in the resolution: if a single dimension is acceptable, determining this dimension may prove a complex issue.
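To see why solving the moment equations can be delicate even in the simplest setting, here is a toy version of Pearson's approach for a two-component Gaussian mixture with unit variances, matching the first three empirical moments numerically. This is only an illustration of the classical method, not the paper's convex relaxation.

```python
# Toy method of moments for pi*N(mu1,1) + (1-pi)*N(mu2,1): match the first
# three empirical moments, using E[X]=mu, E[X^2]=mu^2+1, E[X^3]=mu^3+3*mu
# for a unit-variance Gaussian component.
import numpy as np
from scipy.optimize import fsolve

def theoretical_moments(params):
    pi_, mu1, mu2 = params
    m1 = pi_ * mu1 + (1 - pi_) * mu2
    m2 = pi_ * (mu1**2 + 1) + (1 - pi_) * (mu2**2 + 1)
    m3 = pi_ * (mu1**3 + 3 * mu1) + (1 - pi_) * (mu2**3 + 3 * mu2)
    return np.array([m1, m2, m3])

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-1, 1, 300), rng.normal(2, 1, 700)])
emp = np.array([np.mean(x**k) for k in (1, 2, 3)])

# Solve the three moment equations; the solution is sensitive to the starting
# point, echoing the existence/convergence worries raised above.
sol = fsolve(lambda p: theoretical_moments(p) - emp, x0=[0.5, -2.0, 1.0])
print("pi, mu1, mu2 =", sol)
```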

Bayesian Indirect Inference and the ABC of GMM

Posted in Books, Statistics, University life on February 17, 2016 by xi'an

“The practicality of estimation of a complex model using ABC is illustrated by the fact that we have been able to perform 2000 Monte Carlo replications of estimation of this simple DSGE model, using a single 32 core computer, in less than 72 hours.” (p.15)

Earlier this week, Michael Creel and his coauthors arXived a long paper with the above title, where ABC relates to approximate Bayesian computation. In short, this paper provides deeper theoretical foundations for the local regression post-processing of Mark Beaumont and his coauthors (2002). And some natural extensions. But apparently considering one univariate transform η(θ) of interest at a time. The theoretical validation of the method is that the resulting estimators converge at rate √n under some regularity assumptions, including the identifiability of the parameter θ in the mean of the summary statistics T, which relates to our consistency result for ABC model choice, and a CLT on an available (?) preliminary estimator of η(θ).
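As a reminder of the post-processing in question, here is a minimal sketch of the Beaumont et al. (2002) local-linear regression adjustment for a scalar transform of interest (function and variable names are mine, not the paper's): ABC draws are reweighted by an Epanechnikov kernel around the observed summary and corrected by a weighted linear fit of the parameter on the summaries.

```python
# Local-linear ABC adjustment sketch: theta is a vector of N accepted scalar
# draws, S the (N, d) matrix of their simulated summaries, s_obs the observed
# summary, delta the kernel bandwidth.
import numpy as np

def regression_adjust(theta, S, s_obs, delta):
    dist = np.linalg.norm(S - s_obs, axis=1)
    w = np.where(dist < delta, 1 - (dist / delta) ** 2, 0.0)  # Epanechnikov
    keep = w > 0
    X = np.column_stack([np.ones(keep.sum()), S[keep] - s_obs])
    W = np.diag(w[keep])
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ theta[keep])
    # subtract the fitted local trend in the summaries from each draw
    return theta[keep] - (S[keep] - s_obs) @ beta[1:], w[keep]
```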

The paper also includes a GMM version of ABC whose appeal is less clear to me, as it seems to rely on a preliminary estimator of the univariate transform of interest η(θ), which is then randomised by a normal random walk. While this sounds a wee bit like noisy ABC, it differs from this generic approach in that the model is not assumed to be known, but rather available through an asymptotic Gaussian approximation. (When the preliminary estimator is available in closed form, I do not see the appeal of adding this superfluous noise. When it is unavailable, it is unclear why a normal perturbation can be produced.)

“[In] the method we study, the estimator is consistent, asymptotically normal, and asymptotically as efficient as a limited information maximum likelihood estimator. It does not require either optimization, or MCMC, or the complex evaluation of the likelihood function.” (p.3)

Overall, I have trouble relating the paper to (my?) regular ABC in that the outcome of the supported procedures is an estimator rather than a posterior distribution. Those estimators are demonstrably endowed with convergence properties, including quantile estimates that can be exploited for credible intervals, but this does not produce a posterior distribution in the classical Bayesian sense. For instance, how can one run model comparison in this framework? Furthermore, each of those inferential steps requires solving another possibly costly optimisation problem.

“Posterior quantiles can also be used to form valid confidence intervals under correct model specification.” (p.4)

Nitpicking(ly), this statement is not correct in that posterior quantiles produce valid credible intervals and only asymptotically correct confidence intervals!

“A remedy is to choose the prior π(θ) iteratively or adaptively as functions of initial estimates of θ, so that the “prior” becomes dependent on the data, which can be denoted as π(θ|T).” (p.6)

This modification of the basic ABC scheme, relying on simulation from the prior π(θ), can be found in many earlier references, and the iterative construction of a better-fitted importance function rather closely resembles ABC-PMC. Once again nitpicking(ly), the importance weights are defined therein (p.6) as the inverse of what they should be.
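For the record, a sketch of the correct ABC-PMC importance weight of Beaumont et al. (2009): prior over kernel-mixture proposal, not its inverse. The Gaussian transition kernel and the names below are illustrative choices.

```python
# Correct ABC-PMC weight for a particle obtained by moving a previous-round
# particle (drawn with weights prev_weights) through a Gaussian kernel of
# scale tau: w(theta) = prior(theta) / sum_j w_j K_tau(theta | theta_j).
import numpy as np
from scipy.stats import norm

def pmc_weight(theta, prev_thetas, prev_weights, tau, log_prior):
    # mixture proposal density over the previous particle population
    prop = np.sum(prev_weights * norm.pdf(theta, loc=prev_thetas, scale=tau))
    return np.exp(log_prior(theta)) / prop
```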