Archive for empirical likelihood

variational approximation to empirical likelihood ABC

Posted in Statistics on October 1, 2021 by xi'an

Sanjay Chaudhuri and his colleagues from Singapore arXived last year a paper on a novel version of empirical likelihood ABC that I hadn’t yet found time to read. This proposal connects with our own, published with Kerrie Mengersen and Pierre Pudlo in 2013 in PNAS. It is presented as an attempt at approximating the posterior distribution based on a vector of (summary) statistics, the variational approximation (or information projection) appearing in the construction of the sampling distribution of the observed summary. (Along with a weird eyed-g symbol! I checked inside the original LaTeX file and it happens to be a mathbbmtt g, that is, the typewriter version of a blackboard computer modern g…) The approximation itself writes as an entropic correction of the true posterior distribution (in Theorem 1).

“First, the true log-joint density of the observed summary, the summaries of the i.i.d. replicates and the parameter have to be estimated. Second, we need to estimate the expectation of the above log-joint density with respect to the distribution of the data generating process. Finally, the differential entropy of the data generating density needs to be estimated from the m replicates…”

The density of the observed summary is estimated by empirical likelihood, but I do not understand the reasoning behind the moment condition used in this empirical likelihood. Indeed, the moment made of the difference between the simulated summaries and the observed ones has zero expectation iff the true value of the parameter is used in the simulation. I also fail to understand the connection with our SAME procedure (Doucet, Godsill & X, 2002), in that the empirical likelihood is based on a sample made of pairs (observed, generated) where the observed part is indeed repeated m times, but not with the intent of approximating a marginal likelihood estimator… The notion of using the actual data instead of the true expectation (i.e., as an unbiased estimator) at the true parameter value is appealing, as it avoids specifying the exact (or analytical) value of this expectation (as in our approach), but I am missing the justification for the extension to any parameter value. Unless one uses an ancillary statistic, which does not sound pertinent… The differential entropy is estimated by a Kozachenko-Leonenko estimator involving k-nearest neighbours.
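For readers unfamiliar with this estimator, here is a minimal Python sketch of a k-nearest-neighbour differential entropy estimate of the Kozachenko-Leonenko type, written for illustration only and not taken from the paper:

    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.special import digamma, gammaln

    def kl_entropy(x, k=1):
        # Kozachenko-Leonenko estimate (in nats) of the differential entropy
        # behind an (m, d) sample x, from k-nearest-neighbour distances:
        #   psi(m) - psi(k) + log c_d + (d/m) sum_i log eps_i,
        # with c_d the volume of the unit ball in d dimensions and eps_i the
        # distance from x_i to its k-th nearest neighbour among the other points.
        x = np.asarray(x, dtype=float)
        if x.ndim == 1:
            x = x[:, None]
        m, d = x.shape
        eps, _ = cKDTree(x).query(x, k=k + 1)  # k+1 as each point is its own zero-distance neighbour
        eps = eps[:, -1]
        log_c_d = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
        return digamma(m) - digamma(k) + log_c_d + d * np.mean(np.log(eps))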

“The proposed empirical likelihood estimates weights by matching the moments of g(X¹), …, g(Xᵐ) with that of g(X⁰), without requiring a direct relationship with the parameter. (…) the constraints used in the construction of the empirical likelihood are based on the identity in (7), which can only be satisfied when θ = θ⁰.”

Although I feel like I am missing an argument, the later part of the paper seems to comfort my impression, as quoted above. Meaning that the approximation will only fare well in the vicinity of the true parameter. Which makes it untrustworthy for model choice purposes, I believe. (The paper uses the g-and-k benchmark without exploiting Pierre Jacob’s package that allows for an exact MCMC implementation.)

a new rule for adaptive importance sampling

Posted in Books, Statistics on March 5, 2019 by xi'an

Art Owen and Yi Zhou have arXived a short paper on the combination of importance sampling estimators. Which connects somehow with the talk about multiple estimators I gave at ESM last year in Helsinki. And our earlier AMIS combination. The paper however makes two important assumptions to reach optimal weighting, which is inversely proportional to the variance:

  1. the estimators are uncorrelated, if not independent;
  2. the variance of the k-th estimator is of order a (negative) power of k.

The latter is puzzling when considering a series of estimators, in that k appears to act as a sample size (as in AMIS), the power is usually unknown, and there is no reason for the power to be the same for all estimators. The authors propose to use ½ as the default, both because this is the standard Monte Carlo rate and because the loss in variance is then minimal, the resulting variance being at most 12% larger than the optimal one.
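For concreteness, here is a tiny sketch of the resulting combination rule as I understand it (function and variable names are mine): if the variance of the k-th estimate decays like k to the power −y, the inverse-variance weights are proportional to k to the power y, with y=½ as the default.

    import numpy as np

    def combine_is_estimates(mu_hat, y=0.5):
        # mu_hat[k-1] holds the k-th importance sampling estimate, k = 1, ..., K;
        # assuming var(mu_hat_k) is proportional to k**(-y), the inverse-variance
        # weights are proportional to k**y; y = 0.5 is the default rate discussed above.
        mu_hat = np.asarray(mu_hat, dtype=float)
        k = np.arange(1, len(mu_hat) + 1)
        w = k**y / np.sum(k**y)
        return np.sum(w * mu_hat)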

As an aside, Art Owen also wrote an invited discussion, “the unreasonable effectiveness of Monte Carlo”, of “Probabilistic Integration: A Role in Statistical Computation?” by François-Xavier Briol, Chris Oates, Mark Girolami (Warwick), Michael Osborne and Dino Sejdinovic, to appear in Statistical Science, a discussion that contains a wealth of smart and enlightening remarks. Like the analogy between pseudo-random number generators [which work unreasonably well!] versus true random numbers, and Bayesian numerical integration versus non-random functions. Or the role of advanced bootstrapping when assessing the variability of Monte Carlo estimates (citing a paper of his from 1992). Also pointing out an intriguing MCMC paper by Michael Lavine and Jim Hodges to appear in The American Statistician.

a book and three chapters on ABC

Posted in Statistics on January 9, 2019 by xi'an

In connection with our handbook on mixtures being published, here are the three chapters I contributed to the Handbook of ABC, edited by Scott Sisson, Yanan Fan, and Mark Beaumont:

6. Likelihood-free Model Choice, by J.-M. Marin, P. Pudlo, A. Estoup and C.P. Robert

12. Approximating the Likelihood in ABC, by  C. C. Drovandi, C. Grazian, K. Mengersen and C.P. Robert

17. Application of ABC to Infer about the Genetic History of Pygmy Hunter-Gatherer Populations from Western Central Africa, by A. Estoup, P. Verdu, J.-M. Marin, C. Robert, A. Dehne-Garcia, J.-M. Cornuet and P. Pudlo

French Econometrics [discussion]

Posted in Books, pictures, Statistics, University life on November 30, 2018 by xi'an

This Friday, I am briefly taking part in the 10th French Econometrics Conference as a discussant of Anna Simoni’s (CREST) talk, based on a paper co-written with Sid Chib and Minchul Shin. The conference takes place at the Paris School of Economics (PSE), in the south end of Paris, in an impressive new building. The topic of the paper is a Bayesian empirical likelihood approach to the econometric notion of moment models. Which I discussed here during ISBA last summer, since Sid spoke (twice!) there.

easy-to-use empirical likelihood ABC

Posted in Statistics, University life on October 23, 2018 by xi'an

A newly arXived paper from a group of researchers at NUS that I wish we had discussed when I was there last month, as it connects with the empirical likelihood ABC paper we published in PNAS with Kerrie Mengersen and Pierre Pudlo in 2012. Plus with the SAME paper written with Arnaud Doucet and Simon Godsill ten years earlier, which the authors prefer to call data cloning, in continuation of the more recent Lele et al. (2007). They could actually have used my original denomination of prior feedback (1992? I remember presenting the idea at Camp Casella in Cornell that summer) as well! Actually, I am not certain invoking prior feedback is quite necessary, since this is a form of simulated method of moments as well.

Now, did we really assume that some moments of the distribution were analytically available, although the likelihood was not?! Even before going through the paper, it dawned on me that these theoretical moments could have been simulated instead, since the model is a generative one: for a given parameter value, a direct Monte Carlo approximation to the exact moment can be produced and can serve as a constraint for the empirical likelihood definition. I am surprised and aggrieved that we did not think of this empirical likelihood version of a method of moments, which is central to the current paper. In the sense that, were the parameter exact, the differences between the moments based on the actual data x⁰ and the moments based on the m replicates of simulated data x¹, x², … would have zero mean, meaning the moment constraint is immediately available. An empirical likelihood is then easily constructed, replacing the actual likelihood in an MCMC scheme, albeit at a rather high computing cost. Congratulations to the authors for uncovering this possibility that we missed!
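To fix ideas, here is a rough Python sketch of this construction, entirely mine and not the authors’ code (simulate, summary, and log_prior are hypothetical placeholders): the log empirical likelihood of the simulated moment differences g(x⁰)−g(xⁱ) is obtained by solving Owen’s dual problem and then plugged into a vanilla Metropolis-Hastings acceptance ratio in place of the intractable log-likelihood.

    import numpy as np
    from scipy.optimize import minimize

    def log_el_ratio(z):
        # z is an (m, d) array of moment evaluations, here z_i = g(x0) - g(x_i);
        # returns sum_i log(m w_i) for the empirical likelihood with constraints
        # sum_i w_i z_i = 0 and sum_i w_i = 1, via Owen's dual problem:
        # minimise -sum_i log(1 + lam'z_i) over lam.
        m, d = z.shape
        eps = 1.0 / m

        def plog(u):
            # quadratic extension of log below eps, keeping the dual defined even
            # when zero lies outside the convex hull of the z_i (the value then
            # becomes strongly negative, acting as a soft rejection of theta)
            return np.where(u > eps, np.log(np.maximum(u, eps)),
                            np.log(eps) - 1.5 + 2 * u / eps - 0.5 * (u / eps) ** 2)

        res = minimize(lambda lam: -np.sum(plog(1.0 + z @ lam)),
                       np.zeros(d), method="BFGS")
        return res.fun

    def el_abc_mcmc(x_obs, simulate, summary, log_prior, theta0,
                    n_iter=5000, m=50, scale=0.1, seed=None):
        # Metropolis-Hastings where the intractable log-likelihood is replaced
        # by the log empirical likelihood of the simulated moment constraints.
        rng = np.random.default_rng(seed)
        g_obs = np.atleast_1d(summary(x_obs))
        el = lambda th: log_el_ratio(np.array(
            [g_obs - summary(simulate(th, rng)) for _ in range(m)]).reshape(m, -1))
        theta = np.atleast_1d(np.asarray(theta0, dtype=float))
        ll = el(theta)
        chain = []
        for _ in range(n_iter):
            prop = theta + scale * rng.standard_normal(theta.shape)
            ll_prop = el(prop)
            if np.log(rng.uniform()) < ll_prop - ll + log_prior(prop) - log_prior(theta):
                theta, ll = prop, ll_prop
            chain.append(theta.copy())
        return np.array(chain)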

“The summary statistics in this example were judiciously chosen.”

One point in the paper on which I disagree with the authors is the argument that MCMC sampling based on an empirical likelihood can be seen as an implementation of the pseudo-marginal Metropolis-Hastings method. The major difference in my opinion is that there is no unbiasedness here (and no generic result that indicates convergence to the exact posterior as the number of simulations grows to infinity). The other point unclear to me is about the selection of summaries [or moments] for implementing the method, which seems to be based on their performances in the subsequent estimation, performances that are hard to assess properly in intractable likelihood cases. In the last example of stereological extremes (not covered in our paper), for instance, the output is compared with the parallel synthetic likelihood result.
