## bootstrap(ed) likelihood for ABC

Posted in pictures, Statistics with tags , , , , , , , , on November 6, 2015 by xi'an

This recently arXived paper by Weixuan Zhu , Juan Miguel Marín, and Fabrizio Leisen proposes an alternative to our empirical likelihood ABC paper of 2013, or BCel. Besides the mostly personal appeal for me to report on a Juan Miguel Marín working [in Madrid] on ABC topics, along my friend Jean-Michel Marin!, this paper is another entry on ABC that connects with yet another statistical perspective, namely bootstrap. The proposal, called BCbl, is based on a reference paper by Davison, Hinkley and Worton (1992) which defines a bootstrap likelihood, a notion that relies on a double-bootstrap step to produce a non-parametric estimate of the distribution of a given estimator of the parameter θ. This estimate includes a smooth curve-fitting algorithm step, for which little description is available from the current paper. The bootstrap non-parametric substitute then plays the role of the actual likelihood, with no correction for the substitution just as in our BCel. Both approaches are convergent, with Monte Carlo simulations exhibiting similar or even identical convergence speeds although [unsurprisingly!] no deep theory is available on the comparative advantage.

An important issue from my perspective is that, while the empirical likelihood approach relies on a choice of identifying constraints that strongly impact the numerical value of the likelihood approximation, the bootstrap version starts directly from a subjectively chosen estimator of θ, which may also impact the numerical value of the likelihood approximation. In some ABC settings, finding a primary estimator of θ may be a real issue or a computational burden. Except when using a preliminary ABC step as in semi-automatic ABC. This would be an interesting crash-test for the BCbl proposal! (This would not necessarily increase the computational cost by a large amount.) In addition, I am not sure the method easily extends to larger collections of summary statistics as those used in ABC, in particular because it necessarily relies on non-parametric estimates, only operating in small enough dimensions where smooth curve-fitting algorithms can be used. Critically, the paper only processes examples with a few parameters.

The comparisons between BCel and BCbl that are produced in the paper show some gain towards BCbl. Obviously, it depends on the respective calibrations of the non-parametric methods and of regular ABC, as well as on the available computing time. I find the population genetic example somewhat puzzling: The paper refers to our composite likelihood to set the moment equations. Since this is a pseudo-likelihood, I wonder how the authors do select their parameter estimates in the double-bootstrap experiment. And for the Ising model, it is not straightforward to conceive of a bootstrap algorithm on an Ising model: (a) how does one subsample pixels and (b) what are the validity guarantees for the estimation procedure.

## scaling the Gibbs posterior credible regions

Posted in Books, Statistics, University life with tags , , , , , , , on September 11, 2015 by xi'an

“The challenge in implementation of the Gibbs posterior is that it depends on an unspecified scale (or inverse temperature) parameter.”

A new paper by Nick Syring and Ryan Martin was arXived today on the same topic as the one I discussed last January. The setting is the same as with empirical likelihood, namely that the distribution of the data is not specified, while parameters of interest are defined via moments or, more generally, a minimising a loss function. A pseudo-likelihood can then be constructed as a substitute to the likelihood, in the spirit of Bissiri et al. (2013). It is called a “Gibbs posterior” distribution in this paper. So the “Gibbs” in the title has no link with the “Gibbs” in Gibbs sampler, since inference is conducted with respect to this pseudo-posterior. Somewhat logically (!), as n grows to infinity, the pseudo- posterior concentrates upon the pseudo-true value of θ minimising the expected loss, hence asymptotically resembles to the M-estimator associated with this criterion. As I pointed out in the discussion of Bissiri et al. (2013), one major hurdle when turning a loss into a log-likelihood is that it is at best defined up to a scale factor ω. The authors choose ω so that the Gibbs posterior

$\exp\{-\omega n l_n(\theta,x) \}\pi(\theta)$

is well-calibrated. Where ln is the empirical averaged loss. So the Gibbs posterior is part of the matching prior collection. In practice the authors calibrate ω by a stochastic optimisation iterative process, with bootstrap on the side to evaluate coverage. They briefly consider empirical likelihood as an alternative, on a median regression example, where they show that their “Gibbs confidence intervals (…) are clearly the best” (p.12). Apart from the relevance of being “well-calibrated”, and the asymptotic nature of the results. and the dependence on the parameterisation via the loss function, one may also question the possibility of using this approach in large dimensional cases where all of or none of the parameters are of interest.

## a war[like] week

Posted in Books, Kids, pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , on April 29, 2015 by xi'an

This week in Warwick was one of the busiest ones ever as I had to juggle between two workshops, including one in Oxford, a departmental meeting, two paper revisions, two pre-vivas, and a seminar in Leeds. Not to mention a broken toe (!), a flat tire (!!), and a diner at the X. Hardly anytime for writing blog entries..! Fortunately, I managed to squeeze time for working with Kerrie Mengersen who was visiting Warwick this fortnight. Finding new directions for the (A)BCel approach we developed a few years ago with Pierre Pudlo. The workshop in Oxford was quite informal with talks from PhD students [I fear I cannot discuss here as the papers are not online yet]. And one talk by François Caron about estimating sparse networks with not exactly exchangeable priors and completely random measures. And one talk by Kerrie Mengersen on a new and in-progress approach to handling Big Data that I found quite convincing (if again cannot discuss here). The probabilistic numerics workshop was discussed in yesterday’s post and I managed to discuss it a wee bit further with the organisers at The X restaurant in Kenilworth. (As a superfluous aside, and after a second sampling this year, I concluded that the Michelin star somewhat undeserved in that the dishes at The X are not particularly imaginative or tasty, the excellent sourdough bread being the best part of the meal!) I was expecting the train ride to Leeds to be highly bucolic as it went through the sunny countryside of South Yorkshire, with newly born lambs running in the bright green fields surrounded by old stone walls…, but instead went through endless villages with their rows of brick houses. Not that I have anything against brick houses, mind! Only, I had not realised how dense this part of England was, this presumably getting back all the way to the Industrial Revolution with the Manchester-Leeds-Birmingham triangle.

My seminar in Leeds was as exciting as in Amsterdam last week and with a large audience, so I got many and only interesting questions, from the issue of turning the output (i.e., the posterior on α) into a decision rule, to making  decision in the event of a non-conclusive posterior, to links with earlier frequentist resolutions, to whether or not we were able to solve the Lindley-Jeffreys paradox (we are not!, which makes a lot of sense), to the possibility of running a subjective or a sequential version. After the seminar I enjoyed a perfect Indian dinner at Aagrah, apparently a Yorkshire institution, with the right balance between too hot and too mild, i.e., enough spices to break a good sweat but not too many to loose any sense of taste!

## ABC for copula estimation

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , on March 23, 2015 by xi'an

Clara Grazian and Brunero Liseo (di Roma) have just arXived a note on a method merging copulas, ABC, and empirical likelihood. The approach is rather hybrid and thus not completely Bayesian, but this must be seen as a consequence of an ill-posed problem. Indeed, as in many econometric models, the model there is not fully defined: the marginals of iid observations are represented as being from well-known parametric families (and are thus well-estimated by Bayesian tools), while the joint distribution remains uncertain and hence so does the associated copula. The approach in the paper is to proceed stepwise, i.e., to estimate correctly each marginal, well correctly enough to transform the data by an estimated cdf, and then only to estimate the copula or some aspect of it based on this transformed data. Like Spearman’s ρ. For which an empirical likelihood is computed and aggregated to a prior to make a BCel weight. (If this sounds unclear, each BEel evaluation is based on a random draw from the posterior samples, which transfers some uncertainty in the parameter evaluation into the copula domain. Thanks to Brunero and Clara for clarifying this point for me!)

At this stage of the note, there are two illustrations revolving around Spearman’s ρ. One on simulated data, with better performances than a nonparametric frequentist solution. And another one on a Garch (1,1) model for two financial time-series.

I am quite glad to see an application of our BCel approach in another domain although I feel a tiny bit uncertain about the degree of arbitrariness in the approach, from the estimated cdf transforms of the marginals to the choice of the moment equations identifying the parameter of interest like Spearman’s ρ. Especially if one uses a parametric copula which moments are equally well-known. While I see the practical gain in analysing each component separately, the object created by the estimated cdf transforms may have a very different correlation structure from the true cdf transforms. Maybe there exist consistency conditions on the estimated cdfs… Maybe other notions of orthogonality or independence could be brought into the picture to validate further the two-step solution…

Posted in pictures, Statistics, University life with tags , , , , , , on February 9, 2015 by xi'an

I just arXived my comments about A. Ronald Gallant’s “Reflections on the Probability Space Induced by Moment Conditions with Implications for Bayesian Inference”, capitalising on the three posts I wrote around the discussion talk I gave at the 6th French Econometrics conference last year. Nothing new there, except that I may get a response from Ron Gallant as this is submitted as a discussion of his related paper in Journal of Financial Econometrics. While my conclusion is rather negative, I find the issue of setting prior and model based on a limited amount of information of much interest, with obvious links with ABC, empirical likelihood and other approximation methods.

## reflections on the probability space induced by moment conditions with implications for Bayesian Inference [slides]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , on December 4, 2014 by xi'an

Here are the slides of my incoming discussion of Ron Gallant’s paper, tomorrow.

## reflections on the probability space induced by moment conditions with implications for Bayesian Inference [discussion]

Posted in Books, Statistics, University life with tags , , , , , , on December 1, 2014 by xi'an

[Following my earlier reflections on Ron Gallant’s paper, here is a more condensed set of questions towards my discussion of next Friday.]

“If one specifies a set of moment functions collected together into a vector m(x,θ) of dimension M, regards θ as random and asserts that some transformation Z(x,θ) has distribution ψ then what is required to use this information and then possibly a prior to make valid inference?” (p.4)

The central question in the paper is whether or not given a set of moment equations

$\mathbb{E}[m(X_1,\ldots,X_n,\theta)]=0$

(where both the Xi‘s and θ are random), one can derive a likelihood function and a prior distribution compatible with those. It sounds to me like a highly complex question since it implies the integral equation

$\int_{\Theta\times\mathcal{X}^n} m(x_1,\ldots,x_n,\theta)\,\pi(\theta)f(x_1|\theta)\cdots f(x_n|\theta) \text{d}\theta\text{d}x_1\cdots\text{d}x_n=0$

must have a solution for all n’s. A related question that was also remanent with fiducial distributions is how on Earth (or Middle Earth) the concept of a random theta could arise outside Bayesian analysis. And another one is how could the equations make sense outside the existence of the pair (prior,likelihood). A question that may exhibit my ignorance of structural models. But which may also relate to the inconsistency of Zellner’s (1996) Bayesian method of moments as exposed by Geisser and Seidenfeld (1999).

For instance, the paper starts (why?) with the Fisherian example of the t distribution of

$Z(x,\theta) = \frac{\bar{x}_n-\theta}{s/\sqrt{n}}$

which is truly is a t variable when θ is fixed at the true mean value. Now, if we assume that the joint distribution of the Xi‘s and θ is such that this projection is a t variable, is there any other case than the Dirac mass on θ? For all (large enough) sample sizes n? I cannot tell and the paper does not bring [me] an answer either.

When I look at the analysis made in the abstraction part of the paper, I am puzzled by the starting point (17), where

$p(x|\theta) = \psi(Z(x,\theta))$

since the lhs and rhs operate on different spaces. In Fisher’s example, x is an n-dimensional vector, while Z is unidimensional. If I apply blindly the formula on this example, the t density does not integrate against the Lebesgue measure in the n-dimension Euclidean space… If a change of measure allows for this representation, I do not see so much appeal in using this new measure and anyway wonder in which sense this defines a likelihood function, i.e. the product of n densities of the Xi‘s conditional on θ. To me this is the central issue, which remains unsolved by the paper.