Archive for ABC’ory

ABC in Ed’burgh

Posted in Mountains, pictures, Running, Statistics, Travel, University life on June 28, 2018 by xi'an

A glorious day for this new edition of the “ABC in…” workshops, in the capital city of Edinburgh! I very much enjoyed this ABC day, as it demonstrated that ABC is still alive and kicking, i.e., enjoying plenty of new developments and reinterpretations, with more talks and posters on the way during the main ISBA 2018 meeting. (All nine talks are available on the webpage of the conference.)

After Michael Gutmann’s tutorial on ABC, Gael Martin (Monash) presented her recent work with David Frazier, Ole Maneesoonthorn, and Brendan McCabe on ABC for prediction. Maybe unsurprisingly, Bayesian consistency for the given summary statistics is a sufficient condition for concentration of the ABC predictor, but ABC seems to do better for the prediction problem than for parameter estimation, not losing to exact Bayesian inference, possibly because in essence the summary statistics there need not be of a large dimension to be consistent. The following talk by Guillaume Kon Kam King was also about prediction, for the specific problem of gas offer, with a latent Wright-Fisher point process in the model. He used a population ABC solution to handle this model.

Alexander Buchholz (CREST) introduced an ABC approach with quasi-Monte Carlo steps that helps in reducing the variability and hence improves the approximation in ABC. He also looked at a Negative Geometric variant of regular ABC, which runs a random number of proposals until reaching a given number of acceptances and which, while more costly, produces more stability.
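To make the "run until a given number of acceptances" idea concrete, here is a minimal sketch of rejection ABC with a fixed number of acceptances and hence a random number of proposals. It is not the algorithm from the talk: the toy Normal model, the prior, the tolerance `eps`, and the function name `abc_fixed_acceptances` are all assumptions made for illustration.

```python
import random

def abc_fixed_acceptances(y_obs_mean, n_obs, n_accept, eps, rng):
    """Rejection ABC that keeps proposing until n_accept draws are accepted.

    Toy setting (an assumption, not from the talk): y_i ~ N(theta, 1),
    prior theta ~ N(0, 3^2), summary statistic = sample mean.
    Returns the accepted thetas and the random number of proposals used.
    """
    accepted = []
    n_proposals = 0
    while len(accepted) < n_accept:
        n_proposals += 1
        theta = rng.gauss(0.0, 3.0)  # draw from the prior
        # the mean of n_obs draws from N(theta, 1) is N(theta, 1/n_obs)
        sim_mean = rng.gauss(theta, 1.0 / n_obs ** 0.5)
        if abs(sim_mean - y_obs_mean) <= eps:  # accept within tolerance
            accepted.append(theta)
    return accepted, n_proposals

rng = random.Random(2018)
sample, cost = abc_fixed_acceptances(
    y_obs_mean=1.2, n_obs=20, n_accept=200, eps=0.1, rng=rng
)
abc_mean = sum(sample) / len(sample)
```

The output size is fixed in advance, while the computing cost `cost` becomes the random quantity, which is the trade-off between cost and stability mentioned above.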

Other talks by Trevelyan McKinley, Marko Järvenpää, Matt Moores (Warwick), and Chris Drovandi (QUT) illustrated the urge for substitute models as a first step, and not solely via Gaussian processes, with for instance the new notion of a loss function to evaluate this approximation. Chris made a case in favour of synthetic likelihood vs ABC approaches, due to the degradation of the performances of nonparametric density estimation with the dimension. But I remain a doubting Thomas [Bayes] on that point, as high dimensions in the data or the summary statistics are not necessarily the issue, as also addressed in the paper on ABC-CDE discussed in a recent post, while synthetic likelihood requires estimating a mean function and a covariance function of the parameter, both of the dimension of the summary statistic, even though they are estimated by simulation.
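For readers unfamiliar with synthetic likelihood, the following is a minimal sketch of the idea: at each parameter value, simulate summaries, estimate their mean and covariance, and plug them into a Gaussian density evaluated at the observed summary. The toy Normal model, the two-dimensional summary (sample mean and variance), and the function names are assumptions chosen for illustration, not taken from any of the talks.

```python
import math
import random

def summaries(data):
    """Two-dimensional summary: sample mean and unbiased sample variance."""
    n = len(data)
    m = sum(data) / n
    v = sum((x - m) ** 2 for x in data) / (n - 1)
    return (m, v)

def synthetic_loglik(theta, s_obs, n_obs, n_sims, rng):
    """Gaussian synthetic log-likelihood at theta for the toy model
    y_i ~ N(theta, 1) (an assumption made for illustration)."""
    sims = [summaries([rng.gauss(theta, 1.0) for _ in range(n_obs)])
            for _ in range(n_sims)]
    # simulation-based estimates of the mean and covariance of the summary
    mu0 = sum(s[0] for s in sims) / n_sims
    mu1 = sum(s[1] for s in sims) / n_sims
    c00 = sum((s[0] - mu0) ** 2 for s in sims) / (n_sims - 1)
    c11 = sum((s[1] - mu1) ** 2 for s in sims) / (n_sims - 1)
    c01 = sum((s[0] - mu0) * (s[1] - mu1) for s in sims) / (n_sims - 1)
    det = c00 * c11 - c01 ** 2
    d0, d1 = s_obs[0] - mu0, s_obs[1] - mu1
    # quadratic form d' C^{-1} d using the explicit 2x2 inverse
    quad = (c11 * d0 * d0 - 2.0 * c01 * d0 * d1 + c00 * d1 * d1) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

rng = random.Random(0)
s_obs = (1.0, 1.0)
ll_near = synthetic_loglik(1.0, s_obs, n_obs=30, n_sims=200, rng=rng)
ll_far = synthetic_loglik(3.0, s_obs, n_obs=30, n_sims=200, rng=rng)
```

Even in this two-dimensional toy, every new value of the parameter requires a fresh batch of simulations to estimate the mean vector and the covariance matrix, whose size grows with the dimension of the summary statistic.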

Another neat feature of the day was a special session on cosmostatistics with talks by Emille Ishida and Jessica Cisewski, from explaining how ABC was starting to make an impact on cosmo- and astro-statistics, to the special example of the stellar initial mass distribution in clusters.

The call is now open for the next “ABC in”! Note that, while these workshops have often been formally sponsored by ISBA and its BayesComp section, they are not managed by a society or a board of administrators, and hence are not much constrained by a specific format. It would just be nice to keep the low fees as part of the tradition.

machine learning-based approach to likelihood-free inference

Posted in Statistics on March 3, 2017 by xi'an

At ABC’ory last week, Kyle Cranmer gave an extended talk on estimating the likelihood ratio by classification tools, connected with a 2015 arXival. The idea is that the likelihood ratio is invariant by a transform s(.) that is monotonic with the likelihood ratio itself. It took me a few minutes (after the talk) to understand what this meant, because it is a transform that actually depends on the parameter values in the denominator and the numerator of the ratio. For instance, the ratio itself is a proper transform, in the sense that the likelihood ratio based on the distribution of the likelihood ratio under both parameter values is the same as the original likelihood ratio. So is the (naïve Bayes) probability version of the likelihood ratio. This reminds me of the invariance in Fearnhead and Prangle (2012) between the Bayes estimate given x and the Bayes estimate given the Bayes estimate. I also feel there is a connection with Geyer’s logistic regression estimate of normalising constants, mentioned several times on the ‘Og. (The paper mentions in the conclusion the connection with this problem.)
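The invariance can be spelled out in a few lines. The notation below (r for the ratio, m for the monotonic map, θ₁ and θ₂ for the two fixed parameter values) is introduced here for illustration and is not necessarily that of the paper.

```latex
\begin{align*}
r(x) &= \frac{p(x \mid \theta_1)}{p(x \mid \theta_2)}, \\
p_r(\rho \mid \theta_1)
  &= \int \delta\bigl(\rho - r(x)\bigr)\, p(x \mid \theta_1)\, \mathrm{d}x
   = \int \delta\bigl(\rho - r(x)\bigr)\, r(x)\, p(x \mid \theta_2)\, \mathrm{d}x
   = \rho\, p_r(\rho \mid \theta_2), \\
\text{hence}\quad
\frac{p_r\bigl(r(x) \mid \theta_1\bigr)}{p_r\bigl(r(x) \mid \theta_2\bigr)} &= r(x).
\end{align*}
% For s = m(r) with m strictly monotonic, the change of variable multiplies
% both densities by the same Jacobian |dr/ds|, which cancels in the ratio:
\begin{equation*}
\frac{p_s\bigl(s(x) \mid \theta_1\bigr)}{p_s\bigl(s(x) \mid \theta_2\bigr)} = r(x).
\end{equation*}
```

The dependence of s on the pair (θ₁, θ₂) is visible in the first line: a different pair defines a different ratio r and hence a different transform.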

Now, back to the paper (which I read the night after the talk to get a global perspective on the approach), the ratio is of course unknown and the implementation therein is to estimate it by a classification method, thus estimating the probability for a given x to be from one versus the other distribution. Once this estimate is produced, its distributions under both values of the parameter can be estimated by density estimation, hence an estimated likelihood ratio can be produced, with better prospects since this is a one-dimensional quantity. An objection to this derivation is that it intrinsically depends on the pair of parameters θ¹ and θ² used therein. Changing to another pair requires a new ratio, new simulations, and new density estimations. When moving to a continuous collection of parameter values, in a classical setting, the likelihood ratio involves two maxima, which can be formally represented in (3.3) as a maximum over a likelihood ratio based on the estimated densities of likelihood ratios, except that each evaluation of this ratio seems to require another simulation. (Which makes the comparison with ABC more complex than presented in the paper [p.18], since the major computational hurdle of ABC lies in the production of the reference table and, to a lesser degree, of the local regression, both items that can be recycled for any new dataset.) A smoothing step is then to include the pair of parameters θ¹ and θ² as further inputs of the classifier. There still remains the computational burden of simulating enough values of s(x) towards estimating its density for every new value of θ¹ and θ². And while the projection from x to s(x) does effectively reduce the dimension of the problem to one, the method still aims at estimating with some degree of precision the density of x, so cannot escape the curse of dimensionality. The sleight of hand resides in the classification step, since it is equivalent to estimating the likelihood ratio. I thus fail to understand how and why a poor classifier can then lead to a good approximation of the likelihood ratio “obtained by calibrating s(x)” (p.16), where calibrating means estimating the density.
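As an illustration of why classification is equivalent to estimating the likelihood ratio, here is a minimal sketch on a toy problem where the answer is known. It only shows the direct link between the classifier odds and the ratio for balanced samples, not the calibration step by density estimation of s(x) described in the paper. The Normal toy model, the logistic classifier, and all names (`fit_logistic`, `est_log_ratio`, `theta1`, `theta2`) are assumptions made for this example.

```python
import math
import random

def fit_logistic(xs, ys, n_iter=25):
    """Fit P(y=1 | x) = sigmoid(a + b*x) by Newton's method, in plain Python."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood in a
            g1 += (y - p) * x    # gradient of the log-likelihood in b
            h00 += w             # entries of the (negated) 2x2 Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton update: (a, b) += H^{-1} g, with the explicit 2x2 inverse
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Toy problem: x ~ N(theta1, 1) labelled 1 versus x ~ N(theta2, 1) labelled 0
theta1, theta2 = 1.0, 0.0
rng = random.Random(2017)
n = 2000
xs = ([rng.gauss(theta1, 1.0) for _ in range(n)]
      + [rng.gauss(theta2, 1.0) for _ in range(n)])
ys = [1] * n + [0] * n
a_hat, b_hat = fit_logistic(xs, ys)

def est_log_ratio(x):
    """With balanced classes, the classifier log-odds estimate the log-ratio."""
    return a_hat + b_hat * x

def true_log_ratio(x):
    """Exact log of p(x | theta1) / p(x | theta2) for unit-variance Normals."""
    return (theta1 - theta2) * x - (theta1 ** 2 - theta2 ** 2) / 2.0
```

In this well-specified case the fitted log-odds track the true log-ratio closely. The fit is tied to the specific pair (theta1, theta2), so another pair calls for new simulations and a new fit, which is the objection raised above.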

off to Banff [17w5024]

Posted in Mountains, pictures, Running, Statistics, Travel, University life on February 18, 2017 by xi'an

Today, I fly from Paris to Amsterdam to Calgary to attend the ABC’ory workshop (17w5024) at the Banff International Research Station (BIRS) that Luke Bornn, Jukka Corander, Gael Martin, Dennis Prangle, Richard Wilkinson and I put together. The meeting is to brainstorm about the foundations of ABC for statistical inference rather than about the computational aspects of ABC, but the schedule is quite flexible for other directions!