Archive for BIRS
At ABC’ory last week, Kyle Cranmer gave an extended talk on estimating the likelihood ratio by classification tools, connected with a 2015 arXival. The idea is that the likelihood ratio is invariant under a transform s(.) that is monotonic with the likelihood ratio itself. It took me a few minutes (after the talk) to understand what this meant, because it is a transform that actually depends on the parameter values in the denominator and the numerator of the ratio. For instance, the ratio itself is a proper transform, in the sense that the likelihood ratio based on the distribution of the likelihood ratio under both parameter values is the same as the original likelihood ratio. The same holds for the (naïve Bayes) probability version of the likelihood ratio. Which reminds me of the invariance in Fearnhead and Prangle (2012) between the Bayes estimate given x and the Bayes estimate given the Bayes estimate. I also feel there is a connection with Geyer’s logistic regression estimate of normalising constants, mentioned several times on the ‘Og. (The paper mentions the connection with this problem in its conclusion.)
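To make the invariance concrete, here is a minimal check on a toy model of my own choosing (not one from the talk or the paper): x is N(θ, 1), with θ⁰ = 0 in the numerator and θ¹ = 1 in the denominator, and s(x) is the log-likelihood ratio. Since s is affine in x, its distribution under either parameter is Gaussian and available in closed form, so the ratio of the densities of s(x) can be compared directly with the ratio of the densities of x. All function names are illustrative.

```python
import math

def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Toy setting (an assumption for illustration): x ~ N(theta, 1)
theta0, theta1 = 0.0, 1.0

def ratio_x(x):
    """Likelihood ratio based on the distribution of x."""
    return normal_pdf(x, theta0) / normal_pdf(x, theta1)

# s(x) = log ratio_x(x) = intercept + slope * x, monotonic in the ratio
slope = theta0 - theta1
intercept = 0.5 * (theta1 ** 2 - theta0 ** 2)

def s(x):
    return intercept + slope * x

def ratio_s(u):
    """Likelihood ratio based on the distribution of s(x): since s is affine,
    s(x) ~ N(intercept + slope * theta, slope^2) under parameter theta."""
    sd = abs(slope)
    return (normal_pdf(u, intercept + slope * theta0, sd)
            / normal_pdf(u, intercept + slope * theta1, sd))

for x in (-1.0, 0.3, 2.0):
    # the two columns agree up to floating-point error
    print(x, ratio_x(x), ratio_s(s(x)))
```

Note that s(.) here depends on both θ⁰ and θ¹ through the slope and the intercept, which is exactly the dependence discussed above.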
Now, back to the paper (which I read the night after the talk to get a global perspective on the approach): the ratio is of course unknown and the implementation therein is to estimate it by a classification method, thus estimating the probability for a given x to be from one versus the other distribution. Once this estimate is produced, its distributions under both values of the parameter can be estimated by density estimation, hence an estimated likelihood ratio can be produced, with better prospects since this is a one-dimensional quantity. An objection to this derivation is that it intrinsically depends on the pair of parameters θ¹ and θ² used therein. Changing to another pair requires a new ratio, new simulations, and new density estimations. When moving to a continuous collection of parameter values, in a classical setting, the likelihood ratio involves two maxima, which can be formally represented in (3.3) as a maximum over a likelihood ratio based on the estimated densities of likelihood ratios, except that each evaluation of this ratio seems to require another simulation. (Which makes the comparison with ABC more complex than presented in the paper [p.18], since ABC’s major computational hurdle lies in the production of the reference table and, to a lesser degree, in the local regression, both items that can be recycled for any new dataset.) A smoothing step is then to include the pair of parameters θ¹ and θ² as further inputs of the classifier. There still remains the computational burden of simulating enough values of s(x) to estimate its density for every new value of θ¹ and θ². And while the projection from x to s(x) does effectively reduce the dimension of the problem to one, the method still aims at estimating with some degree of precision the density of x, so it cannot escape the curse of dimensionality. The sleight of hand resides in the classification step, since it is equivalent to estimating the likelihood ratio.
I thus fail to understand how and why a poor classifier can then lead to good approximations of the likelihood ratio “obtained by calibrating s(x)” (p.16), where calibrating means estimating the density.
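For readers who prefer code, the two-step procedure (classify, then calibrate) can be sketched as follows. This is a toy sketch under assumptions of my own, not the implementation of the paper: the model is N(θ, 1) with θ⁰ = 0 and θ¹ = 1, the classifier is a logistic regression fitted by plain gradient ascent, and the calibration uses histograms of s(x) where the paper would use a proper density estimator. All names are hypothetical.

```python
import math
import random

BINS = 20  # number of histogram cells on (0, 1); an arbitrary choice

def simulate(theta, n, rng):
    """Draw n observations from the toy model N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def fit_logistic(x0, x1, steps=200, rate=0.5):
    """Fit P(drawn under theta0 | x) = sigmoid(a + b x) by gradient ascent."""
    data = [(x, 1.0) for x in x0] + [(x, 0.0) for x in x1]
    a = b = 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += rate * grad_a / len(data)
        b += rate * grad_b / len(data)
    return a, b

def histogram_density(scores):
    """Histogram density estimate on (0, 1), with add-one smoothing."""
    counts = [1] * BINS
    for u in scores:
        counts[min(int(u * BINS), BINS - 1)] += 1
    total = len(scores) + BINS
    return [c * BINS / total for c in counts]

rng = random.Random(0)
theta0, theta1 = 0.0, 1.0

# Step 1: classification between simulations under theta0 and theta1
a, b = fit_logistic(simulate(theta0, 2000, rng), simulate(theta1, 2000, rng))

def score(x):
    """Classifier output s(x), a number in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

# Step 2: calibration, i.e. density estimation of s(x) under each parameter,
# based on fresh simulations
dens0 = histogram_density([score(x) for x in simulate(theta0, 2000, rng)])
dens1 = histogram_density([score(x) for x in simulate(theta1, 2000, rng)])

def estimated_ratio(x):
    """Ratio of the two estimated densities of s, evaluated at s(x)."""
    k = min(int(score(x) * BINS), BINS - 1)
    return dens0[k] / dens1[k]

def true_ratio(x):
    """Exact likelihood ratio, available here only because the model is a toy."""
    return math.exp(0.5 * (theta1 ** 2 - theta0 ** 2) - (theta1 - theta0) * x)

for x in (-1.0, 0.5, 2.0):
    print(x, round(estimated_ratio(x), 2), round(true_ratio(x), 2))
```

Everything in this sketch is tied to the pair (θ⁰, θ¹): moving to another pair means rerunning both the simulations and the two density estimations, which is the computational objection raised above.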
And another exciting and animated [last] day of ABC’ory [and practice]! Kyle Cranmer presented a density ratio density estimation approach I had not seen before [and will comment on here soon]. Patrick Muchmore talked about unbiased estimators of Gaussian and non-Gaussian densities in elliptically contoured distributions, which allow for running pseudo-MCMC rather than ABC. This reminded me of using the same tool [for those distributions can be expressed as mixtures of normals] in my PhD thesis, if for completely different purposes. In his talk, which included a presentation of an ABC blackbox platform called ELFI, Samuel Kaski translated statistical inference as inverse reinforcement learning: I hope this does not catch on! In the afternoon, Dennis Prangle gave us the intuition behind his rare event ABC, which is not about estimating rare events by ABC but rather about using rare event simulation to improve ABC. [A paper I will a.s. comment on here soon as well!] And Scott Sisson concluded the day and the week with his views on ABC for high dimensions.
While being obviously biased as the organiser of the workshop, I nonetheless feel it was a wonderful meeting, with just the right number of participants to induce interactions and discussions during and around the talks, as well as to preserve some time for pairwise interactions. Like all other workshops I contributed to at BIRS over the years
| 07w5079 | 2007-07-01 | Bioinformatics, Genetics and Stochastic Computation: Bridging the Gap |
| 10w2170 | 2010-09-10 | Hierarchical Bayesian Methods in Ecology |
| 14w5125 | 2014-03-02 | Advances in Scalable Bayesian Computation |
this is certainly a highly profitable one! For a [major] change, the next one [18w5023] will take place in Oaxaca, Mexico, and will see computational statistics meet molecular simulation. [As an aside, here are the first and last slides of Ewan Cameron’s talk, appropriately illustrating beginning and end, for both themes of his talk: epidemiology and astronomy!]
Another great day of talks and discussions at BIRS! Continuing on the themes of the workshop, between delving into the further validation of those approximation techniques and devising ever more approximate solutions for ever more complex problems. Among the points that became clearer to me through discussion was the realisation that the synthetic likelihood perspective is not that far away from our assumptions in the consistency paper. And that a logistic version of the approach can be constructed as well. A notion I had not met before (or have forgotten I had met) is that of early rejection ABC, which should actually be investigated more thoroughly as it should bring considerable improvement in computing time (with the caveats of calibrating the acceptance step before producing the learning sample and of characterising the output). Both Jukka Corander and Ewan Cameron reminded us of the case of models that take minutes or hours to produce one single dataset. (In his talk on some challenging applications, Jukka Corander chose to move from socks to boots!) And Jean-Michel Marin produced an illuminating if sobering experiment on the lack of proper Bayesian coverage by ABC solutions. (It appears that Ewan’s video includes a long empty moment when we went out for the traditional group photo, missing the end of his talk.)