a football post?!

Posted in Statistics with tags , , , , , , , , , , , , , , on June 22, 2022 by xi'an

I am not interested in football, neither as a player (a primary school trauma when I was the last being picked!) or as a fan, contrary to my dad (who was a football referee in his youth) and my kids, but Gareth Roberts (University of Warwick) and Jeff Rosenthal wrote a paper on football draws for the (FIFA) World Cup, infamously playing in Qatar by the end of the year, which Gareth presented in a Warwick seminar.

For this tournament, there are 32 teams, first playing against opponent teams supposedly drawn from a uniform distribution over all draw assignments, within 8 groups of 4 teams, with constraints like 1-2 EU teams per group, 0-1 from the other regions. As done at the moment and on TV, the tournament is filled one team at time by drawing from Pot 1, then Pot 2, then Pot 3, & Pot 4. &tc.. Applying the constraints one draw at a time, conditional on the past draws and the constraints, rather obviously creates non-uniformity! Uniformity would be achievable by rejection sampling (with a success probability of 1/540!) But this is not televisesque enough…

A debiasing solution is found by using several balls for each team in the right proportion, correcting for the sequential draws. Still impractical when requiring 10¹⁴ balls…!

The fun in their paper is that the problem can be formulated as a particle filter, estimating the right probabilities by randomising the number of balls [hidden randomness] and estimating the probability for team j to be included by a few thousands draws. With some stratified sampling on the side to minimise randomness. Removing the need for the (intractable?) distribution is thus achieved by retrospective sampling, as in pseudo-marginal MCMC. Alternatively, one could swap pairs of teams by a simplistic MCMC algorithm, with no worry about stationarity and the possibility of on-screen draws. (Jeff devised a Java applet to simulate an actual draw.) Obviously, it is still a far stretch that this proposal will be implemented for the next World Cup. If so, I will watch it!

mining gold [ABC in PNAS]

Posted in Books, Statistics with tags , , , , , , , , , , , on March 13, 2020 by xi'an

Johann Brehmer and co-authors have just published a paper in PNAS entitled “Mining gold from implicit models to improve likelihood-free inference”. (Besides the pun about mining gold, the paper also involves techniques named RASCAL and SCANDAL, respectively! For Ratio And SCore Approximate Likelihood ratio and SCore-Augmented Neural Density Approximates Likelihood.) This setup is not ABC per se in that their simulator is used both to generate training data and construct a tractable surrogate model. Exploiting Geyer’s (1994) classification trick of expressing the likelihood ratio as the optimal classification ratio when facing two equal-size samples from one density and the other.

“For all these inference strategies, the augmented data is particularly powerful for enhancing the power of simulation-based inference for small changes in the parameter θ.”

Brehmer et al. argue that “the most important novel contribution that differentiates our work from the existing methods is the observation that additional information can be extracted from the simulator, and the development of loss functions that allow us to use this “augmented” data to more efficiently learn surrogates for the likelihood function.” Rather than starting from a statistical model, they also seem to use a scientific simulator made of multiple layers of latent variables z, where

x=F⁰(u⁰,z¹,θ), z¹=G¹(u¹,z²), z²=G¹(u²,z³), …

although they also call the marginal of x, p(x|θ), an (intractable) likelihood.

“The integral of the log is not the log of the integral!”

The central notion behind the improvement is a form of Rao-Blackwellisation, exploiting the simulated z‘s. Joint score functions and joint likelihood ratios are then available. Ignoring biases, the authors demonstrate that the closest approximation to the joint likelihood ratio and the joint score function that only depends on x is the actual likelihood ratio and the actual score function, respectively. Which sounds like an older EM result, except that the roles of estimate and target quantity are somehow inverted: one is approximating the marginal with the joint, while the marginal is the “best” approximation of the joint. But in the implementation of the method, an estimate of the (observed and intractable) likelihood ratio is indeed produced towards minimising an empirical loss based on two simulated samples. Learning this estimate ê(x) then allows one to use it for the actual data. It however requires fitting a new ê(x) for each pair of parameters. Providing as well an estimator of the likelihood p(x|θ). (Hence the SCANDAL!!!) A second type of approximation of the likelihood starts from the approximate value of the likelihood p(x|θ⁰) at a fixed value θ⁰ and expands it locally as an exponential family shift, with the score t(x|θ⁰) as sufficient statistic.

I find the paper definitely interesting even though it requires the representation of the (true) likelihood as a marginalisation over multiple layers of latent variables z. And does not provide an evaluation of the error involved in the process when the model is misspecified. As a minor supplementary appeal of the paper, the use of an asymmetric Galton quincunx to illustrate an intractable array of latent variables will certainly induce me to exploit it in projects and courses!

[Disclaimer: I was not involved in the PNAS editorial process at any point!]

in Bristol for the day

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on February 28, 2020 by xi'an

I am in Bristol for the day, giving a seminar at the Department of Statistics where I had not been for quite a while (and not since the Department has moved to a beautifully renovated building). The talk is on ABC-Gibbs, whose revision is on the verge of being resubmitted. (I also hope Greta will let me board my plane tonight…)

likelihood free nested sampling

Posted in Books, Statistics with tags , , , , , , , , , , , on April 26, 2019 by xi'an

A recent paper by Mikelson and Khammash found on bioRxiv considers the (paradoxical?) mixture of nested sampling and intractable likelihood. They however cover only the case when a particle filter or another unbiased estimator of the likelihood function can be found. Unless I am missing something in the paper, this seems a very costly and convoluted approach when pseudo-marginal MCMC is available. Or the rather substantial literature on computational approaches to state-space models. Furthermore simulating under the lower likelihood constraint gets even more intricate than for standard nested sampling as the parameter space is augmented with the likelihood estimator as an extra variable. And this makes a constrained simulation the harder, to the point that the paper need resort to a Dirichlet process Gaussian mixture approximation of the constrained density. It thus sounds quite an intricate approach to the problem. (For one of the realistic examples, the authors mention a 12 hour computation on a 48 core cluster. Producing an approximation of the evidence that is not unarguably stabilised, contrary to the above.) Once again, not being completely up-to-date in sequential Monte Carlo, I may miss a difficulty in analysing such models with other methods, but the proposal seems to be highly demanding with respect to the target.

easy-to-use empirical likelihood ABC

Posted in Statistics, University life with tags , , , , , , , on October 23, 2018 by xi'an

A newly arXived paper from a group of researchers at NUS I wish we had discussed when I was there last month. As we wrote this empirical ABCe paper in PNAS with Kerrie Mengersen and Pierre Pudlo in 2012. Plus the SAME paper with Arnaud Doucet and Simon Godsill ten years earlier, which the authors prefer to call data cloning in continuation of the more recent Lele et al. (2007). They could actually have used my original denomination of prior feedback (1992? I remember presenting the idea at Camp Casella in Cornell that summer) as well! Actually, I am not certain invoking prior feedback is quite necessary since this is a form of simulated method of moments as well.

Now, did we really assume that some moments of the distribution were analytically available, although the likelihood was not?! Even before going through the paper, it dawned on me that these theoretical moments could have been simulated instead, since the model is a generative one: for a given parameter value, a direct Monte Carlo approximation to the exact moment can be produced and can serve as a constraint for the empirical likelihood definition. I am surprised and aggrieved that we would not think of this empirical likelihood version of a method of moments. Which is central to the current paper. In the sense that, were the parameter exact, the differences between the moments based on the actual data x⁰ and the moments based on m replicas of the simulated data x¹,x²,… have mean zero, meaning the moment constraint is immediately available. Meaning an empirical likelihood is easily constructed, replacing the actual likelihood in an MCMC scheme, albeit at a rather high computing cost. Congratulations to the authors for uncovering this possibility that we missed!

“The summary statistics in this example were judiciously chosen.”

One point in the paper on which I disagree with the authors is the argument that MCMC sampling based on an empirical likelihood can be seen as an implementation of the pseudo-marginal Metropolis-Hastings method. The major difference in my opinion is that there is no unbiasedness here (and no generic result that indicates convergence to the exact posterior as the number of simulations grows to infinity). The other point unclear to me is about the selection of summaries [or moments] for implementing the method, which seems to be based on their performances in the subsequent estimation, performances that are hard to assess properly in intractable likelihood cases. In the last example of stereological extremes (not covered in our paper), for instance, the output is compared with the parallel synthetic likelihood result.