Archive for Charlie Geyer

reXing the bridge

Posted in Books, pictures, Statistics on April 27, 2021 by xi'an

As I was re-reading Xiao-Li Meng's and Wing Hung Wong's 1996 bridge sampling paper in Statistica Sinica, I realised they were making the link with Geyer's (1994) mythical tech report, in the sense that the iterative construction of α functions “converges to the `reverse logistic regression' described in Geyer (1994) for the two-density cases” (p.839). Although they also saw the latter as an “iterative” application of Torrie and Valleau's (1977) “umbrella sampling” estimator. And cited Bennett (1976) in the Journal of Computational Physics [for which Elsevier still asks for $39.95!] as the originator of the formula [check (6)] and of the optimal solution [check (8)]. Bennett (1976) also mentions that the method fares poorly when the targets do not overlap:

“When the two ensembles neither overlap nor satisfy the above smoothness condition, an accurate estimate of the free energy cannot be made without gathering additional MC data from one or more intermediate ensembles”

in which case this sequence of intermediate targets could be constructed and, who knows?!, optimised. (This may be the chain solution discussed in the conclusion of the paper.) Another optimisation not considered in enough detail is the allocation of the computing time to the two densities, maybe using a bandit strategy to avoid estimating the variance of the importance weights first.
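
For concreteness, here is a minimal sketch of the iterative (optimal) bridge sampling estimator of Meng and Wong (1996), on a toy pair of Gaussian targets of my own choosing rather than an example from either paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# two unnormalised Gaussian targets with known constants, so the output can be checked:
# q1 is an unnormalised N(0,1), q2 an unnormalised N(0,2²), hence Z1/Z2 = 0.5
q1 = lambda x: np.exp(-x**2 / 2)
q2 = lambda x: np.exp(-x**2 / 8)

n1, n2 = 10_000, 10_000
x1 = rng.normal(0.0, 1.0, n1)              # draws from the first target
x2 = rng.normal(0.0, 2.0, n2)              # draws from the second target
s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)

r = 1.0                                    # initial guess for the ratio Z1/Z2
for _ in range(50):
    num = np.mean(q1(x2) / (s1 * q1(x2) + s2 * r * q2(x2)))
    den = np.mean(q2(x1) / (s1 * q1(x1) + s2 * r * q2(x1)))
    r = num / den                          # Meng & Wong fixed-point update

print(r)                                   # should be close to 0.5
```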

coupling, donkeys, coins & fish meet in Paris

Posted in Statistics on March 22, 2021 by xi'an

approximation of Bayes Factors via mixing

Posted in Books, Statistics, University life on December 21, 2020 by xi'an

A [new version of a] paper by Chenguang Dai and Jun S. Liu got my attention when it appeared on arXiv yesterday, due to its title, which reminded me of a solution to the normalising constant approximation problem that we proposed in the 2010 nested sampling evaluation paper we wrote with Nicolas. It recovers bridge sampling (mentioned by Dai and Liu as an alternative to their approach rather than as an early version) by a type of Charlie Geyer (1990-1994) trick. (The attached slides are taken from my MCMC graduate course, with a section on the approximation of Bayesian normalising constants I first wrote for a short course at Jim Berger's 70th anniversary conference, in San Antonio.)

A difference with the current paper is that the authors “form a mixture distribution with an adjustable mixing parameter tuned through the Wang-Landau algorithm”, while we chose it by hand to achieve sampling from both components. The weight is updated by a simple (binary) Wang-Landau version, where the partition is determined by which component is simulated, i.e., by the mixture indicator auxiliary variable, towards using both components on an even basis (à la Wang-Landau) and stabilising the resulting evaluation of the normalising constant. More generally, the strategy applies to a sequence of surrogate densities, which are chosen by variational approximations in the paper.
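
As a rough illustration of such a binary Wang-Landau update, here is a sketch on a toy two-component example of my own (not the surrogate densities of the paper): the working weight of whichever component is currently visited is penalised, and the stabilised weights recover the ratio of normalising constants.

```python
import numpy as np

rng = np.random.default_rng(1)

# two unnormalised densities with known constants: q1 is an unnormalised N(0,1),
# q2 an unnormalised N(3,2²), so that the true ratio Z1/Z2 equals 0.5
q = [lambda x: np.exp(-x**2 / 2),
     lambda x: np.exp(-(x - 3)**2 / 8)]
draw = [lambda: rng.normal(0.0, 1.0),      # exact simulation from each component
        lambda: rng.normal(3.0, 2.0)]

log_theta = np.zeros(2)                    # running log-weights, converging to log Z_k up to a constant
k = 0
for t in range(1, 50_001):
    x = draw[k]()                          # x | k ~ normalised q_k
    logp = np.log([q[0](x), q[1](x)]) - log_theta
    p = np.exp(logp - logp.max())
    p /= p.sum()
    k = rng.choice(2, p=p)                 # k | x with probability ∝ q_k(x)/θ_k
    gamma = min(1.0, 100.0 / t)            # decreasing Wang-Landau step size
    log_theta += gamma * ((np.arange(2) == k) - 0.5)   # penalise the visited component

print(np.exp(log_theta[0] - log_theta[1]))  # estimate of Z1/Z2, roughly 0.5
```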

mining gold [ABC in PNAS]

Posted in Books, Statistics on March 13, 2020 by xi'an

Johann Brehmer and co-authors have just published a paper in PNAS entitled “Mining gold from implicit models to improve likelihood-free inference”. (Besides the pun about mining gold, the paper also involves techniques named RASCAL and SCANDAL, for Ratio And SCore Approximate Likelihood ratio and SCore-Augmented Neural Density Approximates Likelihood, respectively!) This setup is not ABC per se in that their simulator is used both to generate training data and to construct a tractable surrogate model, exploiting Geyer's (1994) classification trick of expressing the likelihood ratio as the optimal classification ratio when facing two equal-size samples, one from each density.
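
Here is a minimal sketch of Geyer's classification trick, with two Gaussian samples of my own choosing and a plain logistic regression standing in for the classifier (with balanced samples, the fitted log-odds estimate the log likelihood ratio):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# equal-size samples from two densities p0 = N(0,1) and p1 = N(1,1)
n = 50_000
x0 = rng.normal(0.0, 1.0, n)
x1 = rng.normal(1.0, 1.0, n)
X = np.concatenate([x0, x1]).reshape(-1, 1)
y = np.concatenate([np.zeros(n), np.ones(n)])

clf = LogisticRegression().fit(X, y)       # optimal classifier's log-odds = log p1(x) - log p0(x)

x_test = np.array([[0.0], [1.0], [2.0]])
print(clf.decision_function(x_test))       # estimated log likelihood ratios
print(x_test.ravel() - 0.5)                # analytic log p1(x)/p0(x) for these two Gaussians
```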

“For all these inference strategies, the augmented data is particularly powerful for enhancing the power of simulation-based inference for small changes in the parameter θ.”

Brehmer et al. argue that “the most important novel contribution that differentiates our work from the existing methods is the observation that additional information can be extracted from the simulator, and the development of loss functions that allow us to use this “augmented” data to more efficiently learn surrogates for the likelihood function.” Rather than starting from a statistical model, they also seem to use a scientific simulator made of multiple layers of latent variables z, where

x=F⁰(u⁰,z¹,θ), z¹=G¹(u¹,z²), z²=G²(u²,z³), …

although they also call the marginal of x, p(x|θ), an (intractable) likelihood.

“The integral of the log is not the log of the integral!”

The central notion behind the improvement is a form of Rao-Blackwellisation, exploiting the simulated z's. Joint score functions and joint likelihood ratios are then available. Ignoring biases, the authors demonstrate that the closest approximation to the joint likelihood ratio and the joint score function that only depends on x is the actual likelihood ratio and the actual score function, respectively. Which sounds like an older EM result, except that the roles of estimate and target quantity are somehow inverted: one is approximating the marginal with the joint, while the marginal is the “best” approximation of the joint. But in the implementation of the method, an estimate of the (observed and intractable) likelihood ratio is indeed produced by minimising an empirical loss based on two simulated samples. Learning this estimate ê(x) then allows one to use it for the actual data. It however requires fitting a new ê(x) for each pair of parameters, while providing as well an estimator of the likelihood p(x|θ). (Hence the SCANDAL!!!) A second type of approximation of the likelihood starts from the approximate value of the likelihood p(x|θ⁰) at a fixed value θ⁰ and expands it locally as an exponential family shift, with the score t(x|θ⁰) as sufficient statistic.
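
As a toy illustration of this Rao-Blackwell-type result (my own, with a single latent layer rather than the deep simulator of the paper), regressing the tractable joint likelihood ratio on x alone does recover the intractable marginal likelihood ratio:

```python
import numpy as np

rng = np.random.default_rng(3)

# toy simulator with one latent layer: z ~ N(theta,1), x = z + N(0,1),
# so that the marginal p(x|theta) = N(theta,2) is known and can serve as a check
theta0, theta1 = 0.0, 1.0
n = 200_000
z = rng.normal(theta0, 1.0, n)             # latent draws under the reference value theta0
x = z + rng.normal(0.0, 1.0, n)

# tractable joint likelihood ratio p(x,z|theta1)/p(x,z|theta0): only the z layer changes
r_joint = np.exp((theta1 - theta0) * z - (theta1**2 - theta0**2) / 2)

# "regression" of the joint ratio on x alone, here by simple binned averaging;
# the L2-optimal function of x is E[r_joint|x], i.e. the marginal likelihood ratio
bins = np.linspace(-3, 3, 25)
idx = np.digitize(x, bins)
centers = (bins[:-1] + bins[1:]) / 2
est = np.array([r_joint[idx == i + 1].mean() for i in range(len(centers))])

true = np.exp((2 * centers * (theta1 - theta0) - (theta1**2 - theta0**2)) / 4)
print(np.c_[centers, est, true][:5])       # binned estimate vs analytic marginal ratio
```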

I find the paper definitely interesting, even though it requires the representation of the (true) likelihood as a marginalisation over multiple layers of latent variables z and does not provide an evaluation of the error involved in the process when the model is misspecified. As a minor supplementary appeal of the paper, the use of an asymmetric Galton quincunx to illustrate an intractable array of latent variables will certainly induce me to exploit it in projects and courses!

[Disclaimer: I was not involved in the PNAS editorial process at any point!]

generalised Poisson difference autoregressive processes

Posted in pictures, Statistics, Travel, University life on February 14, 2020 by xi'an

Yesterday, Giulia Carallo arXived the paper on generalised Poisson difference autoregressive processes that is a component of her Ph.D. thesis at Ca' Foscari Università di Venezia and to which I contributed while visiting Venezia last Spring. The stochastic process under study is integer valued as a difference of two generalised Poisson variates, made dependent by an INGARCH process that expresses the mean as a regression over past values of the process and past means, and which can be easily simulated as a difference of (correlated) Poisson variates. These two variates can in their turn be (re)defined through a thinning operator that I find most compelling, namely as a sum of Poisson variates with a number of terms being a (quasi-) Binomial variate depending on the previous value. This representation proves useful in establishing stationarity conditions on the process. Beyond establishing various properties of the process, the paper also examines how to conduct Bayesian inference in this context, with specialised Gibbs samplers in action, and compares models on real datasets via Geyer's (1994) logistic approximation to Bayes factors.
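
As a loose simulation sketch of such a process (with ordinary Poisson variates standing in for the generalised Poisson ones and an illustrative INGARCH-type recursion that is not the paper's exact specification):

```python
import numpy as np

rng = np.random.default_rng(4)

# integer-valued process defined as a difference of two Poisson counts whose
# intensities react to their own past and to the signed part of the last value
T = 500
omega, a, b = 1.0, 0.5, 0.3
lam1, lam2 = np.ones(T), np.ones(T)
X = np.zeros(T, dtype=int)

for t in range(1, T):
    lam1[t] = omega + a * lam1[t - 1] + b * max(X[t - 1], 0)
    lam2[t] = omega + a * lam2[t - 1] + b * max(-X[t - 1], 0)
    X[t] = rng.poisson(lam1[t]) - rng.poisson(lam2[t])

print(X[:20], X.mean(), X.var())
```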