Archive for unadjusted Langevin algorithm
black box MCMC
Posted in Books, Statistics with tags AISTATS 2017, black box, Charles Stein, control variates, importance sampling, integration by parts, MCMC, unadjusted Langevin algorithm, unbiased es on July 17, 2021 by xi'an
“…black-box methods, despite using no information of the proposal distribution, can actually give better estimation accuracy than the typical importance sampling [methods]…”
Earlier this week I was pointed to Liu & Lee’s black box importance sampling, published at AISTATS 2017 (which I did not attend). Already found in Briol et al. (2015) and Oates, Girolami, and Chopin (2017), the method starts from Charles Stein‘s “unbiased estimator of the loss” (which was a fundamental tool in my own PhD thesis!), a variation on integration by parts:
E_p[ f(X) ∇log p(X) + ∇f(X) ] = 0

for differentiable functions f and p cancelling at the boundaries. It also holds for the kernelised extension
E_{X∼p}[ k(X,x′) ∇log p(X) + ∇_X k(X,x′) ] = 0

for all x’, where the integrand is a 1-d function of an arbitrary kernel k(x,x’) and of the score function ∇log p. This null expectation happens to be a minimum since

E_{X,X′∼q}[ κ_p(X,X′) ] ≥ 0

with equality when q=p, where κ_p(x,x′) = ∇log p(x)ᵀ∇log p(x′) k(x,x′) + ∇log p(x)ᵀ∇_{x′}k(x,x′) + ∇log p(x′)ᵀ∇_x k(x,x′) + ∇_x·∇_{x′}k(x,x′) is the Stein kernel,
and hence importance weights can be obtained by minimising the quadratic form

wᵀ K_p w, with (K_p)_{ij} = κ_p(x_i, x_j) the Stein kernel pairing k with the score ∇log p,
in w (ranging over the unit simplex), for a sample of iid realisations from a possibly unknown distribution with density q. Liu & Lee show that this approximation converges faster than the standard Monte Carlo rate of 1/√n, thanks to Hilbertian properties of the kernel exploited through control variates. Actually, the same holds when using a (leave-one-out) non-parametric kernel estimate of q rather than the true q. At least in theory.
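A minimal sketch of the scheme in Python, for a 1-d standard normal target (score −x), a Gaussian RBF kernel, and a deliberately mismatched proposal q = N(1,1); the bandwidth h and the SLSQP solver are my illustrative choices, not those of Liu & Lee:

```python
import numpy as np
from scipy.optimize import minimize

def stein_kernel_matrix(x, score, h=1.0):
    """Stein kernel κ_p(x_i, x_j) for the 1-d RBF kernel k(x,x') = exp(-(x-x')²/(2h²))."""
    d = x[:, None] - x[None, :]
    k = np.exp(-d**2 / (2 * h**2))
    s = score(x)
    grad_x = -d / h**2 * k                    # ∂k/∂x
    grad_y = d / h**2 * k                     # ∂k/∂x'
    grad_xy = (1 / h**2 - d**2 / h**4) * k    # ∂²k/∂x∂x'
    return (s[:, None] * s[None, :] * k       # s(x)s(x')k + s(x)∂k/∂x' + s(x')∂k/∂x + ∂²k
            + s[:, None] * grad_y
            + s[None, :] * grad_x
            + grad_xy)

def black_box_weights(x, score, h=1.0):
    """Minimise wᵀ K_p w over the unit simplex, starting from uniform weights."""
    K = stein_kernel_matrix(x, score, h)
    n = len(x)
    res = minimize(lambda w: w @ K @ w, np.full(n, 1 / n),
                   jac=lambda w: 2 * K @ w,
                   bounds=[(0, 1)] * n,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1},
                   method="SLSQP")
    return res.x

# target p = N(0,1), hence score ∇log p(x) = -x; sample from the wrong q = N(1,1)
rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=100)
w = black_box_weights(x, lambda z: -z)
print(x.mean(), w @ x)   # the weighted mean is pulled towards the target mean 0
```

Since K_p is positive semi-definite, the minimisation is a convex quadratic program over the simplex and any QP solver would do in place of SLSQP.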
“…simulating n parallel MCMC chains for m steps, where the length m of the chains can be smaller than what is typically used in MCMC, because it just needs to be large enough to bring the distribution `roughly’ close to the target distribution”
A practical application of the concept is suggested in the above quote: as a corrected weighting for interrupted MCMC, or when using an unadjusted Langevin algorithm. Provided the minimisation of the quadratic objective is fast enough, the method can thus serve as a benchmark for regular MCMC implementations.
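A toy version of the first half of that pipeline, with all numerical choices (target, step size γ, chain count, chain length) being my own assumptions: n parallel unadjusted Langevin chains run for a modest m steps towards a standard normal target, producing the kind of `roughly' converged, step-size-biased sample that the black-box weights are meant to correct:

```python
import numpy as np

rng = np.random.default_rng(1)
score = lambda x: -x    # score ∇log p of the N(0,1) target (demo assumption)
gamma = 0.5             # Langevin step size; illustrative, not tuned
n, m = 200, 50          # n parallel chains, each run for only m unadjusted steps

x = rng.normal(3.0, 1.0, size=n)   # deliberately poor initialisation
for _ in range(m):
    # one unadjusted Langevin step: x + (γ/2) ∇log p(x) + √γ ξ,  ξ ~ N(0,1)
    x = x + 0.5 * gamma * score(x) + np.sqrt(gamma) * rng.normal(size=n)

# the sample is now roughly N(0,1), but carries the discretisation bias of ULA:
# for this Gaussian target the chain's stationary variance is 1/(1 - γ/4) ≈ 1.14
print(x.mean(), x.var())
```

Since ULA never removes its discretisation bias, reweighting these n endpoints by minimising the Stein-kernel quadratic form is exactly the correction the quote envisages.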