the invasion of the stochastic gradients

Within the same day, I spotted three arXiv submissions involving stochastic gradient descent, which I briefly browsed on my trip back from Wales:

Stochastic Gradient Descent as Approximate Bayesian inference, by Mandt, Hoffman, and Blei, where this technique is used as a type of variational Bayes method, in the sense that the minimum Kullback-Leibler divergence to the true posterior can be achieved. The paper also rephrases the [scalable] MCMC algorithm of Welling and Teh (2011) as such an approximation.
Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent, by Arnak Dalalyan, which establishes convergence of the uncorrected Langevin algorithm to the right target distribution in the sense of the Wasserstein distance. (Uncorrected in the sense that there is no Metropolis step, meaning this is an Euler approximation.) With an extension to the noisy version, when the gradient is approximated, e.g., by subsampling. The connection with stochastic gradient descent is thus tenuous, but Arnak explains the somewhat disappointing rate of convergence as being in agreement with optimisation rates.
Stein variational adaptive importance sampling, by Jun Han and Qiang Liu, which relates to our population Monte Carlo algorithm, but as a non-parametric version, using an RKHS to represent the transforms of the particles at each iteration. The sampling method follows two threads of particles, one that is used to estimate the transform by a stochastic gradient update, and another one that is used for estimation purposes as in a regular population Monte Carlo approach. Separating the particles into those two threads allows for conditional independence, which makes convergence easier to establish. (A problem we also hit when working on the AMIS algorithm.)
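
To make the uncorrected Langevin algorithm of the second paper more concrete, here is a minimal Python sketch, not taken from any of the three papers. It runs the Euler discretisation x ← x + h ∇log π(x) + √(2h) ξ, with ξ standard Normal and no Metropolis step. The function names (`ula`, `grad_log_std_normal`), the standard Normal target, and the step size h = 0.1 are all assumptions chosen for illustration.

```python
import numpy as np


def ula(grad_log_target, x0, step, n_iter, rng):
    """Uncorrected (unadjusted) Langevin algorithm.

    Euler discretisation of the Langevin diffusion, with no
    Metropolis accept/reject step. Illustrative sketch only.
    """
    x = np.array(x0, dtype=float)
    chain = np.empty((n_iter,) + x.shape)
    for k in range(n_iter):
        noise = rng.standard_normal(x.shape)
        # gradient drift plus Gaussian perturbation of variance 2 * step
        x = x + step * grad_log_target(x) + np.sqrt(2.0 * step) * noise
        chain[k] = x
    return chain


def grad_log_std_normal(x):
    # gradient of log N(0, I), an assumed toy target
    return -x


rng = np.random.default_rng(0)
chain = ula(grad_log_std_normal, x0=np.zeros(2), step=0.1,
            n_iter=50_000, rng=rng)
samples = chain[1_000:]  # drop a short warm-up
print(samples.mean(axis=0), samples.var(axis=0))
```

For this toy target the chain is a Gaussian autoregression whose stationary variance is 1/(1 − h/2) rather than 1, so the absence of a Metropolis correction shows up as a small bias that vanishes with the step size. The noisy version mentioned above would correspond to passing a `grad_log_target` that returns a subsampled estimate of the gradient instead of the exact one.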

This entry was posted on May 10, 2017 at 12:17 am and is filed under Statistics with tags approximate inference, arXiv, Euler discretisation, population Monte Carlo, RKHS, scalable MCMC, stochastic gradient, stochastic gradient descent, variational Bayes methods, Wales.
