mean-field Langevin system & neural networks
Zhenjie Ren, a colleague of mine at Paris Dauphine, recently gave a talk on recent papers of his connecting neural nets and Langevin dynamics, where the parameters of the neural nets are estimated by mean-field Langevin dynamics, following an earlier paper on the topic by Mei, Montanari & Nguyen (2018). Here are some notes I took during the seminar, not necessarily coherent as I was a bit under the weather that day and had no previous exposure to most of these notions.
Fitting a one-layer network is turned into a minimisation programme over a measure space (when using loads of data), a reformulation that makes the problem convex. A regularisation by the entropy is added, along with derivatives of the loss functional with respect to the measure, and a first-order condition on the minimiser that is necessary and sufficient when the functional is convex. This reformulation leads to a Fokker-Planck equation, itself related to a Langevin diffusion, except that the drift of the Langevin equation involves the current distribution of the process (hence mean-field), whose stationary version is the solution of the original regularised minimisation programme.
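To fix ideas, here is a rough reconstruction of the objects involved, in my own notation (loss ℓ, neuron φ, temperature σ) rather than the papers':

```latex
% entropy-regularised reformulation over a measure m on the single-neuron parameters
\min_{m}\; V(m) \;=\; \underbrace{\mathbb{E}_{(X,Y)}\Big[\,\ell\Big(Y,\ \int \varphi(X;\theta)\,m(\mathrm{d}\theta)\Big)\Big]}_{F(m)}
\;+\;\frac{\sigma^{2}}{2}\,\operatorname{Ent}(m)

% first-order condition, necessary and sufficient when F is convex (Gibbs form)
m^{*}(\mathrm{d}\theta)\;\propto\;\exp\Big\{-\tfrac{2}{\sigma^{2}}\,\tfrac{\delta F}{\delta m}(m^{*})(\theta)\Big\}\,\mathrm{d}\theta

% associated mean-field Langevin diffusion, whose stationary law is m^*
\mathrm{d}\theta_{t}\;=\;-\,\nabla_{\theta}\,\frac{\delta F}{\delta m}(m_{t})(\theta_{t})\,\mathrm{d}t
\;+\;\sigma\,\mathrm{d}W_{t},\qquad m_{t}=\operatorname{Law}(\theta_{t})
```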
A second paper contains an extension to deep neural nets, re-expressed as a problem in a random environment, or with a marginal constraint (one marginal distribution being constrained), involving a partial derivative with respect to that marginal measure and turning into a Langevin diffusion with an extra random element. Using optimal control produces a new Hamiltonian, eventually yielding the mean-field Langevin system as backward propagation, with coefficients computed by the chain rule, equivalent to an Euler scheme for the Langevin dynamics.
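As an illustration of that last equivalence, here is a minimal sketch (my own toy, for the one-layer case only, with tanh neurons, a squared loss, and arbitrary tuning) where noisy gradient descent on a cloud of particles is exactly an Euler scheme for the mean-field Langevin dynamics:

```python
import numpy as np

# Minimal sketch, my own toy rather than the papers' code: the one-layer network is the
# integral of a single tanh neuron against the empirical measure of N particles, and one
# training step is noisy gradient descent, i.e. an Euler-Maruyama discretisation of the
# mean-field Langevin dynamics. Activation, loss and step sizes are arbitrary choices.

rng = np.random.default_rng(0)

def forward(P, X):
    """f_m(x) = (1/N) sum_i a_i tanh(w_i x + b_i) for particles P with columns (a, w, b)."""
    T = np.tanh(np.outer(X, P[:, 1]) + P[:, 2])       # shape (len(X), N)
    return T @ P[:, 0] / len(P), T

def euler_langevin_step(P, X, Y, step=0.05, sigma=0.05):
    """theta <- theta - step * grad(dF/dm)(theta) + sigma * sqrt(step) * N(0, I), per particle."""
    pred, T = forward(P, X)
    r = 2 * (pred - Y) / len(X)                       # derivative of the mean squared loss
    grad = np.empty_like(P)
    # gradient of the flat derivative dF/dm of the loss functional, evaluated at each particle
    grad[:, 0] = r @ T                                # wrt a_i
    grad[:, 1] = P[:, 0] * ((r * X) @ (1 - T**2))     # wrt w_i
    grad[:, 2] = P[:, 0] * (r @ (1 - T**2))           # wrt b_i
    return P - step * grad + sigma * np.sqrt(step) * rng.normal(size=P.shape)

# toy run: fit y = sin(x) with 50 particles, i.e. 50 hidden neurons
X = rng.uniform(-3.0, 3.0, size=400)
Y = np.sin(X)
P = rng.normal(size=(50, 3))
for _ in range(2000):
    P = euler_langevin_step(P, X, Y)
print("mean squared error after training:", np.mean((forward(P, X)[0] - Y) ** 2))
```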
This approach has consequences for GANs, with the discriminator taken as a one-layer neural net and the generator minimised over two measures; the discriminator measure is then the invariant measure of the mean-field Langevin dynamics. The talk also mentioned Metropolis-Hastings GANs, which seem to require a full run of an MCMC algorithm at each iteration of the mean-field Langevin.
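And a heavily hedged toy version of the GAN part (again my own construction, not the papers': a one-dimensional location-scale generator against a particle discriminator updated by noisy gradients, standing in for the invariant measure of the mean-field Langevin dynamics):

```python
import numpy as np

# Heavily hedged toy, not the papers' algorithm: a 1-d GAN whose discriminator is a
# one-layer net represented as an empirical measure of particles (a_i, w_i, b_i) and
# updated by noisy gradient ascent (Euler scheme), standing in for the invariant
# measure of the mean-field Langevin dynamics. The location-scale generator, the
# standard GAN losses, and all step sizes are illustrative assumptions.

rng = np.random.default_rng(1)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

def logits(X, P):
    """Discriminator logit L(x) = (1/N) sum_i a_i tanh(w_i x + b_i), vectorised in x."""
    T = np.tanh(np.outer(X, P[:, 1]) + P[:, 2])   # shape (len(X), N)
    return T @ P[:, 0] / len(P), T

real = rng.normal(2.0, 0.5, size=256)             # target distribution
P = rng.normal(size=(30, 3))                      # discriminator particles
mu, s = 0.0, 1.0                                  # generator: x = mu + s * z
step_d, step_g, sigma = 0.1, 0.05, 0.1

for _ in range(500):
    z = rng.normal(size=256)
    fake = mu + s * z
    # discriminator: one noisy-gradient ascent step per particle (Langevin/Euler)
    grad = np.zeros_like(P)
    for X, is_real in ((real, True), (fake, False)):
        L, T = logits(X, P)
        r = (1 - sigmoid(L)) if is_real else -sigmoid(L)          # d GAN log-loss / d logit
        grad[:, 0] += (r @ T) / len(X)                            # wrt a_i (measure derivative)
        grad[:, 1] += P[:, 0] * ((r * X) @ (1 - T**2)) / len(X)   # wrt w_i
        grad[:, 2] += P[:, 0] * (r @ (1 - T**2)) / len(X)         # wrt b_i
    P = P + step_d * grad + sigma * np.sqrt(step_d) * rng.normal(size=P.shape)
    # generator: plain gradient ascent on the non-saturating objective E[log D(G(z))]
    L, T = logits(fake, P)
    dLdx = (1 - T**2) @ (P[:, 0] * P[:, 1]) / len(P)              # d logit / d x
    gain = (1 - sigmoid(L)) * dLdx
    mu, s = mu + step_g * gain.mean(), s + step_g * (gain * z).mean()

print(f"generator mean {mu:.2f} and scale {s:.2f} (data: 2.0 and 0.5)")
```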