Archive for gradient descent

simulation as optimization [by kernel gradient descent]

Posted in Books, pictures, Statistics, University life on April 13, 2024 by xi'an

Yesterday, which proved an unseasonably bright, warm day, I biked (with a new wheel!) to the east of Paris—to the Gare de Lyon district where I lived for three years in the 1980s—to attend a Mokaplan seminar at INRIA Paris, where Anna Korba (CREST, to which I am also affiliated) talked about sampling through optimization of discrepancies.
This proved a most formative hour, as I had not seen this perspective earlier (or had possibly forgotten about it), except through some of the talks at the Flatiron Institute on Transport, Diffusions, and Sampling last year, including Marilou Gabrié's and Arnaud Doucet's.
The concept behind it remains attractive to me, at least conceptually, since it consists in approximating the target distribution, known either up to a constant (a setting I have always felt standard simulation techniques were not exploiting to the maximum) or through a sample (a less convincing setting, since a sample from the target is then already available), via a sequence of (particle-approximated) distributions, using the discrepancy between the current distribution and the target, or a gradient thereof, to move the particles. (With no randomness involved in the Kernel Stein Discrepancy Descent algorithm.)
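
For the record, here is a minimal sketch (mine, not Anna Korba's code) of what such a deterministic, kernelised particle update looks like in the SVGD case, assuming a standard Gaussian target known up to a constant and an RBF kernel with a fixed bandwidth:

    import numpy as np

    def grad_log_target(x):
        # toy target: standard Gaussian known up to a constant,
        # so only the score grad log pi(x) = -x is needed
        return -x

    def rbf_kernel(X, h=1.0):
        # pairwise RBF kernel values and gradients w.r.t. the first argument
        diffs = X[:, None, :] - X[None, :, :]        # (n, n, d)
        sqd = np.sum(diffs ** 2, axis=-1)            # (n, n)
        K = np.exp(-sqd / (2 * h ** 2))              # k(x_i, x_j)
        gradK = -diffs / h ** 2 * K[:, :, None]      # grad_{x_i} k(x_i, x_j)
        return K, gradK

    def svgd_step(X, step=0.1, h=1.0):
        # deterministic move: attraction along the score plus kernel repulsion
        n = X.shape[0]
        K, gradK = rbf_kernel(X, h)
        phi = (K @ grad_log_target(X) + gradK.sum(axis=0)) / n
        return X + step * phi

    rng = np.random.default_rng(0)
    X = rng.normal(loc=5.0, scale=2.0, size=(200, 1))   # badly located start
    for _ in range(1000):
        X = svgd_step(X)
    print(X.mean(), X.std())   # should drift towards 0 and 1

The only randomness enters through the initial particle cloud; every subsequent move is a deterministic function of the current particles, which is what the parenthetical remark above refers to.
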
Anna Korba spoke about practically running the algorithm, as well as about convexity properties and some convergence results (with mixed performances for the Stein kernel, as opposed to SVGD). I definitely remain curious about several aspects of the method, like the (ergodic) distribution of the endpoints, the actual gain over an MCMC sample when accounting for computing time, the improvement over the empirical distribution when using a sample from π and its ecdf as a substitute for π, and the meaning of error estimation in this context.

“exponential convergence (of the KL) for the SVGD gradient flow does not hold whenever π has exponential tails and the derivatives of ∇ log π and k grow at most at a polynomial rate”

a versatile alternative to ABC

Posted in Books, Statistics on July 25, 2023 by xi'an

“We introduce the Fixed Landscape Inference MethOd, a new likelihood-free inference method for continuous state-space stochastic models. It applies deterministic gradient-based optimization algorithms to obtain a point estimate of the parameters, minimizing the difference between the data and some simulations according to some prescribed summary statistics. In this sense, it is analogous to Approximate Bayesian Computation (ABC). Like ABC, it can also provide an approximation of the distribution of the parameters.”

I quickly read this arXival by Monard et al., which is presented as an alternative to ABC, while standing outside a Bayesian setup. The central concept is that a deterministic gradient descent provides an optimal parameter value when the likelihood is replaced with a distance between the observed data and synthetic data simulated under the current value of the parameter (in the descent). In order to operate the descent, the synthetic data is assumed to be available as a deterministic transform of the parameter value and of a vector of basic random objects, e.g., Uniforms. In order to make the target function differentiable, this Uniform vector is kept fixed for the entire gradient descent. A puzzling aspect of the paper is that it seems to compare the (empirical) distribution of the resulting estimator with a posterior distribution, unless the comparison is with the (empirical) distribution of the Bayes estimators. The variability due to the choice of the fixed vector of basic random objects does not seem to be taken into account either. Furthermore, the method is presented as able to handle several models at once, which I find difficult to fathom, as (a) the random vectors behind each model necessarily vary and (b) there is no apparent penalisation for complexity.
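
To fix ideas, here is a toy rendition of the fixed-randomness trick (my own sketch, not the authors' implementation, with made-up names like z_fixed), on a Normal location-scale model where mean and standard deviation serve as the prescribed summary statistics:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)

    # "observed" data from a toy location-scale model y = mu + sigma * z
    theta_true = np.array([2.0, 1.5])                  # (mu, sigma)
    y_obs = theta_true[0] + theta_true[1] * rng.normal(size=500)

    def summaries(y):
        # prescribed summary statistics: mean and standard deviation
        return np.array([y.mean(), y.std()])

    s_obs = summaries(y_obs)

    # the basic random inputs are drawn once and then frozen, so that the
    # simulator theta -> data becomes a deterministic map of the parameter
    z_fixed = rng.normal(size=500)

    def loss(theta):
        mu, log_sigma = theta                          # log scale keeps sigma > 0
        y_sim = mu + np.exp(log_sigma) * z_fixed       # deterministic in theta
        return np.sum((summaries(y_sim) - s_obs) ** 2)

    # deterministic gradient-based optimisation over the fixed landscape
    fit = minimize(loss, x0=np.zeros(2), method="BFGS")
    print(fit.x[0], np.exp(fit.x[1]))                  # point estimate of (mu, sigma)

Rerunning the optimisation with a different frozen z_fixed produces a different landscape, hence a different point estimate, which is precisely the source of variability I do not see accounted for.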