sampling, transport, and diffusions

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , on November 18, 2022 by xi'an

This week, I am attending a very cool workshop at the Flatiron Institute (not in the Flatiron building!, but close enough) on Sampling, Transport, and Diffusions, organised by Bob Carpenter and Michael Albergo. It is quite exciting as I do not know most participants or their work! The Flatiron Institute is a private institute focussed on fundamental science funded by the Simons Foundation (in such working conditions universities cannot compete with!).

Eric Vanden-Eijden gave an introductory lecture on using optimal transport notion to improve sampling, with a PDE/ODE approach of continuously turning a base distribution into a target (formalised by the distribution at time one). This amounts to solving a velocity solution to an KL optimisation objective whose target value is zero. Velocity parameterised as a deep neural network density estimator. Using a score function in a reverse SDE inspired by Hyvärinnen (2005), with a surprising occurrence of Stein’s unbiased estimator, there for the same reasons of getting rid of an unknown element. In a lot of environments, simulating from the target is the goal and this can be achieved by MCMC sampling by normalising flows, learning the transform / pushforward map.

At the break, Yuling Yao made a very smart remark that testing between two models could also be seen as an optimal transport, trying to figure an optimal transform from one model to the next, rather than the bland mixture model we used in our mixtestin paper. At this point I have no idea about the practical difficulty of using / inferring the parameters of this continuum but one could start from normalising flows. Because of time continuity, one would need some driving principle.

Esteban Tabak gave another interest talk on simulating from a conditional distribution, which sounds like a no-problem when the conditional density is known but a challenge when only pairs are observed. The problem is seen as a transport problem to a barycentre obtained as a distribution independent from the conditioning z and then inverting. Constructing maps through flows. Very cool, even possibly providing an answer for causality questions.

Many of the transport talks involved normalizing flows. One by [Simons Fellow] Christopher Jazynski about adding to the Hamiltonian (in HMC) an artificial flow field  (Vaikuntanathan and Jarzynski, 2009) to make up for the Hamiltonian moving too fast for the simulation to keep track. Connected with Eric Vanden-Eijden’s talk in the end.

An interesting extension of delayed rejection for HMC by Chirag Modi, with a manageable correction à la Antonietta Mira. Johnatan Niles-Weed provided a nonparametric perspective on optimal transport following Hütter+Rigollet, 21 AoS. With forays into the Sinkhorn algorithm, mentioning Aude Genevay’s (Dauphine graduate) regularisation.

Michael Lindsey gave a great presentation on the estimation of the trace of a matrix by the Hutchinson estimator for sdp matrices using only matrix multiplication. Solution surprisingly relying on Gibbs sampling called thermal sampling.

And while it did not involve optimal transport, I gave a short (lightning) talk on our recent adaptive restore paper: although in retrospect a presentation of Wasserstein ABC could have been more suited to the audience.

robustified Hamiltonian

Posted in Books, Statistics, University life with tags , , , , , , , , , on April 1, 2022 by xi'an

In Gregynog, last week, Lionel Riou-Durant (Warwick) presented his recent work with Jure Vogrinc on Metropolis Adjusted Langevin Trajectories, which I had also heard in the Séminaire Parisien de Statistique two weeks ago. Starting with a nice exposition of Hamiltonian Monte Carlo, highlighting its drawbacks. This includes the potentially damaging impact of poorly tuning the integration time. Their proposal is to act upon the velocity in the Hamiltonian through Langevin (positive) damping, which also preserves the stationarity.  (And connects with randomised HMC.) One theoretical in the paper is that the Langevin diffusion achieves the fastest mixing rate among randomised HMCs. From a practical perspective, there exists a version of the leapfrog integrator that adapts to this setting and can be implemented as a Metropolis adjustment. (Hence the MALT connection.) An interesting feature is that the process as such is ergodic, which avoids renewal steps (and U-turns). (There are still calibration parameters to adjust, obviously.)

invertible flow non equilibrium sampling (InFiNE)

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , on May 21, 2021 by xi'an

With Achille Thin and a few other coauthors [and friends], we just arXived a paper on a new form of importance sampling, motivated by a recent paper of Rotskoff and Vanden-Eijnden (2019) on non-equilibrium importance sampling. The central ideas of this earlier paper are the introduction of conformal Hamiltonian dynamics, where a dissipative term is added to the ODE found in HMC, namely

$\dfrac{\text d p_t}{\text dt}=-\dfrac{\partial}{\partial q}H(q_t,p_t)-\gamma p_t=-\nabla U(q_t)-\gamma p_t$

which means that all orbits converge to fixed points that satisfy ∇U(q) = 0 as the energy eventually vanishes. And the property that, were T be a conformal Hamiltonian integrator associated with H, i.e. perserving the invariant measure, averaging over orbits of T would improve the precision of Monte Carlo unbiased estimators, while remaining unbiased. The fact that Rotskoff and Vanden-Eijnden (2019) considered only continuous time makes their proposal hard to implement without adding approximation error, while our approach is directly set in discrete-time and preserves unbiasedness. And since measure preserving transforms are too difficult to come by, a change of variable correction, as in normalising flows, allows for an arbitrary choice of T, while keeping the estimator unbiased. The use of conformal maps makes for a natural choice of T in this context.

The resulting InFiNE algorithm is an MCMC particular algorithm which can be represented as a  partially collapsed Gibbs sampler when using the right auxiliary variables. As in Andrieu, Doucet and Hollenstein (2010) and their ISIR algorithm. The algorithm can be used for estimating normalising constants, comparing favourably with AIS, sampling from complex targets, and optimising variational autoencoders and their ELBO.

I really appreciated working on this project, with links to earlier notions like multiple importance sampling à la Owen and Zhou (2000), nested sampling, non-homogeneous normalising flows, measure estimation à la Kong et al. (2002), on which I worked in a more or less distant past.

general perspective on the Metropolis–Hastings kernel

Posted in Books, Statistics with tags , , , , , , , , , , , , , on January 14, 2021 by xi'an

[My Bristol friends and co-authors] Christophe Andrieu, and Anthony Lee, along with Sam Livingstone arXived a massive paper on 01 January on the Metropolis-Hastings kernel.

“Our aim is to develop a framework making establishing correctness of complex Markov chain Monte Carlo kernels a purely mechanical or algebraic exercise, while making communication of ideas simpler and unambiguous by allowing a stronger focus on essential features (…) This framework can also be used to validate kernels that do not satisfy detailed balance, i.e. which are not reversible, but a modified version thereof.”

A central notion in this highly general framework is, extending Tierney (1998), to see an MCMC kernel as a triplet involving a probability measure μ (on an extended space), an involution transform φ generalising the proposal step (i.e. þ²=id), and an associated acceptance probability ð. Then μ-reversibility occurs for

$\eth(\xi)\mu(\text{d}\xi)= \eth(\phi(\xi))\mu^{\phi}(\text{d}\xi)$

with the rhs involving the push-forward measure induced by μ and φ. And furthermore there is always a choice of an acceptance probability ð ensuring for this equality to happen. Interestingly, the new framework allows for mostly seamless handling of more complex versions of MCMC such as reversible jump and parallel tempering. But also non-reversible kernels, incl. for instance delayed rejection. And HMC, incl. NUTS. And pseudo-marginal, multiple-try, PDMPs, &c., &c. it is remarkable to see such a general theory emerging a this (late?) stage of the evolution of the field (and I will need more time and attention to understand its consequences).

the surprisingly overlooked efficiency of SMC

Posted in Books, Statistics, University life with tags , , , , , , , , , , , on December 15, 2020 by xi'an

At the Laplace demon’s seminar today (whose cool name I cannot tire of!), Nicolas Chopin gave a webinar with the above equally cool title. And the first slide debunking myths about SMC’s:

The second part of the talk is about a recent arXival Nicolas wrote with his student Hai-Dang DauI missed, about increasing the number of MCMC steps when moving the particles. Called waste-free SMC. Where only one fraction of the particles is updated, but this is enough to create a sort of independence from previous iterations of the SMC. (Hai-Dang Dau and Nicolas Chopin had to taylor their own convergence proof for this modification of the usual SMC. Producing a single-run assessment of the asymptotic variance.)

On the side, I heard about a very neat (if possibly toyish) example on estimating the number of Latin squares:

And the other item of information is that Nicolas’ and Omiros’ book, An Introduction to Sequential Monte Carlo, has now appeared! (Looking forward reading the parts I had not yet read.)