## invertible flow non-equilibrium sampling (InFiNE)

Posted in Books, Statistics, University life on May 21, 2021 by xi'an

With Achille Thin and a few other coauthors [and friends], we just arXived a paper on a new form of importance sampling, motivated by a recent paper of Rotskoff and Vanden-Eijnden (2019) on non-equilibrium importance sampling. The central ideas of this earlier paper are the introduction of conformal Hamiltonian dynamics, where a dissipative term is added to the ODE found in HMC, namely

$\dfrac{\text d p_t}{\text dt}=-\dfrac{\partial}{\partial q}H(q_t,p_t)-\gamma p_t=-\nabla U(q_t)-\gamma p_t$

which means that all orbits converge to fixed points satisfying ∇U(q) = 0 as the energy is dissipated. The second idea is that, were T a conformal Hamiltonian integrator associated with H, i.e., preserving the invariant measure, averaging over the orbits of T would improve the precision of unbiased Monte Carlo estimators while keeping them unbiased. The fact that Rotskoff and Vanden-Eijnden (2019) only considered continuous time makes their proposal hard to implement without adding approximation error, while our approach is set directly in discrete time and preserves unbiasedness. And since measure-preserving transforms are hard to come by, a change-of-variables correction, as in normalising flows, allows for an arbitrary choice of T while keeping the estimator unbiased. The use of conformal maps makes for a natural choice of T in this context.
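To make the change-of-variables correction concrete, here is a minimal Python sketch on a one-dimensional Gaussian toy target, not the exact InFiNE estimator of the paper. The map T (momentum damping by $e^{-\gamma h}$ followed by a standard leapfrog step) is one simple discretisation of the dissipative dynamics above, chosen because its Jacobian determinant is the constant $e^{-\gamma h}$. The orbit average uses balance-heuristic multiple importance sampling à la Owen and Zhou (2000) over the pushforwards $\rho_k$ of the proposal $\rho$ by $T^k$, $k=0,\ldots,K$, which is unbiased since $\mathbb E\big[\sum_k \pi(y_k)/\sum_j\rho_j(y_k)\big]=\sum_k\int \rho_k\,\pi/\sum_j\rho_j=\int\pi$. The target, proposal, step size h, damping γ and window K are all illustrative assumptions.

```python
import math
import random

# Illustrative toy problem (assumption): U(q) = (q - 2)^2 / 2, so the
# unnormalised target exp(-U) has normalising constant Z = sqrt(2*pi).
def grad_U(q):
    return q - 2.0

def log_normal(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))

def log_target(q, p):
    # extended unnormalised target exp(-U(q)) * N(p; 0, 1), integrating to Z
    return -0.5 * (q - 2.0) ** 2 + log_normal(p, 0.0, 1.0)

def log_proposal(q, p):
    # rho(q, p) = N(q; 0, 2^2) * N(p; 0, 1)
    return log_normal(q, 0.0, 2.0) + log_normal(p, 0.0, 1.0)

def T(q, p, h, gamma):
    """Damp the momentum by exp(-gamma*h), then take one leapfrog step.
    Leapfrog preserves volume, so |det dT| = exp(-gamma*h) everywhere."""
    p = math.exp(-gamma * h) * p
    p -= 0.5 * h * grad_U(q)
    q += h * p
    p -= 0.5 * h * grad_U(q)
    return q, p

def T_inv(q, p, h, gamma):
    """Exact inverse of T: undo the leapfrog step, then undo the damping."""
    p += 0.5 * h * grad_U(q)
    q -= h * p
    p += 0.5 * h * grad_U(q)
    return q, math.exp(gamma * h) * p

def orbit_estimate(q0, p0, K, h, gamma):
    """Balance-heuristic MIS estimate of Z from one draw x = (q0, p0) ~ rho,
    treating y_k = T^k x as a draw from rho_k, the pushforward of rho by T^k.
    Since rho_j(y_k) = rho(T^{k-j} x) * exp(j*gamma*h), the iterates T^m x
    are needed for m = -K..K."""
    orbit = {0: (q0, p0)}
    for m in range(1, K + 1):
        orbit[m] = T(*orbit[m - 1], h, gamma)
        orbit[-m] = T_inv(*orbit[-m + 1], h, gamma)
    log_rho = {m: log_proposal(*x) for m, x in orbit.items()}
    estimate = 0.0
    for k in range(K + 1):
        terms = [log_rho[k - j] + j * gamma * h for j in range(K + 1)]
        top = max(terms)  # log-sum-exp for the denominator sum_j rho_j(y_k)
        log_denom = top + math.log(sum(math.exp(t - top) for t in terms))
        estimate += math.exp(log_target(*orbit[k]) - log_denom)
    return estimate

rng = random.Random(1)
samples = [orbit_estimate(rng.gauss(0.0, 2.0), rng.gauss(0.0, 1.0), K=10, h=0.2, gamma=0.5)
           for _ in range(4000)]
Z_hat = sum(samples) / len(samples)
print(Z_hat, math.sqrt(2.0 * math.pi))

# Dissipation: long forward orbits settle at the fixed point grad U(q) = 0, p = 0.
q, p = -3.0, 1.0
for _ in range(500):
    q, p = T(q, p, 0.2, 0.5)
print(q, p)  # approximately (2.0, 0.0)
```

Because the damping factor is constant, the Jacobian correction reduces to the factor $e^{j\gamma h}$ in the denominator; a map with a state-dependent Jacobian would need its log-determinant accumulated along the orbit instead.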

The resulting InFiNE algorithm is a particular MCMC algorithm which can be represented as a partially collapsed Gibbs sampler when using the right auxiliary variables, as in Andrieu, Doucet and Holenstein (2010) and their ISIR algorithm. The algorithm can be used for estimating normalising constants (comparing favourably with AIS), for sampling from complex targets, and for optimising variational autoencoders and their ELBO.
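Since the post relates InFiNE to ISIR, here is a minimal sketch of plain iterated sampling importance resampling (i-SIR) with an independent Gaussian proposal, which is not the InFiNE sampler itself. The current state is kept as one of N particles, the other N − 1 are drawn afresh from the proposal, and the next state is resampled with probability proportional to the importance weights, a move that leaves the target invariant. The N(2, 1) target, the N(0, 2²) proposal, N = 10 and the chain length are illustrative assumptions.

```python
import math
import random

def log_target(x):
    # unnormalised N(2, 1) target (illustrative assumption)
    return -0.5 * (x - 2.0) ** 2

def log_proposal(x):
    # N(0, 2^2) proposal, up to a constant shared by all particles
    return -0.5 * (x / 2.0) ** 2

def isir(n_iter, n_particles, x0, rng):
    """i-SIR: keep the current state as particle 0, draw n_particles - 1 fresh
    proposals, and resample one particle proportionally to its weight."""
    x = x0
    chain = []
    for _ in range(n_iter):
        particles = [x] + [rng.gauss(0.0, 2.0) for _ in range(n_particles - 1)]
        log_w = [log_target(z) - log_proposal(z) for z in particles]
        top = max(log_w)  # stabilise the exponentials
        weights = [math.exp(lw - top) for lw in log_w]
        x = rng.choices(particles, weights=weights, k=1)[0]
        chain.append(x)
    return chain

rng = random.Random(0)
chain = isir(5000, 10, x0=0.0, rng=rng)
burned = chain[500:]
mean = sum(burned) / len(burned)
var = sum((c - mean) ** 2 for c in burned) / len(burned)
print(mean, var)  # should be close to 2 and 1
```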

I really appreciated working on this project, with its links to earlier notions I worked on in a more or less distant past, like multiple importance sampling à la Owen and Zhou (2000), nested sampling, non-homogeneous normalising flows, and measure estimation à la Kong et al. (2002).

## likelihood-free and summary-free?

Posted in Books, Mountains, pictures, Statistics, Travel on March 30, 2021 by xi'an

My friends and coauthors Chris Drovandi and David Frazier have recently arXived a paper entitled *A comparison of likelihood-free methods with and without summary statistics*, in which they indeed compare these two perspectives on approximate Bayesian methods like ABC and Bayesian synthetic likelihoods.

“A criticism of summary statistic based approaches is that their choice is often ad hoc and there will generally be an inherent loss of information.”

In ABC methods, the recourse to a summary statistic is often advocated as a “necessary evil” against the greater evil of the curse of dimensionality, paradoxically providing a faster convergence of the ABC approximation (Fearnhead & Liu, 2018). The authors propose a somewhat generic selection of summary statistics based on [my undergrad mentors!] Gouriéroux’s and Monfort’s indirect inference, using a mixture of Gaussians as their auxiliary model. Summary-free solutions, as in our Wasserstein papers, rely on distances between distributions, hence on functional distances, which can be seen as dimension-free as well (or criticised as infinite-dimensional). Chris and David consider energy distances (which sound very much like standard distances, except for averaging over all permutations), maximum mean discrepancy as in Gretton et al. (2012), Cramér-von Mises distances, and Kullback-Leibler divergences estimated via one-nearest-neighbour formulas, all for univariate samples. I am not aware of any theoretical exploration of the precise speed of convergence of the ABC approximation under these functional approaches…
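For concreteness, here are minimal pure-Python versions of the four univariate discrepancies listed above, written from textbook definitions rather than taken from the paper: V-statistic estimates of the energy distance and of the squared MMD (Gaussian kernel, bandwidth 1), Anderson's form of the two-sample Cramér-von Mises statistic, and the usual 1-nearest-neighbour KL estimator. The estimator variants, the bandwidth and the toy samples are assumptions made for illustration, and the O(n²) loops are only meant for small samples.

```python
import bisect
import math
import random

def energy_distance(x, y):
    """V-statistic estimate of 2E|X-Y| - E|X-X'| - E|Y-Y'| (univariate)."""
    def mean_abs(u, v):
        return sum(abs(a - b) for a in u for b in v) / (len(u) * len(v))
    return 2.0 * mean_abs(x, y) - mean_abs(x, x) - mean_abs(y, y)

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) squared MMD with a Gaussian kernel."""
    def mean_k(u, v):
        return sum(math.exp(-(a - b) ** 2 / (2.0 * bandwidth ** 2))
                   for a in u for b in v) / (len(u) * len(v))
    return mean_k(x, x) + mean_k(y, y) - 2.0 * mean_k(x, y)

def cramer_von_mises(x, y):
    """Two-sample Cramer-von Mises statistic (Anderson's form): squared ECDF
    differences summed over the pooled sample, times nm / (n + m)^2."""
    n, m = len(x), len(y)
    xs, ys = sorted(x), sorted(y)
    total = sum((bisect.bisect_right(xs, t) / n - bisect.bisect_right(ys, t) / m) ** 2
                for t in x + y)
    return n * m / (n + m) ** 2 * total

def kl_1nn(x, y):
    """1-nearest-neighbour estimate of KL(P || Q) from x ~ P, y ~ Q (univariate,
    no ties assumed): mean of log(nu_i / rho_i) plus log(m / (n - 1))."""
    n, m = len(x), len(y)
    total = 0.0
    for i, a in enumerate(x):
        rho = min(abs(a - b) for j, b in enumerate(x) if j != i)  # within-sample NN
        nu = min(abs(a - b) for b in y)                            # cross-sample NN
        total += math.log(nu / rho)
    return total / n + math.log(m / (n - 1))

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(300)]
y_same = [rng.gauss(0.0, 1.0) for _ in range(300)]
y_shift = [rng.gauss(2.0, 1.0) for _ in range(300)]
for name, d in [("energy", energy_distance), ("MMD^2", mmd2),
                ("CvM", cramer_von_mises), ("KL 1-NN", kl_1nn)]:
    print(f"{name}: same={d(x, y_same):.3f}  shifted={d(x, y_shift):.3f}")
```

In an ABC setting, any of these functions can play the role of the distance between observed and simulated samples, with no summary statistic involved.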

“We found that at least one of the full data approaches was competitive with or outperforms ABC with summary statistics across all examples.”

The main part of the paper, besides a survey of the existing solutions, is to compare the performances of these methods over a few chosen (univariate) examples, with the exact posterior as the gold standard. In the g & k model (the Pima Indian benchmark of ABC studies!), Cramér does somewhat better. It does much worse, however, in an M/G/1 example, where Wasserstein does better, and similarly for a stereological extremes example of Bortot et al. (2007). The ordering is inverted again for a toad movement model I had not seen before. While the usual proviso applies, namely that this is a simulation study on unidimensional data with a small number of parameters, the design of the four comparison experiments is very careful, eliminating versions that are either too costly or too divergent, although this could potentially be criticised as unrealistic (i.e., when the true posterior is unknown). The computing time is roughly the same across methods, all of which essentially remove the call to kernel-based approximations of the likelihood. Another point of interest is that the distance methods are significantly impacted by transforms of the data, which should not be the case for intrinsic distances! Demonstrating that these distances are not intrinsic…
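To make the "exact posterior as gold standard" comparison concrete, here is a minimal rejection-ABC sketch on a toy Gaussian location model (not one of the paper's four examples), using the 1-D Wasserstein distance between full samples. The last lines illustrate what dependence on a data transform means, at least for a strictly increasing transform applied to both samples: the Wasserstein distance changes, while the rank-based two-sample Cramér-von Mises statistic is unaffected. The model, flat prior, sample sizes and 2% acceptance quantile are all illustrative assumptions.

```python
import bisect
import math
import random

def wasserstein1(x, y):
    """1-D Wasserstein-1 distance between two equal-size samples:
    mean absolute difference of the order statistics."""
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)

def cramer_von_mises(x, y):
    """Two-sample Cramer-von Mises statistic; depends on the data only via ranks."""
    n, m = len(x), len(y)
    xs, ys = sorted(x), sorted(y)
    total = sum((bisect.bisect_right(xs, t) / n - bisect.bisect_right(ys, t) / m) ** 2
                for t in x + y)
    return n * m / (n + m) ** 2 * total

rng = random.Random(42)
n_obs = 50
y_obs = [rng.gauss(1.0, 1.0) for _ in range(n_obs)]  # toy data, theta_true = 1

# Exact posterior under a flat prior on (-5, 5): essentially N(ybar, 1/n).
exact_mean = sum(y_obs) / n_obs
exact_sd = 1.0 / math.sqrt(n_obs)

# Rejection ABC: keep the 2% of prior draws whose simulated data are closest.
draws = []
for _ in range(5000):
    theta = rng.uniform(-5.0, 5.0)
    y_sim = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
    draws.append((wasserstein1(y_obs, y_sim), theta))
draws.sort()
kept = [theta for _, theta in draws[:100]]
abc_mean = sum(kept) / len(kept)
abc_sd = math.sqrt(sum((t - abc_mean) ** 2 for t in kept) / len(kept))
print(f"exact: {exact_mean:.3f} +/- {exact_sd:.3f}, ABC: {abc_mean:.3f} +/- {abc_sd:.3f}")

# Transform sensitivity: apply the same increasing map exp() to both samples.
y_other = [rng.gauss(1.5, 1.0) for _ in range(n_obs)]
ex, ey = [math.exp(v) for v in y_obs], [math.exp(v) for v in y_other]
print(wasserstein1(y_obs, y_other), wasserstein1(ex, ey))          # changes
print(cramer_von_mises(y_obs, y_other), cramer_von_mises(ex, ey))  # unchanged
```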