## transport Monte Carlo

**R**ead this recent arXival by Leo Duan (from UF in Gainesville) on transport approaches to approximate Bayesian computation, in connection with normalising flows. The author points out a “lack of flexibility in a large class of normalizing flows” to bring forward his own proposal.

“…we assume the reference (a multivariate uniform distribution) can be written as a mixture of many one-to-one transforms from the posterior”

The transportation problem is turned into defining a joint distribution on (β,θ) such that θ is marginally distributed from the posterior and β is one of an infinite collection of transforms of θ. Which sounds quite different from normalizing flows, to be sure. Reversing the order, if one manages to simulate β from its marginal, the resulting θ is one of the transforms, chosen to be a location-scale modification of β, s⊗β+m. The weights when going from θ to β are logistic transforms with Dirichlet distributed scales, all with parameters to be optimised by minimising the Kullback-Leibler divergence between the reference measure on β and its inverse mixture approximation, resorting to gradient descent. (This may sound a wee bit overwhelming as an approximation strategy and I actually had to make a large cup of strong matcha to get over it, but this may be due to the heat wave occurring at the same time!) Drawing θ from this approximation is straightforward by construction and an MCMC correction can even be added, resulting in an independent Metropolis-Hastings version, since the acceptance ratio remains computable. Although this may defeat the whole purpose of the exercise by stalling the chain if the approximation is poor (hence suggesting this last step be used instead as a control).
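As a rough illustration (not the paper's actual implementation), the sampling side of this construction can be sketched as drawing β uniformly, picking a location-scale map s⊗β+m from a mixture, then applying the independent Metropolis-Hastings correction; the weights, locations, and scales below are made-up stand-ins for the parameters that would come out of the KL/gradient-descent optimisation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fitted parameters (in the paper these are optimised by
# minimising a KL divergence via gradient descent; fixed here for show):
# K = 3 components, each a location-scale map theta = s_k * beta + m_k.
w = np.array([0.2, 0.5, 0.3])   # mixture weights
m = np.array([-2.0, 0.0, 1.5])  # locations
s = np.array([1.0, 1.5, 0.8])   # scales

def log_target(theta):
    """Unnormalised log-posterior: a toy two-mode example."""
    return np.logaddexp(-0.5 * (theta + 1.5) ** 2, -0.5 * (theta - 1.0) ** 2)

def q_density(theta):
    """Density of the mixture of scaled-and-shifted uniforms at theta."""
    inside = (theta >= m) & (theta <= m + s)
    return np.sum(w * inside / s)

def draw_q():
    """Draw theta: beta ~ U(0,1), a component k ~ w, then theta = s_k*beta + m_k."""
    beta = rng.uniform()
    k = rng.choice(len(w), p=w)
    return s[k] * beta + m[k]

def imh_chain(n_iter=5000):
    """Independent Metropolis-Hastings correction of the transport draws."""
    theta = draw_q()
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = draw_q()
        # Acceptance ratio stays computable: pi(prop) q(theta) / (pi(theta) q(prop))
        log_ratio = (log_target(prop) - log_target(theta)
                     + np.log(q_density(theta)) - np.log(q_density(prop)))
        if np.log(rng.uniform()) < log_ratio:
            theta = prop
        chain[t] = theta
    return chain

chain = imh_chain()
```

If the mixture approximation is poor, the chain will reject the independent proposals for long stretches, which is exactly the stalling behaviour worried about above; the observed acceptance rate can then serve as the suggested control.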

The paper also contains a theoretical section that studies the approximation error, which goes to zero as the number of terms in the mixture, K, goes to infinity, including a Monte Carlo error of order log(n)/n (and incidentally quoting a result from my former HoD at Paris 6, Paul Deheuvels). Numerical experiments show domination of, or equivalence with, some other solutions, e.g. being much faster than HMC, the remaining $1000 question being of course the on-line evaluation of the quality of the approximation.

March 16, 2021 at 9:19 pm

Hi Xi’an, I just found this post (my apologies for the belated response). Thanks for highlighting my work!

While revising this paper, I developed some more justification for this simple transformation of uniforms, as used in transport Monte Carlo. Briefly speaking, one can think of the commonly used “histogram” approximation to the target posterior, where the density inside each bin is approximated by a constant, i.e., the density of a scaled-and-shifted uniform.
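To illustrate this histogram view numerically (with a standard normal as an arbitrary stand-in target, and bin settings picked for illustration only), the piecewise-constant approximation is exactly a weighted mixture of scaled-and-shifted uniform densities:

```python
import numpy as np

# Toy illustration: approximate a target density on a window by K equal-width
# bins. Each bin's density is a scaled-and-shifted uniform, weighted by the
# probability mass of the bin, so the histogram IS a uniform mixture.

def target_pdf(x):
    """Standard normal density, standing in for a posterior."""
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

lo, hi, K = -4.0, 4.0, 64
edges = np.linspace(lo, hi, K + 1)
width = edges[1] - edges[0]

# Bin weights: approximate probability mass in each bin (midpoint rule),
# renormalised so the mixture integrates to one.
mids = 0.5 * (edges[:-1] + edges[1:])
w = target_pdf(mids) * width
w /= w.sum()

def hist_pdf(x):
    """Mixture of uniforms: sum_k w_k * Uniform(edges[k], edges[k] + width)."""
    k = np.clip(((x - lo) // width).astype(int), 0, K - 1)
    return w[k] / width

# The piecewise-constant mixture tracks the smooth target closely.
x = np.linspace(-3, 3, 101)
max_err = np.max(np.abs(hist_pdf(x) - target_pdf(x)))
```

The approximation error shrinks as K grows, matching the intuition that the adapted histogram becomes exact in the limit of infinitely many components.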

Therefore, this algorithm finds a coupling between a uniform distribution for β and an adapted histogram for θ. As in other optimal transport problems (e.g. Wasserstein or Sinkhorn distances), there are very sparse solutions, which permit us to parsimoniously parameterize this coupling using just a Dirichlet (or DP) mixture.