Bayesian inference with optimal maps
Tarek El Moselhy and Youssef Marzouk just arXived a paper entitled 'Bayesian inference with optimal maps'. The idea of a map is to find a transform f that maps the prior into the posterior. This is a radically novel idea, with the core identity (involving the Jacobian of f)

p(x) ∝ L(f(x)|D) p(f(x)) |det ∇f(x)|
setting the constraint on the transform f in terms of the prior p and the likelihood L(x|D). (Plus a non-negligible monotonicity constraint on f…) In practice, however, the idea seems delicate to implement, even when all terms in the above fixed point equation are known. The authors suggest using a functional basis for the construction of f, e.g., polynomials. However, the amount of approximation due to this choice seems hard to ascertain, the variance of the difference

log p(x) - log{ L(f(x)|D) p(f(x)) |det ∇f(x)| }

being the stopping criterion advanced by the authors. (Imposing monotonicity on high degree polynomials is an additional difficulty.) The models illustrating the method are nonlinear ODEs and high-dimensional PDEs whose approximation mostly escapes my intuition. Therefore, I remain intrigued by the idea yet perplexed about its applicability to less regular problems. (Fixed point equations like the above also hint at Markov solutions, even though the optimisation in f is not operating in the usual space…)
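To make this criterion more concrete, here is a one-dimensional toy sketch (my own illustration, not the authors' code): a Gaussian prior and likelihood, a cubic parameterisation of f kept monotone by construction, and coefficients optimised until the variance of the above difference over prior samples is close to zero. All numerical choices (the observation y_obs, the degree of the polynomial, the optimiser) are arbitrary.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
xs = rng.standard_normal(2000)               # samples from the prior N(0,1)
y_obs = 1.5                                  # a single made-up observation D

def log_prior(x):
    return norm.logpdf(x)

def log_lik(x):
    return norm.logpdf(y_obs, loc=x, scale=0.5)   # toy Gaussian likelihood L(x|D)

def apply_map(c, x):
    # f(x) = c0 + c1 x + c2 x^3 with c1, c2 > 0, so f is monotone increasing
    c0, c1, c2 = c[0], np.exp(c[1]), np.exp(c[2])
    return c0 + c1 * x + c2 * x**3, c1 + 3 * c2 * x**2   # f(x) and its Jacobian f'(x)

def objective(c):
    f, df = apply_map(c, xs)
    diff = log_prior(xs) - (log_lik(f) + log_prior(f) + np.log(df))
    return np.var(diff)      # zero exactly when f pushes the prior onto the posterior

res = minimize(objective, x0=np.array([0.0, 0.0, -5.0]), method="Nelder-Mead")
print("residual variance (stopping criterion):", res.fun)
print("a few posterior draws via f:", apply_map(res.x, xs)[0][:5])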
September 12, 2011 at 8:47 pm
How about thinking of this as an update by a Markovian kernel which amounts to a Dirac mass… Could we relax the problem by allowing some noise around the Dirac, i.e. falling back on a non-degenerate Markov kernel and some kind of (Adaptive!) SMC?
September 12, 2011 at 10:21 pm
Uh?! More details maybe…?
September 12, 2011 at 7:56 pm
I’m fascinated by this approach and very curious to see how it works for more common problems! It’s particularly interesting that all of the ‘uncomputable’ things disappear.
One question: I’m not really sure what you mean by Markovian in this context. You can certainly (if everything is regular enough) use the multivariate ‘change of variables’ formula to end up with a Monge-Ampere equation (at least for regular optimal transport), which is a local, fully nonlinear, second order PDE. But I have a feeling you’re talking about something else…
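(To spell out the link, under the usual regularity assumptions: writing the map as the gradient of a convex potential, f = ∇φ, the change of variables formula relating the prior density p to the posterior density π gives

p(x) = π(∇φ(x)) det ∇²φ(x),

which is precisely a Monge-Ampere equation in φ: local, fully nonlinear and second order.)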
Other semi-random comments:
– I suspect that there are actually some decent approximation theory results for these polynomial expansions, but the resulting bounds are probably too pessimistic for a practical stopping criterion (but still good enough to justify the procedure).
– I wonder what happens when some of the random variables are infinite dimensional. Optimal transport theory covers these cases (and the maps are often monotone iirc), but it’s always interesting to try to turn this into a numerical method. (In particular, can it bypass some of the problems that MCMC, INLA and VB have to differing extents with REALLY big problems [i.e. can we avoid determinants?])
– I really wonder how these calculations simplify for certain classes of models. It would be nice if the computations could be simplified based on known features of common models.
– Your point on monotonicity is interesting; although you can write the map as the gradient of a convex function, I’m not convinced that that’s easier…
– One of my favourite things about this paper is that it seems to bridge the gap between the ‘Uncertainty Quantification’ work that’s done in applied maths / engineering / numerical analysis and something more statistical. I hope they develop this further!
Regardless of all of these things (and even if this method actually works on real problems), this is MASSIVELY COOL!
September 12, 2011 at 10:20 pm
Cool: a lot of people had the same reaction, indeed!!!
Markov: there are Markovian ways of solving fixed point equations. They are however usually used for real vectors. I think your resolution is what I was hinting at… Thanks!
September 13, 2011 at 11:26 am
The problem with the Monge-Ampere approach is that no one really knows a good method for solving it (things are better than they were ~10 years ago, when I’m not sure there were any methods to solve it), even in two dimensions. Apparently if you add a pseudo-time variable, you can convert the transport problem into a set of ‘fluid dynamics’ equations, but these still live in high dimensions. But again, methods for solving PDEs numerically in >10 (realistically >3) dimensions don’t really exist, even if the PDE is linear.