Hi Xi’an,

I totally agree with you: this unusual subject is really surprising.

This short video lecture by Persi on the subject may be a cool introduction for other readers of the Og:

http://videolectures.net/nipsworkshops2012_diaconis_bayesian_analysis/?q=diaconis

Persi is awesome!

And, of course, my hero Henri Poincaré knew it all… in the 19th century! No Reproducing Kernel Hilbert Space theory available? No problem. Poincaré finds a way. No rigorous description of Brownian motion? All right. Poincaré discovers just what he needs to solve the problems he’s interested in.

Thanks for posting on this subject.

Best,

Paulo.

]]>Thank you, Paulo! I wanted to link to Persi’s paper [that I remember reading with George Casella one morning, while waiting for his car to get serviced!] and then forgot to add the paragraph. Phillip also pointed out this filiation to probabilistic numerics.

]]>For ODEs (and I am referring mostly to things that are likely to arise in drug metabolism, bioinformatics/systems biology, or biomarkers), GPs already provide the capability to do non-linear modelling.

Unfortunately the non-linearities must be encoded in the covariance matrix, an area in which very little research has actually been done. Even your garden-variety linear system of ODEs with time-varying coefficients would yield non-stationary covariance matrices. And of course the priors play a big role here, as a way to guarantee that one is really solving the intended problem in a numerically robust way.
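To make the GP machinery concrete, here is a minimal sketch of a GP posterior mean in pure Python, assuming the simplest textbook choice: noise-free data and a stationary squared-exponential kernel (the tiny Gaussian-elimination `solve` is included only to keep the snippet self-contained). The point above is precisely that real ODE-driven problems would call for richer, non-stationary covariances than this one.

```python
import math

def solve(A, b):
    """Naive Gaussian elimination with partial pivoting (for self-containment)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def sq_exp(u, v, ell=0.3):
    # stationary squared-exponential kernel: fine for a toy example,
    # but exactly the kind of off-the-shelf choice discussed above
    return math.exp(-0.5 * ((u - v) / ell) ** 2)

# noise-free observations of a non-linear function
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [math.sin(2.0 * math.pi * x) for x in xs]

jitter = 1e-9  # small diagonal term for numerical stability
K = [[sq_exp(xi, xj) + (jitter if i == j else 0.0)
      for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
alpha = solve(K, ys)  # alpha = K^{-1} y

def posterior_mean(x):
    # GP posterior mean: k(x, X) K^{-1} y
    return sum(sq_exp(x, xi) * a for xi, a in zip(xs, alpha))

print(posterior_mean(0.25))  # ≈ 1.0: interpolates sin(2*pi*0.25)
```

With noise-free data the posterior mean interpolates the observations; the non-linearity of the fit lives entirely in the kernel.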

With respect to practicality, I’d say that in many of the areas I mentioned, something like a GP is probably the only way out when one is trying to learn the parameters of the ODE (rather than solve the ODE).

It is nice that WinBUGS (and now Stan) offer a Runge-Kutta solver, but one does not realise the practical limitations of embedding these black-box approaches until one runs a moderate-to-large mixed-model regression to estimate the parameters of an ODE from measurements of a function that is not available in closed form.
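The cost being described can be sketched with a deliberately crude toy (a hand-rolled RK4 and a grid search standing in for WinBUGS/Stan’s actual machinery; the decay model and numbers are made up): even estimating a single rate constant re-runs the full ODE solver once per observation per candidate parameter value, which is what scales badly inside a large mixed-model fit.

```python
import math

def rk4(f, y0, t0, t1, n):
    """Classical Runge-Kutta solve of y' = f(t, y); returns y(t1)."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

# "data": y(t) = exp(-k t) observed at a few times, with k = 1.7
k_true = 1.7
times = [0.2, 0.5, 1.0, 2.0]
data = [math.exp(-k_true * t) for t in times]

def sse(k):
    # every evaluation of the objective re-runs the ODE solver once per
    # observation time; this inner solve is what blows up in a large
    # mixed-model fit with many subjects and many parameter proposals
    return sum((rk4(lambda t, y: -k * y, 1.0, 0.0, ti, 50) - d) ** 2
               for ti, d in zip(times, data))

candidates = [i * 0.01 for i in range(100, 300)]  # grid over [1.00, 2.99]
k_hat = min(candidates, key=sse)
print(k_hat)  # recovers a value very close to 1.7
```

Two hundred candidate values times four observations already means 800 solver runs for one parameter; with random effects per subject the count multiplies accordingly.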

Oh no. I don’t doubt that it’s possible to interpret standard algorithms (and non-standard ones!) as posterior means; I just struggle to see the utility. Without a great deal of work on calibration (see Patrick’s stuff), the posterior doesn’t give a meaningful indication of the error. We already know how to use expert information about smoothness (or more general properties) to select numerical methods. There are various powerful adaptive methods that can make very tidy use of these things, so I just don’t see the gain.

At the very least, linear methods aren’t that interesting to me. It would, I think, be very useful to look at priors corresponding to nonlinear approximation methods, not to build better numerical methods but rather to build better statistical models (and let’s face it, I’m basically just describing what already happens in, say, image analysis and inverse problems).

]]>Dan you raise some good points but…

– most numerical algorithms in the areas of numerical integration and differential-equation solving involve some sort of weighted sum. This arises as the solution of a linear system of equations, so resistance to a probabilistic interpretation is futile. The Bayesian connection is immediate if one admits “non-informative” priors for a linear regression problem and a general family of functions that one thinks contains an approximation to the solution. In that case the algorithm simply yields the posterior mean of a Bayesian analysis. Note that most numerical algos solve the problem in a bounded (compact) domain, so the Lebesgue measure would seem an obvious choice for the construction of the non-informative prior.
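A concrete instance of the weighted-sum point, under the simplest possible assumptions (three equispaced nodes on [0, 1], a quadratic monomial basis, and a flat prior, so the posterior mean is just the interpolant): the weights come out of one linear solve against the basis moments, and they are exactly Simpson’s rule. The `solve` helper is a naive Gaussian elimination included only for self-containment.

```python
def solve(A, b):
    """Naive Gaussian elimination with partial pivoting (for self-containment)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

# nodes and quadratic monomial basis phi_j(x) = x**j on [0, 1]
nodes = [0.0, 0.5, 1.0]
moments = [1.0, 0.5, 1.0 / 3.0]  # integrals of 1, x, x^2 over [0, 1]

# the posterior-mean integral is w . y, where the weights w make the rule
# exact on the basis, i.e. they solve Phi^T w = moments
PhiT = [[x ** j for x in nodes] for j in range(3)]
w = solve(PhiT, moments)
print(w)  # [1/6, 4/6, 1/6]: Simpson's rule
```

Swapping the basis or the prior changes the weights but not the structure: the rule is always a weighted sum obtained from a linear solve.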

– There is a correspondence between splines and GPs (and kernel regression) which allows one to use either as a basic probabilistic tool. I have built ODE solvers out of penalized cubic spline “fitters”: https://twitter.com/ChristosArgyrop/status/589423009087098881?s=09

Obviously one needs to match the dimension of the basis expansion to the “size” of the domain of the solution, but considering the speed with which probabilistic solvers can be developed, this becomes a mere technicality.
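The actual penalized-spline solver is in the linked tweet; as a loose stand-in for the idea, here is a toy collocation solver in the same spirit, assuming a plain degree-4 polynomial basis instead of cubic splines: expand the solution in the basis, enforce the ODE at a few collocation points plus the initial condition, and solve the resulting linear system (the `solve` helper is a naive Gaussian elimination included for self-containment).

```python
import math

def solve(A, b):
    """Naive Gaussian elimination with partial pivoting (for self-containment)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

# toy problem: y' = -y, y(0) = 1 on [0, 1]; exact solution exp(-t)
deg = 4
colloc = [0.2, 0.4, 0.6, 0.8]  # collocation points

rows, rhs = [], []
# initial condition: sum_j c_j * 0**j = 1
rows.append([1.0] + [0.0] * deg)
rhs.append(1.0)
# ODE residual at each collocation point:
# sum_j c_j * (j * t**(j-1) + t**j) = 0
for t in colloc:
    rows.append([(j * t ** (j - 1) if j > 0 else 0.0) + t ** j
                 for j in range(deg + 1)])
    rhs.append(0.0)

c = solve(rows, rhs)

def y(t):
    return sum(cj * t ** j for j, cj in enumerate(c))

print(y(1.0), math.exp(-1.0))  # the two agree to a few decimals
```

The whole solver is one linear solve, which is what makes the matching probabilistic (posterior-mean) reading immediate, and what makes such solvers so quick to prototype.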

– Having a basic compendium of equivalent models (e.g. GPs, splines, mixed models) is important for applied work. Whereas the GP is more likely to be useful to numerical analysts or to people developing fast numerical methods to supplement existing “expensive” computer codes, the other interpretations may be more suitable for applied scientists (e.g. me 😀).

– The Bayesian approach is ideally suited when these ODE/PDE problems correspond to real measurements that one wants to use for practical applications. Biomarker discovery and drug development are applications that come to mind. In that context a mixed-model reinterpretation of the probabilistic model would be required. However, a frequentist solution is likely never to be realized due to computational issues.

– I’m not sure most probabilistic numerics is Bayesian. Probably the most widely used stochastic algorithms in numerics are “sketching” methods based on subsampling a problem for dimension reduction.
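For readers unfamiliar with the term, “sketching” here means randomized dimension reduction; the crudest possible illustration (purely schematic, not tied to any particular library or method) is estimating a large sum from a scaled uniform subsample:

```python
import random

random.seed(42)  # fixed seed so the example is reproducible

n, m = 100_000, 1_000
values = [(i % 97) / 97.0 for i in range(n)]  # stand-in for expensive entries

true_sum = sum(values)

# uniform subsample of m indices, scaled up by n/m: a one-column
# "row-sampling" sketch (real sketching methods use cleverer
# sampling distributions and random projections)
idx = random.sample(range(n), m)
estimate = (n / m) * sum(values[i] for i in idx)

print(true_sum, estimate)
```

The estimator is unbiased but frequentist in spirit: its guarantees come from the sampling randomness, not from any prior over the unknown quantity.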

– As always, infinite-dimensional priors are infinitely informative (in spite of the branding triumph that is “Bayesian nonparametrics”).

– The thing that Patrick talked about is cool because it tells us something we didn’t know about the original ODE solvers. And it reflects that numerical error (incl. rounding error) is pretty much never additive.

– I’m not sure that describing a numerical algorithm as a Bayesian procedure is useful. People are pretty good at linking assumptions to behaviours of the algorithms. Spurious distributional assumptions aren’t obviously useful. (See, for example, Chris Oates, Mark Girolami and Nicolas Chopin’s lively paper on control functionals. The “Gaussian process” regression is, to my eye, unnecessarily specific. A kernel smoother or a spline would be equally well suited and wouldn’t require working out consistency properties of GP regression.)

]]>