## probabilistic numerics

**I** attended a highly unusual workshop while in Warwick last week. Unusual for me, obviously. It was about *probabilistic numerics*, i.e., the use of probabilistic or stochastic arguments in the numerical resolution of (possibly) deterministic problems. The approach is fairly Bayesian in spirit, in that it makes use of prior information or belief about the quantity of interest, e.g., a function, to construct a (usually Gaussian) process prior and to derive both an estimator that coincides with a classical numerical method (e.g., Runge-Kutta or trapezoidal integration) and a measure of uncertainty or variability around this estimator. While I did not grasp much more than the classy introductory talk by Philipp Hennig, the concept sounds fairly interesting, if only because of the Bayesian connection, and I wonder if we will soon see a probabilistic numerics section at ISBA! More seriously, placing priors on functions or functionals is a highly formal perspective (as in Bayesian nonparametrics), and it makes me wonder how much of the data (evaluations of the function at a given set of points) and how much of the prior is reflected in the output [variability]. (Obviously, one could ask a similar question of statistical analyses!) For instance, issues of singularity arise among those stochastic process priors.
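As a concrete illustration of this estimator-plus-uncertainty idea, here is a minimal sketch (the integrand and evaluation nodes are my own choices): under a Brownian-motion prior on the integrand, with kernel k(s,t) = min(s,t) and hence f(0) = 0 a priori, the posterior mean of ∫₀¹ f given exact evaluations reproduces the trapezoidal rule over the same nodes.

```python
import numpy as np

# Bayesian quadrature sketch with a Brownian-motion prior, k(s,t) = min(s,t).
# For an integrand with f(0) = 0 (matching the prior mean at 0), the
# posterior mean of the integral equals the trapezoidal rule on the nodes.
f = lambda x: x**2                     # f(0) = 0, consistent with the prior
nodes = np.array([0.25, 0.5, 0.75, 1.0])
y = f(nodes)

K = np.minimum.outer(nodes, nodes)     # kernel matrix K[i, j] = min(x_i, x_j)
z = nodes - nodes**2 / 2               # z_i = integral of min(x, x_i) over [0, 1]
bq_mean = z @ np.linalg.solve(K, y)    # posterior mean of the integral

# classical trapezoidal rule on the same grid, anchored at (0, 0)
grid = np.concatenate(([0.0], nodes))
fv = f(grid)
trap = np.sum((grid[1:] - grid[:-1]) * (fv[1:] + fv[:-1]) / 2)

print(bq_mean, trap)   # both equal 0.34375
```

The agreement is exact, not approximate: the posterior mean function of a Brownian motion conditioned on point evaluations is the piecewise-linear interpolant, and integrating it is precisely the trapezoidal rule. The posterior variance then quantifies the numerical error, which is the probabilistic-numerics selling point.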

Another question that stemmed from this talk is whether or not more efficient numerical methods can be derived this way, in addition to recovering the most classical ones. Somehow, given the idealised nature of the prior, it feels like priors could be more easily compared or ranked than in classical statistical problems, since the aim is to figure out the value of an integral or the solution to an ODE. (Or maybe not, since again almost the same could be said about estimating a normal mean.)


April 27, 2015 at 12:23 pm

I wish I’d been able to go to this.

– I’m not sure most probabilistic numerics is Bayesian. Probably the most widely used stochastic algorithms in numerics are “sketching” methods based on subsampling a problem for dimension reduction.

– as always, infinite-dimensional priors are infinitely informative (in spite of the branding triumph that is “Bayesian nonparametrics”).

– the thing that Patrick talked about is cool because it tells us something we didn’t know about the original ODE solvers. And it reflects that numerical error (including rounding error) is pretty much never additive.

– I’m not sure that describing a numerical algorithm as a Bayesian procedure is useful. People are pretty good at linking assumptions to behaviours of the algorithms, and spurious distributional assumptions aren’t obviously useful. (See, for example, Chris Oates, Mark Girolami and Nicolas Chopin’s lively paper on control functionals. The “Gaussian process” regression is, to my eye, unnecessarily specific: a kernel smoother or a spline would be equally well suited and wouldn’t require working out consistency properties of GP regression.)

April 27, 2015 at 4:32 pm

Dan, you raise some good points but…

– most numerical algorithms in the areas of numerical integration and differential equation solving involve some sort of weighted sum. The weights arise as the solution of a linear system of equations, so resistance to a probabilistic interpretation is futile. The Bayesian connection is immediate if one admits “non-informative” priors for a linear regression problem over a general family of functions that one thinks contains an approximation to the solution: the algorithm then simply yields the posterior mean of a Bayesian analysis. Note that most numerical algorithms solve the problem on a bounded (compact) domain, so the Lebesgue measure would seem an obvious choice for the construction of the non-informative prior.
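The weighted-sum point can be made concrete with a small sketch (my own example, not code from the commenter): requiring a quadrature rule to be exact on the monomial basis {1, x, x²} at three nodes is a linear system, and solving it recovers Simpson's weights.

```python
import numpy as np

# Interpolatory quadrature weights from a linear solve: find w such that
# sum_i w_i * x_i**j = integral of x**j over [0, 1] for j = 0, 1, 2.
nodes = np.array([0.0, 0.5, 1.0])
V = np.vander(nodes, 3, increasing=True)   # V[i, j] = x_i**j
m = np.array([1.0, 1/2, 1/3])              # exact moments: ∫ x**j dx on [0, 1]
w = np.linalg.solve(V.T, m)                # quadrature weights

print(w)   # [1/6, 2/3, 1/6] -- Simpson's rule on [0, 1]
```

The same weights fall out as the posterior mean of the integral under a regression model on that basis with a flat prior, which is the Bayesian connection the comment describes.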

– There is a correspondence between splines and GPs (and kernel regression) which allows one to use either as a basic probabilistic tool. I have built ODE solvers out of penalized cubic spline “fitters”: https://twitter.com/ChristosArgyrop/status/589423009087098881?s=09

Obviously one needs to match the dimension of the basis expansion to the “size” of the domain of the solution, but considering the speed with which probabilistic solvers can be developed, this becomes a mere technicality.
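A toy version of this kind of regression-based solver (my own construction, using an ordinary polynomial basis rather than penalized cubic splines) solves y′ = −y, y(0) = 1 on [0, 1] by least-squares collocation:

```python
import numpy as np

# Least-squares collocation sketch for y' = -y, y(0) = 1, with
# y(t) = sum_j c_j * t**j and the residual y' + y driven to zero
# at a grid of collocation points.
deg = 5
t = np.linspace(0.0, 1.0, 20)                       # collocation points
j = np.arange(deg + 1)

Phi = t[:, None] ** j                               # basis values t**j
dPhi = j * t[:, None] ** np.clip(j - 1, 0, None)    # derivatives j * t**(j-1)
A = dPhi + Phi                                      # residual matrix for y' + y

# enforce y(0) = 1 by fixing c_0 = 1 and fitting the remaining coefficients
c_rest, *_ = np.linalg.lstsq(A[:, 1:], -A[:, 0], rcond=None)
c = np.concatenate(([1.0], c_rest))

y1 = c @ (1.0 ** j)                                 # estimate of y(1)
print(y1, np.exp(-1))                               # exact answer is exp(-1)
```

Swapping the monomials for a spline basis and adding a roughness penalty gives exactly the “penalized fitter as solver” construction described above, with the penalty playing the role of the prior.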

– Having a basic compendium of equivalent models (e.g., GPs, splines, mixed models) is important for applied work. Whereas the GP interpretation is more likely to be useful for numerical analysts or for people developing fast numerical methods to supplement existing “expensive” computer codes, the other interpretations may be more suitable for applied scientists (e.g., me 😀).

– The Bayesian approach is ideally suited when these ODE/PDE problems correspond to real measurements that one wants to use for practical applications. Biomarker discovery and drug development are applications that come to mind. In that context a mixed-model reinterpretation of the probabilistic model would be required. However, a frequentist solution is likely never to be realized due to computational issues.

April 27, 2015 at 5:41 pm

Oh no. I don’t doubt that it’s possible to interpret standard algorithms (and non-standard ones!) as posterior means, I just struggle to see the utility. Without a great deal of work on calibration (see Patrick’s stuff), the posterior doesn’t give a meaningful indication of the error. We already know how to use expert information about smoothness (or more general properties) to select numerical methods, and there are various powerful adaptive methods that can make very tidy use of these things, so I just don’t see the gain.

At the very least, linear methods aren’t that interesting to me. It would, I think, be very useful to look at priors corresponding to nonlinear approximation methods, not to build better numerical methods but rather to build better statistical models (and let’s face it, I’m basically just describing what already happens in, say, image analysis and inverse problems).

April 27, 2015 at 6:31 pm

For ODEs (and I am referring mostly to things that are likely to arise in drug metabolism, bioinformatics/systems biology, or biomarkers), GPs already provide the capability to do non-linear modelling.

Unfortunately the non-linearities have to be encoded in the covariance matrix, an area in which very little research is actually done. Even your garden-variety linear system of ODEs with time-varying coefficients would yield non-stationary covariance matrices. And of course the priors here play a big role, as a way to guarantee that one is really solving the intended problem in a numerically robust way.

With respect to practicality, I’d say that in many of the areas I mentioned, the use of something like a GP is probably the only way out when one is trying to learn the parameters of the ODE (rather than solve the ODE).

It is nice that WinBUGS (and now Stan) offer a Runge-Kutta solver, but one does not realise the practical limitations of embedding these black-box approaches until one runs a moderate-to-large mixed-model regression to estimate the parameters of an ODE from measurements of a function that is not available in closed form.

April 27, 2015 at 12:44 am

The vast majority of “generic” numerical algorithms can be given a statistical interpretation… if one tries hard enough 😀

April 27, 2015 at 12:39 am

Related: http://statweb.stanford.edu/~cgates/PERSI/papers/Bayes_numerical88.pdf

April 27, 2015 at 6:34 pm

Thank you, Paulo! I wanted to link to Persi’s paper [which I remember reading with George Casella one morning, while waiting for his car to get serviced!] and then forgot to add the paragraph. Philipp also pointed out this filiation to probabilistic numerics.

April 27, 2015 at 7:11 pm

Hi Xi’an,

I totally agree with you that this unusual subject is really surprising.

This short video lecture by Persi on this subject may be a cool introduction to other Og’s readers:

http://videolectures.net/nipsworkshops2012_diaconis_bayesian_analysis/?q=diaconis

Persi is awesome!

And, of course, my hero Henri Poincaré knew it all… in the 19th century! No Reproducing Kernel Hilbert Space theory available? No problem. Poincaré finds a way. No rigorous description of Brownian motion? All right. Poincaré discovers just what he needs to solve the problems he’s interested in.

Thanks for posting on this subject.

Best,

Paulo.