## probabilistic numerics and uncertainty in computations

“We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations.” (p.1)

**P**hilipp Hennig, Michael Osborne and Mark Girolami (Warwick) posted on arXiv a paper, to appear in *Proceedings A of the Royal Society*, that relates to the probabilistic numerics workshop they organised in Warwick with Chris Oates two months ago. The paper is both a survey and a tribune on the related questions the authors find of most interest. The overall perspective proceeds along Persi Diaconis’ call for a principled Bayesian approach to numerical problems. One interesting argument made from the start of the paper is that numerical methods can be seen as inferential rules, in that a numerical approximation of a deterministic quantity like an integral can be interpreted as an estimate, even as a Bayes estimate if a prior is used on the space of integrals.

I am always uncertain about this perspective, as illustrated for instance in my post about the missing constant in Larry Wasserman’s paradox. The approximation may look formally the same as an estimate, but there is a design aspect that is almost always attached to numerical approximations and rarely analysed as such. Not to mention the somewhat philosophical issue that the integral itself is a constant with no uncertainty, while a statistical model should always entertain the notion that it may be mis-specified. The distinction explains why there is a zero-variance importance sampling estimator, while there is no uniformly zero-variance estimator in most parametric models. At a possibly deeper level, the debate that still invades the use of Bayesian inference to solve statistical problems would most likely resurface in numerics, in that the significance of a probability statement about a mathematical quantity can only be epistemic, relating to the knowledge (or lack thereof) about this quantity rather than to the quantity itself.
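The zero-variance importance sampling estimator mentioned above can be made concrete; here is a minimal sketch, with a toy integrand and proposal of my own choosing (not taken from the paper). For a positive integrand f, sampling from q ∝ f·p makes every weighted draw equal to the integral exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: I = E_p[f(X)] with f(x) = x and p = Exp(1), so I = 1.
# The optimal proposal q(x) = f(x) p(x) / I = x e^{-x} is a Gamma(2, 1)
# density, and each importance weight f(x)p(x)/q(x) equals I exactly.

def naive_estimate(n):
    """Plain Monte Carlo under p: unbiased, but with nonzero variance."""
    x = rng.exponential(scale=1.0, size=n)
    return x.mean()

def zero_variance_estimate(n):
    """Importance sampling with q proportional to f * p."""
    x = rng.gamma(shape=2.0, scale=1.0, size=n)   # draws from q
    w = (x * np.exp(-x)) / (x * np.exp(-x))       # f(x)p(x)/q(x), identically 1
    return w.mean()                               # exactly I = 1, every run

print(naive_estimate(1000))          # close to 1, but fluctuates
print(zero_variance_estimate(1000))  # exactly 1.0
```

No statistical model offers such an estimator uniformly over its parameter space, which is the asymmetry pointed out above.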

“(…) formulating quadrature as probabilistic regression precisely captures a trade-off between prior assumptions inherent in a computation and the computational effort required in that computation to achieve a certain precision. Computational rules arising from a strongly constrained hypothesis class can perform much better than less restrictive rules if the prior assumptions are valid.” (p.7)

Another general worry [repeating myself] about setting a prior on those functional spaces is that the posterior may then mostly reflect the choice of the prior rather than the information contained in the “data”. The above quote mentions prior assumptions that seem hard to build from prior opinion about the functional of interest, and even less about the function itself. Coming back from a gathering of “objective Bayesians”, it seems equally hard to agree upon a reference prior. However, since I like the alternative notion of using decision theory in conjunction with probabilistic numerics, it seems hard to object to the use of priors, given the “invariance” of prior × loss… But I would like to understand better how it is possible to check prior assumptions (p.7) *without using the data*. Or maybe it does not matter so much in this setting? Unlikely, as indicated by the remarks about the bias resulting from the active design (p.13).
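To see the worry concretely, here is a minimal Bayesian quadrature sketch in the spirit of the quoted trade-off; the integrand, kernel, and all choices below are my own illustration, not code from the paper. A GP prior with squared-exponential kernel is placed on the integrand, and the integral gets a Gaussian posterior whose reported uncertainty moves visibly with the prior lengthscale:

```python
import numpy as np

def bayes_quadrature(f, nodes, lengthscale, a=0.0, b=1.0, jitter=1e-8):
    """Gaussian posterior for I = integral of f on [a,b], under a GP(0, SE kernel) prior."""
    k = lambda u, v: np.exp(-(u[:, None] - v[None, :]) ** 2 / (2 * lengthscale ** 2))
    K = k(nodes, nodes) + jitter * np.eye(len(nodes))
    # Kernel means z_i = \int_a^b k(x, x_i) dx, via a fine midpoint rule.
    grid = np.linspace(a, b, 2000, endpoint=False) + (b - a) / 4000
    z = k(grid, nodes).mean(axis=0) * (b - a)
    mean = z @ np.linalg.solve(K, f(nodes))
    # Posterior variance: prior variance of I minus what the evaluations explain.
    zz = k(grid, grid).mean() * (b - a) ** 2
    var = zz - z @ np.linalg.solve(K, z)
    return mean, max(var, 0.0)

f = lambda x: np.sin(3 * x)   # true integral on [0,1]: (1 - cos 3)/3, about 0.6633
nodes = np.linspace(0.0, 1.0, 7)
for ell in (0.1, 0.3, 1.0):
    m, v = bayes_quadrature(f, nodes, ell)
    print(f"lengthscale {ell}: I = {m:.4f} +/- {v ** 0.5:.4f}")
```

With the same seven evaluations, the error bar reported back to the user is driven almost entirely by the lengthscale, i.e. by the prior, which is exactly the point of contention.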

A last issue I find related to the exploratory side of the paper is the “big world versus small worlds” debate, namely whether we can use the Bayesian approach to solve a sequence of small problems rather than trying to solve the big problem all at once, which forces us to model the entirety of unknowns and almost certainly to fail. (This was the point of the Robbins-Wasserman counterexample.) Adopting a sequence of solutions may be construed as incoherent, in that the prior distribution is adapted to each problem rather than encompassing all problems. Although this would not shock the proponents of reference priors.

June 11, 2015 at 11:45 pm

Interesting indeed.

>integral itself is a constant with no uncertainty

Maybe that’s a difference not worth making a distinction of – recall Efron’s Bayesian calculation of the probability his acquaintance had identical as opposed to fraternal twins – an (unacceptably invasive) test would have determined that with no uncertainty (and maybe funding 100 mathematicians for ten years might get the answer analytically).

CS Peirce thought of mathematics as experiments performed on diagrams rather than physical entities – so not a qualitatively different form of reasoning, but one much easier and more likely to get replicated by others (and by diagrams he did mean to include symbolic expressions).

With these diagrammatic experiments (example below) one can become convinced that one is certain not to be wrong about what one sees. Of course we never trust ourselves, but if many others replicate the experiment, we become convinced we can’t be wrong.

This is the distinction that I think is worth making – with computations, especially simulations, an explicit sense of error remains squarely in front of us, no matter how many others redo it.

Example:

Claim: Sum of first n odd integers is n^2

Diagrammatic experiment: Represent the integer one as one square brick and the integer three as one square brick with one brick added on the left and one on the bottom. Note that these together make a 2 by 2 square divided in 4 with the lower-left brick pulled out. Note that for any (n-1) by (n-1) square, adding n + 1 squares arranged in an upside-down L shape makes an n by n square. (OK, works better with pictures.)
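The claim and the brick-adding step can also be checked mechanically; a quick sketch (using 2n − 1 bricks for the L-shaped strip at step n, i.e. the n-th odd number):

```python
# Verify: the sum of the first n odd integers is n^2, and the L-shaped
# strip turning an (n-1) x (n-1) square into an n x n square has 2n - 1 bricks.
for n in range(1, 100):
    odd_sum = sum(2 * k - 1 for k in range(1, n + 1))
    assert odd_sum == n * n
    assert n * n - (n - 1) * (n - 1) == 2 * n - 1
print("claim holds for n = 1 .. 99")
```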

Keith O’Rourke

June 12, 2015 at 10:38 pm

Oops – in case anyone reads this and gets confused:

“Note for any (n-1)-th odd number by (n-1)-th odd number square adding n-th odd number of squares arranged in an upside down L shape …”

June 10, 2015 at 9:28 am

Christian, thanks a lot for your insightful post on our paper! Two quick remarks on your worries:

Regarding the by now classic concern that “an integral is a constant with no uncertainty”, note that numerical methods are not designed to solve just one problem, but a population of them, provided by the users of the algorithm. There is no unique point-mass prior that is simultaneously correct for all of them. And of course there is uncertainty: Nobody calls a numerical method if they already know the answer.

On “checking assumptions without using data”: this is the flipside of the issue above: the computer has access to a formal description of its task, in the form of the source code defining the integrand (or differential equation, optimization objective, etc.). Many “prior assumptions”, like smoothness of the integrand, can be formally checked, in a way that is not available with physical data sources. I concede that one can argue over the point at which the computations involved in these checks turn into “using data”, but this distinction between likelihood and prior is vague elsewhere, too.

June 10, 2015 at 4:51 pm

To be true, I’m not quite sure why one tries not to use the data when doing checks (and, related to that, what reservations people have towards empirical Bayes). In the applications I deal with, I often have no sensible prior in mind whatsoever, so the data is the only thing that can give guidance.

Regarding Christian’s worry that the posterior may mostly reflect the prior: of course it is possible to do model selection here to face that problem, admittedly adding a layer of complexity.

June 10, 2015 at 8:46 pm

The reason not to do data-driven checks is that in this case (infinite-dimensional prior, infinitely precise data) they are inconsistent. (X has blogged about the “when Bayesian inference shatters” paper before; it is precisely about this.)

And figure 1 clearly shows the posterior uncertainty is driven more by the prior than by the data.

June 10, 2015 at 3:43 am

[…] This paper falls into the exciting new subject area of “Probabilistic Numerics”; see the new position paper and an accompanying critical discussion on Xian’s Og! […]

June 10, 2015 at 1:38 am

For me, figure 1 just demonstrates that a correspondence between the mean (or mode) of a statistical model and a classical numerical scheme doesn’t lead to any correspondence between the properties of the model and the scheme.

If a probabilistic numerical scheme is proposed that automatically reproduces basic features of the classical method it mimics (like the fact that the trapezoid rule converges geometrically for a function that can be analytically extended in a certain way outside the range of integration), I will happily reconsider its usefulness.
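The geometric convergence in question is easy to observe numerically; a quick sketch, with an integrand and reference identity of my own choosing (∫₀^{2π} e^{cos x} dx = 2π I₀(1), I₀ the modified Bessel function):

```python
import numpy as np

def trapezoid_periodic(f, n):
    """Equispaced trapezoid rule over one period of a 2*pi-periodic f.

    For periodic f the endpoint terms merge, so the rule is just an average."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return 2.0 * np.pi * f(x).mean()

f = lambda x: np.exp(np.cos(x))
exact = 2.0 * np.pi * np.i0(1.0)   # 2*pi * I_0(1)

# The error drops geometrically with n, far faster than the usual O(1/n^2).
for n in (2, 4, 8, 16, 32):
    print(n, abs(trapezoid_periodic(f, n) - exact))
```

This is the classical result for integrands with an analytic periodic extension, the behaviour the commenter asks a probabilistic counterpart to reproduce automatically.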