“We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations.” (p.1)
Philipp Hennig, Michael Osborne and Mark Girolami (Warwick) posted on arXiv a paper to appear in Proceedings A of the Royal
Statistical Society that relates to the probabilistic numerics workshop they organised in Warwick with Chris Oates two months ago. The paper is both a survey and a tribune about the related questions the authors find of most interest. The overall perspective is proceeding along Persi Diaconis’ call for a principled Bayesian approach to numerical problems. One interesting argument made from the start of the paper is that numerical methods can be seen as inferential rules, in that a numerical approximation of a deterministic quantity like an integral can be interpreted as an estimate, even as a Bayes estimate if a prior is used on the space of integrals. I am always uncertain about this perspective, as for instance illustrated in the post about the missing constant in Larry Wasserman’s paradox. The approximation may look formally the same as an estimate, but there is a design aspect that is almost always attached to numerical approximations and rarely analysed as such. Not mentioning the somewhat philosophical issue that the integral itself is a constant with no uncertainty (while a statistical model should always entertain the notion that a model can be mis-specified). The distinction explains why there is a zero variance importance sampling estimator, while there is no uniformly zero variance estimator in most parametric models. At a possibly deeper level, the debate that still invades the use of Bayesian inference to solve statistical problems would most likely resurface in numerics, in that the significance of a probability statement surrounding a mathematical quantity can only be epistemic and relate to the knowledge (or lack thereof) about this quantity rather than to the quantity itself.
“(…) formulating quadrature as probabilistic regression precisely captures a trade-off between prior assumptions inherent in a computation and the computational effort required in that computation to achieve a certain precision. Computational rules arising from a strongly constrained hypothesis class can perform much better than less restrictive rules if the prior assumptions are valid.” (p.7)
Another general worry [repeating myself] about setting a prior in those functional spaces is that the posterior may then mostly reflect the choice of the prior rather than the information contained in the “data”. The above quote mentions prior assumptions that seem hard to build from prior opinion about the functional of interest. And even less about the function itself. Coming back from a gathering of “objective Bayesians“, it seems equally hard to agree upon a reference prior. However, since I like the alternative notion of using decision theory in conjunction with probabilistic numerics, it seems hard to object to the use of priors, given the “invariance” of prior x loss… But I would like to understand better how it is possible to check for prior assumption (p.7) without using the data. Or maybe it does not matter so much in this setting? Unlikely, as indicated in the remarks about the bias resulting from the active design (p.13).
A last issue I find related to the exploratory side of the paper is the “big world versus small worlds” debate, namely whether we can use the Bayesian approach to solve a sequence of small problems rather than trying to solve the big problem all at once. Which forces us to model the entirety of unknowns. And almost certainly fail. (This is was the point of the Robbins-Wasserman counterexample.) Adopting a sequence of solutions may be construed as incoherent in that the prior distribution is adapted to the problem rather than encompassing all problems. Although this would not shock the proponents of reference priors.