## I am back from Montréal [NIPS 2015]

**A**fter the day trip to Montréal, a quick stop in Paris, and another one in London, I thought back on the probabilistic integration workshop of last week. First, I had a very good time discussing with people there, with no (apparent) adverse reaction to my talk on “estimating constants”. Second, I finally realised what Mark Berliner meant by saying that he was a Bayesian if not a statistician, in a discussion we had in the early 1990s, at Cornell. Third, I became [moderately] more open to the highly structured spaces used in the approaches discussed by François-Xavier Briol, Arthur Gretton, Roman Garnett, and Francis Bach. The (RKHS) functional assumptions made in those approaches allow for faster and more precise convergence rates, the question being what happens when the assumptions do not hold. A similar comment applies to the impact of choosing a Gaussian process as the prior on the integrand in Bayesian quadrature.
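To make the role of the Gaussian process prior concrete, here is a minimal sketch of Bayesian quadrature, written for illustration and not taken from any of the talks. It rests on assumptions of my own choosing: a one-dimensional integrand, a standard normal integration measure, and a squared-exponential kernel with a fixed length-scale `ell`, for which the kernel mean and the prior variance of the integral are available in closed form. The function names (`rbf`, `kernel_mean`, `bayesian_quadrature`) and the test integrand `cos` are hypothetical.

```python
import numpy as np

def rbf(x, y, ell):
    """Squared-exponential kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2))."""
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell**2)

def kernel_mean(x, ell):
    """Closed form of the integral of k(x, y) against N(y; 0, 1)."""
    return ell / np.sqrt(ell**2 + 1.0) * np.exp(-0.5 * x**2 / (ell**2 + 1.0))

def bayesian_quadrature(f, nodes, ell=1.0, jitter=1e-8):
    """Posterior mean and variance of the integral of f against N(0, 1),
    under a zero-mean GP prior on f with the kernel above (assumed setup)."""
    z = kernel_mean(nodes, ell)
    # The jitter is a small diagonal term added for numerical stability.
    K = rbf(nodes, nodes, ell) + jitter * np.eye(len(nodes))
    weights = np.linalg.solve(K, z)          # quadrature weights K^{-1} z
    mean = weights @ f(nodes)
    # Prior variance of the integral is ell / sqrt(ell^2 + 2) for this kernel.
    var = ell / np.sqrt(ell**2 + 2.0) - weights @ z
    return mean, var

nodes = np.linspace(-4.0, 4.0, 15)
bq_mean, bq_var = bayesian_quadrature(np.cos, nodes)
truth = np.exp(-0.5)                          # E[cos X] for X ~ N(0, 1)
print(f"BQ estimate {bq_mean:.6f}, truth {truth:.6f}, posterior variance {bq_var:.2e}")
```

The posterior variance is only meaningful if the integrand is well described by the assumed kernel, which is precisely the concern about what happens when the assumptions do not hold.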

François-Xavier presented the recently arXived probabilistic integration paper that Andrew discussed a week ago. (While I obviously have no relevant remark to make about the maths in this paper, I wonder at the difficulty and cost of sequentially selecting the states behind the quadrature, which presumably is covered in the earlier Frank-Wolfe paper by the same team.) Another discussion with Arthur clarified a wee bit how RKHS can be perceived in practice, with a lingering question on the size of the RKHS within the entire space of functions and, more importantly, on the significant impact of the kernel representation on the resulting approximations. Anyway, those are exciting times, when considering that different branches of numerics, probability, and statistics come together to improve upon existing techniques, and I am once again glad I could take part in this workshop (although sorry I had to miss the ABC workshop that took place in parallel!)

December 17, 2015 at 9:42 am

I’m not sure if this really helps, but the RKHS, H, of a GP is large enough to capture the mean (prior or posterior), but not large enough to capture any of the variation (Pr[x \in H] = 0).

But for a space with zero prior mass, it “fills” the space in the sense that if B is an event with probability epsilon > 0, Pr[H + B] = 1, where H + B is the set of all sums h + b with h \in H and b \in B.

When I visualise it, I think of it as a bit like the set of rational numbers: it’s “everywhere”, but it has too much extra regularity and structure to be able to cover everything.
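A small numerical illustration of the zero-mass statement, under an assumption chosen purely for convenience: Brownian motion on [0, 1] stands in for the GP, since its kernel is k(s, t) = min(s, t) and the squared RKHS norm restricted to a grid t_i = i/n reduces to the sum of squared increments divided by the step size. The helper name `grid_rkhs_norm_sq` is hypothetical. For a sampled path this quantity is a chi-square variable with n degrees of freedom, so it grows without bound as the grid is refined, while for the function g(t) = t = k(t, 1), which does belong to H, it stays at 1.

```python
import numpy as np

def grid_rkhs_norm_sq(values, n):
    """Squared RKHS norm, for k(s, t) = min(s, t), of the values on the grid
    t_i = i / n (with the function pinned to 0 at t = 0)."""
    increments = np.diff(np.concatenate(([0.0], values)))
    return float(np.sum(increments**2) * n)   # sum of increments^2 / (1 / n)

rng = np.random.default_rng(0)
results = {}
for n in (100, 1_000, 10_000):
    t = np.arange(1, n + 1) / n
    # Brownian path on the grid: cumulative sum of N(0, 1/n) increments.
    path = np.cumsum(rng.normal(scale=np.sqrt(1.0 / n), size=n))
    results[n] = (grid_rkhs_norm_sq(path, n), grid_rkhs_norm_sq(t, n))
    print(f"n={n:6d}  sample path: {results[n][0]:10.1f}   g(t)=t: {results[n][1]:.6f}")
```

The first column scales like n, which is the finite-grid shadow of Pr[x \in H] = 0, while the second column does not move.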

December 17, 2015 at 9:14 am

After thinking a bit about it, I guess the main thing about FW-BQ is the following: if you want to actively pick informative points in this setting to maximally decrease variance, you already have to be sure what RKHS your integrand lives in. If you aren’t sure, then BQ doesn’t help whatsoever.
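The dependence of the design on the assumed kernel can be sketched with a plain greedy rule that adds, one at a time, the candidate point giving the smallest posterior variance of the integral. This is not the Frank-Wolfe algorithm of the paper, only a simpler stand-in, and it relies on assumptions made for illustration: a one-dimensional problem, a standard normal integration measure, a squared-exponential kernel with length-scale `ell`, and hypothetical function names. For a fixed kernel the posterior variance does not involve the integrand values at all, so the whole design is determined by the kernel.

```python
import numpy as np

def rbf(x, y, ell):
    """Squared-exponential kernel with length-scale ell."""
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell**2)

def kernel_mean(x, ell):
    """Closed form of the integral of k(x, y) against N(y; 0, 1)."""
    return ell / np.sqrt(ell**2 + 1.0) * np.exp(-0.5 * x**2 / (ell**2 + 1.0))

def bq_variance(nodes, ell, jitter=1e-8):
    """Posterior variance of the integral; note that no function values appear."""
    z = kernel_mean(nodes, ell)
    K = rbf(nodes, nodes, ell) + jitter * np.eye(len(nodes))
    return ell / np.sqrt(ell**2 + 2.0) - z @ np.linalg.solve(K, z)

def greedy_design(candidates, ell, n_nodes):
    """Greedily pick the candidate that most reduces the posterior variance."""
    chosen, variances = [], []
    for _ in range(n_nodes):
        best_idx, best_var = None, np.inf
        for i in range(len(candidates)):
            if i in chosen:
                continue
            v = bq_variance(candidates[chosen + [i]], ell)
            if v < best_var:
                best_idx, best_var = i, v
        chosen.append(best_idx)
        variances.append(best_var)
    return candidates[chosen], np.array(variances)

candidates = np.linspace(-3.0, 3.0, 61)
designs = {}
for ell in (0.3, 2.0):
    designs[ell] = greedy_design(candidates, ell, n_nodes=5)
    nodes, variances = designs[ell]
    print(f"ell={ell}: nodes {np.round(nodes, 2)}, final variance {variances[-1]:.2e}")
```

Comparing the two printed designs shows how far the selected points depend on the assumed length-scale, before a single evaluation of the integrand has been made.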

The approach in the recent paper can be applied after the samples have been acquired, which is a nice result to have.