## Bayesian conjugate gradients [open for discussion]

Posted in Books, pictures, Statistics, University life with tags , , , , , on June 25, 2019 by xi'an

When fishing for an illustration for this post on Google, I came upon this Bayesian methods for hackers cover, a book about which I have no clue whatsoever (!) but that mentions probabilistic programming. Which serves as a perfect (?!) introduction to the call for discussion in Bayesian Analysis of the incoming Bayesian conjugate gradient method by Jon Cockayne, Chris Oates (formerly Warwick), Ilse Ipsen and Mark Girolami (still partially Warwick!). Since indeed the paper is about probabilistic numerics à la Mark and co-authors. Surprisingly dealing with solving the deterministic equation Ax=b by Bayesian methods. The method produces a posterior distribution on the solution x⁰, given a fixed computing effort, which makes it pertain to the anytime algorithms. It also relates to an earlier 2015 paper by Christian Hennig where the posterior is on A⁻¹ rather than x⁰ (which is quite a surprising if valid approach to the problem!) The computing effort is translated here in computations of projections of random projections of Ax, which can be made compatible with conjugate gradient steps. Interestingly, the choice of the prior on x is quite important, including setting a low or high convergence rate…  Deadline is August 04!

## European statistics in Finland [EMS17]

Posted in Books, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on August 2, 2017 by xi'an

While this European meeting of statisticians had a wide range of talks and topics, I found it to be more low key than the previous one I attended in Budapest, maybe because there was hardly any talk there in applied probability. (But there were some sessions in mathematical statistics and Mark Girolami gave a great entry to differential geometry and MCMC, in the spirit of his 2010 discussion paper. Using our recent trip to Montréal as an example of geodesic!) In the Bayesian software session [organised by Aki Vetahri], Javier Gonzáles gave a very neat introduction to Bayesian optimisation: he showed how optimisation can be turned into Bayesian inference or more specifically as a Bayesian decision problem using a loss function related to the problem of interest. The point in following a Bayesian path [or probabilist numerics] is to reduce uncertainty by the medium of prior measures on functions, although resorting [as usual] to Gaussian processes whose arbitrariness I somehow dislike within the infinity of priors (aka stochastic processes) on functions! One of his strong arguments was that the approach includes the possibility for design in picking the next observation point (as done in some ABC papers of Michael Guttman and co-authors, incl. the following talk at EMS 2017) but again the devil may be in the implementation when looking at minimising an objective function… The notion of the myopia of optimisation techniques was another good point: only looking one step ahead in the future diminishes the returns of the optimisation and an alternative presented at AISTATS 2016 [that I do not remember seeing in Càdiz] goes against this myopia.

Umberto Piccini also gave a talk on exploiting synthetic likelihoods in a Bayesian fashion (in connection with the talk he gave last year at MCqMC 2016). I wondered at the use of INLA for this Gaussian representation, as well as at the impact of the parameterisation of the summary statistics. And the session organised by Jean-Michel involved Jimmy Olson, Murray Pollock (Warwick) and myself, with great talks from both other speakers, on PaRIS and PaRISian algorithms by Jimmy, and on a wide range of exact simulation methods of continuous time processes by Murray, both managing to convey the intuition behind their results and avoiding the massive mathematics at work there. By comparison, I must have been quite unclear during my talk since someone interrupted me about how Owen & Zhou (2000) justified their deterministic mixture importance sampling representation. And then left when I could not make sense of his questions [or because it was lunchtime already].

## plenary talks at JSM 2017 in Baltimore

Posted in Statistics with tags , , , , , , , , , , on May 25, 2017 by xi'an

## MCM 2017

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , on February 10, 2017 by xi'an

Je reviendrai à Montréal, as the song by Robert Charlebois goes, for the MCM 2017 meeting there, on July 3-7. I was invited to give a plenary talk by the organisers of the conference . Along with

Steffen Dereich, WWU Münster, Germany
Paul Dupuis, Brown University, Providence, USA
Mark Girolami, Imperial College London, UK
Emmanuel Gobet, École Polytechnique, Palaiseau, France
Aicke Hinrichs, Johannes Kepler University, Linz, Austria
Alexander Keller, NVIDIA Research, Germany
Gunther Leobacher, Johannes Kepler University, Linz, Austria
Art B. Owen, Stanford University, USA

Note that, while special sessions are already selected, including oneon Stochastic Gradient methods for Monte Carlo and Variational Inference, organised by Victor Elvira and Ingmar Schuster (my only contribution to this session being the suggestion they organise it!), proposals for contributed talks will be selected based on one-page abstracts, to be submitted by March 1.

## Monte Carlo with determinantal processes [reply from the authors]

Posted in Books, Statistics with tags , , , , , , , , , , , , , , on September 22, 2016 by xi'an

[Rémi Bardenet and Adrien Hardy have written a reply to my comments of today on their paper, which is more readable as a post than as comments, so here it is. I appreciate the intention, as well as the perfect editing of the reply, suited for a direct posting!]

Thanks for your comments, Xian. As a foreword, a few people we met also had the intuition that DPPs would be relevant for Monte Carlo, but no result so far was backing this claim. As it turns out, we had to work hard to prove a CLT for importance-reweighted DPPs, using some deep recent results on orthogonal polynomials. We are currently working on turning this probabilistic result into practical algorithms. For instance, efficient sampling of DPPs is indeed an important open question, to which most of your comments refer. Although this question is out of the scope of our paper, note however that our results do not depend on how you sample. Efficient sampling of DPPs, along with other natural computational questions, is actually the crux of an ANR grant we just got, so hopefully in a few years we can write a more detailed answer on this blog! We now answer some of your other points.

“one has to examine the conditions for the result to operate, from the support being within the unit hypercube,”
Any compactly supported measure would do, using dilations, for instance. Note that we don’t assume the support is the whole hypercube.

“to the existence of N orthogonal polynomials wrt the dominating measure, not discussed here”
As explained in Section 2.1.2, it is enough that the reference measure charges some open set of the hypercube, which is for instance the case if it has a density with respect to the Lebesgue measure.

“to the lack of relation between the point process and the integrand,”
Actually, our method depends heavily on the target measure μ. Unlike vanilla QMC, the repulsiveness between the quadrature nodes is tailored to the integration problem.

“changing N requires a new simulation of the entire vector unless I missed the point.”
You’re absolutely right. This is a well-known open issue in probability, see the discussion on Terence Tao’s blog.

“This requires figuring out the upper bounds on the acceptance ratios, a “problem-dependent” request that may prove impossible to implement”
We agree that in general this isn’t trivial. However, good bounds are available for all Jacobi polynomials, see Section 3.

“Even without this stumbling block, generating the N-sized sample for dimension d=N (why d=N, I wonder?)”
This is a misunderstanding: we do not say that d=N in any sense. We only say that sampling from a DPP using the algorithm of [Hough et al] requires the same number of operations as orthonormalizing N vectors of dimension N, hence the cubic cost.

1. “how does it relate to quasi-Monte Carlo?”
So far, the connection to QMC is only intuitive: both rely on well-spaced nodes, but using different mathematical tools.

2. “the marginals of the N-th order determinantal process are far from uniform (see Fig. 1), and seemingly concentrated on the boundaries”
This phenomenon is due to orthogonal polynomials. We are investigating more general constructions that give more flexibility.

3. “Is the variance of the resulting estimator (2.11) always finite?”
Yes. For instance, this follows from the inequality below (5.56) since ƒ(x)/K(x,x) is Lipschitz.

4. and 5. We are investigating concentration inequalities to answer these points.

6. “probabilistic numerics produce an epistemic assessment of uncertainty, contrary to the current proposal.”
A partial answer may be our Remark 2.12. You can interpret DPPs as putting a Gaussian process prior over ƒ and sequentially sampling from the posterior variance of the GP.

## Monte Carlo with determinantal processes

Posted in Books, Statistics with tags , , , , , , , , on September 21, 2016 by xi'an

Rémi Bardenet and Adrien Hardy have arXived this paper a few months ago but I was a bit daunted by the sheer size of the paper, until I found the perfect opportunity last week..! The approach relates to probabilistic numerics as well as Monte Carlo, in that it can be seen as a stochastic version of Gaussian quadrature. The authors mention in the early pages a striking and recent result by Delyon and Portier that using an importance weight where the sampling density is replaced with the leave-one-out kernel estimate produces faster convergence than the regular Monte Carlo √n! Which reminds me of quasi-Monte Carlo of course, discussed in the following section (§1.3), with the interesting [and new to me] comment that the theoretical rate (and thus the improvement) does not occur until the sample size N is exponential in the dimension. Bardenet and Hardy achieve similar super-efficient convergence by mixing quadrature with repulsive simulation. For almost every integrable function.

The fact that determinantal point processes (on the unit hypercube) and Gaussian quadrature methods are connected is not that surprising once one considers that such processes are associated with densities made of determinants, which matrices are kernel-based, K(x,y), with K expressed as a sum of orthogonal polynomials. An N-th order determinantal process in dimension d satisfies a generalised Central Limit Theorem in that the speed of convergence is

$\sqrt{N}^{(d-1)/d}$

which means faster than √N…  This is more surprising, of course, even though one has to examine the conditions Continue reading

## MCqMC [#3]

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , on August 20, 2016 by xi'an

On Thursday, Christoph Aistleiter [from TU Gräz] gave a plenary talk at MCqMC 2016 around Hermann Weyl’s 1916 paper, Über die Gleichverteilung von Zahlen mod. Eins, which demonstrates that the sequence a, 22a, 32a, … mod 1 is uniformly distributed on the unit interval when a is irrational. Obviously, the notion was not introduced for simulation purposes, but the construction applies in this setting! At least in a theoretical sense. Since for instance the result that the sequence (a,a²,a³,…) mod 1 being uniformly distributed for almost all a’s has not yet found one realisation a. But a nice hour of history of mathematics and number theory: it is not that common we hear the Riemann zeta function mentioned in a simulation conference!

The following session was a nightmare in that I wanted to attend all four at once! I eventually chose the transport session, in particular because Xiao-Li advertised it at the end of my talk. The connection is that his warp bridge sampling technique provides a folding map between modes of a target. Using a mixture representation of the target and folding all components to a single distribution. Interestingly, this transformation does not require a partition and preserves the normalising constants [which has a side appeal for bridge sampling of course]. In a problem with an unknown number of modes, the technique could be completed by [our] folding in order to bring the unobserved modes into the support of the folded target. Looking forward the incoming paper! The last talk of this session was by Matti Vihola, connecting multi-level Monte Carlo and unbiased estimation à la Rhee and Glynn, paper that I missed when it got arXived last December.

The last session of the day was about probabilistic numerics. I have already discussed extensively about this approach to numerical integration, to the point of being invited to the NIPS workshop as a skeptic! But this was an interesting session, both with introductory aspects and with new ones from my viewpoint, especially Chris Oates’ description of a PN method for handling both integrand and integrating measure as being uncertain. Another arXival that went under my decidedly deficient radar.