## Monte Carlo with determinantal processes

Rémi Bardenet and Adrien Hardy arXived this paper a few months ago, but I was a bit daunted by its sheer size until I found the perfect opportunity last week! The approach relates to probabilistic numerics as well as Monte Carlo, in that it can be seen as a stochastic version of Gaussian quadrature. The authors mention in the early pages a striking and recent result by Delyon and Portier: using an importance weight where the sampling density is replaced with the leave-one-out kernel estimate produces faster convergence than the regular Monte Carlo √n! Which reminds me of quasi-Monte Carlo, of course, discussed in the following section (§1.3), with the interesting [and new to me] comment that the theoretical rate (and thus the improvement) does not occur until the sample size N is exponential in the dimension. Bardenet and Hardy achieve similar super-efficient convergence by mixing quadrature with repulsive simulation. For almost every integrable function.
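The Delyon–Portier device can be sketched in a few lines: estimate ∫f by importance sampling, but replace the (known) sampling density in the weight with a leave-one-out kernel density estimate built from the very same sample. A minimal, hypothetical illustration — the Gaussian kernel, bandwidth, sample size, and test integrand are my own choices, not taken from their paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def loo_kde_is(f, sample, h=0.2):
    """Importance-sampling estimate of ∫ f(x) dx in which the sampling
    density is *replaced* by a leave-one-out Gaussian kernel density
    estimate (a sketch of the Delyon-Portier idea; h is a hypothetical
    bandwidth choice)."""
    n = len(sample)
    diffs = (sample[:, None] - sample[None, :]) / h
    kmat = np.exp(-0.5 * diffs**2) / (h * np.sqrt(2 * np.pi))
    # leave-one-out density at each sample point: drop the diagonal term
    dens = (kmat.sum(axis=1) - np.diag(kmat)) / (n - 1)
    return np.mean(f(sample) / dens)

x = rng.standard_normal(2000)                  # sampling density: N(0,1)
est = loo_kde_is(lambda t: np.exp(-t**2), x)   # target: ∫ exp(-x²) dx = √π
```

The point of the construction is that the estimated density, unlike the true one, adapts to the realised sample, which is where the faster-than-√n rate comes from.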

The fact that determinantal point processes (on the unit hypercube) and Gaussian quadrature methods are connected is not that surprising once one considers that such processes are associated with densities made of determinants of kernel-based matrices, K(x,y), with K expressed as a sum of orthogonal polynomials. An N-th order determinantal process in dimension d satisfies a generalised Central Limit Theorem in that the speed of convergence is

√(N^{1+1/d})

which means faster than √N… This is more surprising, of course, even though one has to examine the conditions for the result to hold: the support lying within the unit hypercube, the existence of N orthogonal polynomials w.r.t. the dominating measure (not discussed here), the lack of relation between the point process and the integrand, and the generation of those N points, a generation that seems to depend on N, meaning that changing N requires a new simulation of the entire vector, unless I missed the point. Surprising as well is the fact that the limiting variance in the CLT does not depend on the proposal density/determinantal point process (Remark 2.10). The major point, however, is how to generate an N-sized sample, which the paper handles by N accept-reject steps using Be(½,½) proposals. This requires figuring out upper bounds on the acceptance ratios, a “problem-dependent” request that may prove impossible to implement. Even without this stumbling block, generating the N-sized sample for dimension d=N (why d=N, I wonder?) is currently an endeavour of order O(N³).
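To make the kernel construction concrete, here is a small one-dimensional sketch of my own (Legendre basis on [−1,1]; none of this code comes from the paper) of the rank-N projection kernel K_N(x,y) = Σ_{k<N} φ_k(x)φ_k(y) built from orthonormal polynomials. The joint density of an N-point configuration is proportional to det[K_N(x_i,x_j)], the diagonal K_N(x,x) integrates to N, and 1/K_N(x_i,x_i) is the weight attached to f(x_i) in an estimator like the paper's (2.11):

```python
import numpy as np
from numpy.polynomial import legendre

def projection_kernel(N):
    """K_N(x, y) = sum_{k<N} φ_k(x) φ_k(y), with φ_k the Legendre
    polynomials normalised to be orthonormal for Lebesgue measure
    on [-1, 1] (a one-dimensional sketch, not the paper's code)."""
    def K(x, y):
        x = np.atleast_1d(np.asarray(x, float))
        y = np.atleast_1d(np.asarray(y, float))
        out = np.zeros(np.broadcast(x, y).shape)
        for k in range(N):
            c = np.zeros(k + 1)
            c[k] = np.sqrt((2 * k + 1) / 2.0)   # since ∫ P_k² = 2/(2k+1)
            out += legendre.legval(x, c) * legendre.legval(y, c)
        return out
    return K

N = 5
K = projection_kernel(N)
# joint density of an N-point configuration ∝ det [K(x_i, x_j)]_{i,j}
pts = np.linspace(-0.8, 0.8, N)
gram = K(pts[:, None], pts[None, :])
# K(x, x) is the marginal intensity: for a rank-N projection kernel
# it integrates to N over the support (crude Riemann check below)
grid = np.linspace(-1.0, 1.0, 2001)
mass = np.sum(K(grid, grid)) * (grid[1] - grid[0])   # ≈ N
```

The O(N³) cost of the simulation is already visible here: each of the N accept-reject steps involves kernel evaluations of cost O(N), conditioned on all previously accepted points.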

Practical, if possibly naïve, questions that arise about this approach are:

- Given that the *single* simulation produces a negatively correlated pattern over the unit hypercube, how does it relate to quasi-Monte Carlo?
- Since the marginals of the N-th order determinantal process are far from uniform (see Fig. 1), and seemingly concentrated *on the boundaries*, how can this help in settings like Bayesian statistics, where the posterior is quite concentrated *inside* the unit hypercube?
- Is the variance of the resulting estimator (2.11) always *finite*?
- How is the precision of the result impacted by the choice of the dominating measure and of the associated orthogonal polynomials?
- What are the ways of evaluating the precision of the estimator (2.11) on the go, and of deciding when to stop and when to *renew the entire simulation* for a larger N (an issue shared by quasi-Monte Carlo)?
- While the link with probabilistic numerics is mentioned in the last section, the comparison is unclear. In particular, probabilistic numerics produces an epistemic *assessment of uncertainty*, contrary to the current proposal.
- The dimension d impacts [downwards] the rate of the Central Limit Theorem. But are there side-effects of the dimension that are not discussed in the paper?
- There are now many threads with theoretical super-efficient speeds of convergence, from quasi-Monte Carlo to numerical integration to zero-variance Monte Carlo to the current proposal, and I find it hard to fathom whether or not the mathematical property translates into a consistent improvement. There also exist zero-variance results for regular importance sampling, which are of little interest and consequence for the practice of Monte Carlo evaluations. It remains to be demonstrated that this setting is different.

September 21, 2016 at 2:30 am

Isn’t this “probabilistic numerics” in the tradition of randomised linear algebra, sketching, etc.? As the frequentist/tail-bound sibling to the Bayesian version, it is neither required nor expected to account for epistemic uncertainty. It just has to work. (Which is nice – I’ve yet to be convinced that a Bayesian PN method actually quantifies epistemic uncertainty, but I’ve been convinced that several non-Bayesian methods work.)

I’m not surprised to see that DPPs work here, given that they also popped up in “non-local” priors for mixtures (in both cases you want a probability structure that supports more structured spatial randomness).

The dimension dependence isn’t all that surprising given that QMC doesn’t work in high dimensions (the less “daily mail” version: QMC works great in high dimension under specific smoothness constraints that mean it’s effectively working in low dimensions). I wouldn’t be surprised if there was a DPP kernel that fixed the d-dependence.