Monte Carlo with determinantal processes [reply from the authors]

Posted in Books, Statistics with tags , , , , , , , , , , , , , , on September 22, 2016 by xi'an

[Rémi Bardenet and Adrien Hardy have written a reply to my comments of today on their paper, which is more readable as a post than as comments, so here it is. I appreciate the intention, as well as the perfect editing of the reply, suited for a direct posting!]

Thanks for your comments, Xian. As a foreword, a few people we met also had the intuition that DPPs would be relevant for Monte Carlo, but no result so far was backing this claim. As it turns out, we had to work hard to prove a CLT for importance-reweighted DPPs, using some deep recent results on orthogonal polynomials. We are currently working on turning this probabilistic result into practical algorithms. For instance, efficient sampling of DPPs is indeed an important open question, to which most of your comments refer. Although this question is out of the scope of our paper, note however that our results do not depend on how you sample. Efficient sampling of DPPs, along with other natural computational questions, is actually the crux of an ANR grant we just got, so hopefully in a few years we can write a more detailed answer on this blog! We now answer some of your other points.

“one has to examine the conditions for the result to operate, from the support being within the unit hypercube,”
Any compactly supported measure would do, using dilations, for instance. Note that we don’t assume the support is the whole hypercube.

“to the existence of N orthogonal polynomials wrt the dominating measure, not discussed here”
As explained in Section 2.1.2, it is enough that the reference measure charges some open set of the hypercube, which is for instance the case if it has a density with respect to the Lebesgue measure.

“to the lack of relation between the point process and the integrand,”
Actually, our method depends heavily on the target measure μ. Unlike vanilla QMC, the repulsiveness between the quadrature nodes is tailored to the integration problem.

“changing N requires a new simulation of the entire vector unless I missed the point.”
You’re absolutely right. This is a well-known open issue in probability, see the discussion on Terence Tao’s blog.

“This requires figuring out the upper bounds on the acceptance ratios, a “problem-dependent” request that may prove impossible to implement”
We agree that in general this isn’t trivial. However, good bounds are available for all Jacobi polynomials, see Section 3.

“Even without this stumbling block, generating the N-sized sample for dimension d=N (why d=N, I wonder?)”
This is a misunderstanding: we do not say that d=N in any sense. We only say that sampling from a DPP using the algorithm of [Hough et al] requires the same number of operations as orthonormalizing N vectors of dimension N, hence the cubic cost.

1. “how does it relate to quasi-Monte Carlo?”
So far, the connection to QMC is only intuitive: both rely on well-spaced nodes, but using different mathematical tools.

2. “the marginals of the N-th order determinantal process are far from uniform (see Fig. 1), and seemingly concentrated on the boundaries”
This phenomenon is due to orthogonal polynomials. We are investigating more general constructions that give more flexibility.

3. “Is the variance of the resulting estimator (2.11) always finite?”
Yes. For instance, this follows from the inequality below (5.56) since ƒ(x)/K(x,x) is Lipschitz.

4. and 5. We are investigating concentration inequalities to answer these points.

6. “probabilistic numerics produce an epistemic assessment of uncertainty, contrary to the current proposal.”
A partial answer may be our Remark 2.12. You can interpret DPPs as putting a Gaussian process prior over ƒ and sequentially sampling from the posterior variance of the GP.

nested sampling with a test

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , on December 5, 2014 by xi'an

On my way back from Warwick, I read through a couple preprints, including this statistical test for nested sampling algorithms by Johannes Buchner. As it happens, I had already read and commented it in July! However, without the slightest memory of it (sad, isn’t it?!), I focussed this time much more on the modification proposed to MultiNest than on the test itself, which is in fact a Kolmogorov-Smirnov test applied to a specific target function.

Indeed, when reading the proposed modification of Buchner, I thought of a modification to the modification that sounded more appealing. Without getting back  to defining nested sampling in detail, this algorithm follows a swarm of N particles within upper-level sets of the likelihood surface, each step requiring a new simulation above the current value of the likelihood. The remark that set me on this time was that we should exploit the fact that (N-1) particles were already available within this level set. And uniformly distributed herein. Therefore this particle cloud should be exploited as much as possible to return yet another particle distributed just as uniformly as the other ones (!). Buchner proposes an alternative to MultiNest based on a randomised version of the maximal distance to a neighbour and a ball centre picked at random (but not uniformly). But it would be just as feasible to draw a distance from the empirical cdf of the distances to the nearest neighbours or to the k-nearest neighbours. With some possible calibration of k. And somewhat more accurate, because this distribution represents the repartition of the particle within the upper-level set. Although I looked at it briefly in the [sluggish] metro from Roissy airport, I could not figure out a way to account for the additional point to be included in the (N-1) existing particles. That is, how to deform the empirical cdf of those distances to account for an additional point. Unless one included the just-removed particle, which is at the boundary of this upper-level set. (Or rather, which defines the boundary of this upper-level set.) I have no clear intuition as to whether or not this would amount to a uniform generation over the true upper-level set. But simulating from the distance distribution would remove (I think) the clustering effect mentioned by Buchner.

“Other priors can be mapped [into the uniform prior over the unit hypercube] using the inverse of the cumulative prior distribution.”

Hence another illustration of the addictive features of nested sampling! Each time I get back to this notion, a new understanding or reinterpretation comes to mind. In any case, an equally endless source of projects for Master students. (Not that I agree with the above quote, mind you!)