**A**s noted in an earlier post, Bernard Delyon and François Portier have recently published a paper in Bernoulli about improving the speed of convergence of an importance sampling estimator of

∫ φ(x) dx

when replacing the true importance distribution ƒ with a leave-one-out (!) kernel estimate in the importance sampling estimator… They also consider a debiased version that converges even faster at the rate

where n is the sample size, h the bandwidth and d the dimension. There is however a caveat, namely a collection of restrictive assumptions on the components of this new estimator:

- the integrand φ has a compact support, is bounded, and satisfies some Hölder-type regularity condition;
- the importance distribution ƒ is upper and lower bounded, and its r-th order derivatives are upper bounded;
- the kernel K is of order r, symmetric, and with exponential tails;
- the leave-one-out correction for bias has a cost of O(n²), compared with the O(n) cost of the regular Monte Carlo estimator;
- the bandwidth h in the kernel estimator has a rate in n linked with the dimension d and the regularity indices of ƒ and φ

and this bandwidth needs to be evaluated as well. In the paper, the authors rely on a control variate for which the integral is known, but which “looks like φ”, a strong requirement *in appearance only*, since this new function is the convolution of φ with a kernel estimate of ƒ, whose expectation is the original importance estimate of the integral. This sounds convoluted, but it is a generic control variate nonetheless! It is also a costly step. Because of the kernel estimation aspect, the method deteriorates with the dimension of the variate x. However, since φ(x) is a real number, I wonder if running the non-parametric density estimate directly on the sample of φ(x)’s would lead to an improved estimator…
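
To make the basic construction concrete, here is a minimal Python sketch of the leave-one-out kernel version of the importance sampling estimator, as I read it from the description above: average φ(Xᵢ) divided by a kernel density estimate at Xᵢ built from all the other points. Everything specific in it is my own assumption rather than the authors' setup: the function names `loo_kernel_is` and `phi_toy`, the Gaussian kernel (which is of order 2), the hand-picked bandwidth `h=0.25`, and the toy integrand φ(x) = (1 − x²)² on [−1, 1], whose integral is 16/15, sampled under a standard Normal ƒ. The debiasing correction and the control-variate choice of bandwidth are not included.

```python
import numpy as np


def loo_kernel_is(x, phi, h):
    """Leave-one-out kernel importance sampling estimate of the integral of phi.

    x   : sample from the density f, shape (n,) or (n, d)
    phi : integrand, maps an (n, d) array to an (n,) array
    h   : bandwidth of the Gaussian kernel (an assumption of this sketch)
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape

    # All pairwise squared distances: this is the O(n^2) step.
    diff = x[:, None, :] - x[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)

    # Gaussian kernel evaluated at (X_i - X_j) / h, rescaled by h^d.
    kern = np.exp(-0.5 * sq_dist / h ** 2) / ((2.0 * np.pi) ** (d / 2) * h ** d)

    # Leave-one-out: the estimate at X_i does not use X_i itself.
    np.fill_diagonal(kern, 0.0)
    f_loo = kern.sum(axis=1) / (n - 1)

    # Points where phi vanishes contribute zero, so skip the division there.
    vals = phi(x)
    keep = vals != 0
    return np.sum(vals[keep] / f_loo[keep]) / n


def phi_toy(x):
    """Toy integrand with compact support [-1, 1]; its integral is 16/15."""
    t = x[:, 0]
    return np.where(np.abs(t) < 1.0, (1.0 - t ** 2) ** 2, 0.0)


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

# Kernel-based estimate, which never evaluates the true density f.
est_kernel = loo_kernel_is(x, phi_toy, h=0.25)

# Regular importance sampling estimate using the true N(0, 1) density.
f_true = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)
est_plain = np.mean(phi_toy(x[:, None]) / f_true)

print("kernel-based estimate:", est_kernel)
print("regular IS estimate:  ", est_plain)
print("exact value:          ", 16.0 / 15.0)
```

The pairwise distance matrix is where the quadratic cost in n shows up, and the division by `h ** d` hints at why the approach suffers as the dimension grows. The bandwidth here is fixed by hand, which is precisely the part the paper handles through the control variate.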