“In this article it is shown that in a fairly general setting, a sample of size approximately exp(D(μ|ν)) is necessary and sufficient for accurate estimation by importance sampling.”

**S**ourav Chatterjee and Persi Diaconis arXived yesterday an exciting paper where they study the proper sample size in an importance sampling setting with no variance. That’s right, *with no variance*. They give as a starting toy example the use of an Exp(1) proposal for an Exp(1/2) target, where the importance ratio exp(x/2)/2 has no ξ order moment (for ξ≥2). So the infinity in the variance is somehow borderline in this example, which may explain why the estimator could be considered to “work”. However, I disagree with the statement “that a sample size a few thousand suffices” for the estimator of the mean to be close to the true value, that is, 2. For instance, the picture I drew above is the superposition of 250 sequences of importance sampling estimators across 10⁵ iterations: several sequences show huge jumps, even for a large number of iterations, which are characteristic of infinite variance estimates. Thus, while the expected distance to the true value can be closely evaluated via the Kullback-Leibler divergence between the target and the proposal (which by the way is infinite when using a Normal as proposal and a Cauchy as target), there are realisations of the simulation path that can remain far from the true value and this for an arbitrary number of simulations. (I even wonder if, for a given simulation path, waiting long enough should ~~not~~ lead to those unbounded jumps.) The first result is frequentist, while the second is conditional, i.e., can occur for the single path we have just simulated… As I taught in class this very morning, I thus remain wary about using an infinite variance estimator. (And not only in connection with the harmonic mean quagmire. As shown below by the more extreme case of simulating an Exp(1) proposal for an Exp(1/10) target, where the mean is completely outside the range of estimates.) Wary, then, even though I find the enclosed result about the existence of a cut-off sample size associated with this L¹ measure quite astounding. Continue reading