importance sampling with infinite variance
Sourav Chatterjee and Persi Diaconis arXived yesterday an exciting paper where they study the proper sample size in an importance sampling setting with no variance. That’s right, with no variance. They give as a starting toy example the use of an Exp(1) proposal for an Exp(1/2) target, where the importance ratio exp(x/2)/2 has no ξ order moment (for ξ≥2). So the infinity in the variance is somehow borderline in this example, which may explain why the estimator could be considered to “work”. However, I disagree with the statement “that a sample size a few thousand suffices” for the estimator of the mean to be close to the true value, that is, 2. For instance, the picture I drew above is the superposition of 250 sequences of importance sampling estimators across 10⁵ iterations: several sequences show huge jumps, even for a large number of iterations, which are characteristic of infinite variance estimates. Thus, while the expected distance to the true value can be closely evaluated via the Kullback-Leibler divergence between the target and the proposal (which by the way is infinite when using a Normal as proposal and a Cauchy as target), there are realisations of the simulation path that can remain far from the true value and this for an arbitrary number of simulations. (I even wonder if, for a given simulation path, waiting long enough should
not lead to those unbounded jumps.) The first result is frequentist, while the second is conditional, i.e., can occur for the single path we have just simulated… As I taught in class this very morning, I thus remain wary about using an infinite variance estimator. (And not only in connection with the harmonic mean quagmire. As shown below by the more extreme case of simulating an Exp(1) proposal for an Exp(1/10) target, where the mean is completely outside the range of estimates.) Wary, then, even though I find the enclosed result about the existence of a cut-off sample size associated with this L¹ measure quite astounding.
“…the traditional approach of testing for convergence using the estimated variance of the importance sampling estimate has a flaw: for any given tolerance level ε, there is high probability that the test declares convergence at or before a sample size that depends only on ε and not on the two distributions f and g. This is absurd, since convergence may take arbitrarily long, depending on the problem”
The second part of the paper starts from the remark that the empirical variance estimate is a poor choice of convergence criterion. Especially when the theoretical variance does not exist. As put by Art Owen in his importance sampling notes, “Badly skewed weights could give a badly estimated mean along with a bad variance estimate that masks the problem”. The authors suggest to consider instead a sort of inverse effective sample size derived from the importance weights and given by max ω[t] / ∑ ω[t], which should be small enough. However, replicating the above Exp(1) versus Exp(1/10) example, the (selected) picture below shows that this criterion may signal that all is fine just before storm hits!