importance sampling and necessary sample size

Daniel Sanz-Alonso arXived a note yesterday where he analyses importance sampling from the point of view of empirical distributions, with the added difficulty that unnormalised importance sampling estimators are not associated with an empirical distribution, since the weights do not sum to one. For several f-divergences, he obtains upper bounds on the divergence D(w,u) between the empirical cdf of the weights and its uniform version, which translate into lower bounds on the necessary importance sample size. I however do not see why this divergence between a weighted sample and the uniformly weighted version is relevant for the divergence between the target and the proposal, nor how the resulting Monte Carlo estimator is impacted by this bound. A side remark [in the paper] is that those results apply to infinite variance Monte Carlo estimators, as in the recent paper of Chatterjee and Diaconis I discussed earlier, which also addressed the necessary sample size.
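
As a concrete illustration of the quantity D(w,u), here is a minimal sketch [my own toy example, with a N(0,1) target and a N(0,4) proposal, not taken from the note] computing the Kullback-Leibler divergence between the normalised importance weights and the uniform weights 1/N:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: target N(0,1), proposal N(0,4); both log-densities are known.
def target_logpdf(x):
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def proposal_logpdf(x):
    return -0.5 * (x / 2.0)**2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)

N = 10_000
x = 2.0 * rng.standard_normal(N)             # N draws from the proposal
logw = target_logpdf(x) - proposal_logpdf(x)
w = np.exp(logw - logw.max())
w /= w.sum()                                 # self-normalised weights

# Kullback-Leibler divergence between the weighted empirical measure and
# the uniform weights u = (1/N, ..., 1/N), one instance of D(w,u)
kl_w_u = np.sum(w * np.log(N * w))
print(f"KL(w, u) = {kl_w_u:.4f}")
```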

2 Responses to “importance sampling and necessary sample size”

  1. Daniel Sanz-Alonso Says:

    Thanks for your post, just a couple of things to clarify:
    1) The results apply to ANY f-divergence (Theorem 1), and then I give several examples (Theorem 2).
    2) The divergence between the weights and the uniform weights is relevant for the divergence between target and proposal because the former is a Monte Carlo approximation of the latter (see the first line of the proof of Theorem 1). What I show is that if, with positive probability, you can estimate the constant one and the $f$-divergence, then this automatically gives a necessary requirement on the sample size. For instance, for the Kullback-Leibler divergence the theorem gives that the sample size needs to be larger than the exponential of the KL divergence, but if the change of measure is square integrable I can show further that the sample size needs to be larger than the chi-square divergence, which is sharper. The note has a discussion of the classes of test functions for which the results hold, which differ depending on the f-divergence used in the analysis. [See the sketches after the comment thread for a toy numerical illustration of these two bounds.]

    • Thank you for taking the time to reply to my quick review. My main reservation is that this type of approach does not exclude infinite variance estimators, which should not be used for approximation purposes since they are too slow to converge.
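
To make the two necessary sample-size requirements in the comment above concrete, here is a hedged sketch using a Gaussian pair for which both divergences are available in closed form [a hypothetical example of mine, not from the note]: with target N(0,1) and proposal N(0,s²), the KL result asks for a sample size exceeding exp(KL), while the chi-square result applies when the change of measure is square integrable, here when 2s² > 1:

```python
import numpy as np

# Hypothetical Gaussian pair: target N(0,1), proposal N(0, s^2) with s = 2
s = 2.0

# Closed-form KL(target || proposal) for zero-mean Gaussians
kl = np.log(s) + 1.0 / (2.0 * s**2) - 0.5

# Closed-form chi-square divergence, finite here since 2 s^2 > 1
chi2 = s**2 / np.sqrt(2.0 * s**2 - 1.0) - 1.0

print(f"exp(KL)        = {np.exp(kl):.3f}")  # KL-based requirement on N
print(f"chi-square     = {chi2:.3f}")        # chi-square-based requirement
# By Jensen's inequality, 1 + chi2 >= exp(kl), since 1 + chi2 is the second
# moment of the importance weights under the proposal.
print(f"1 + chi-square = {1.0 + chi2:.3f}")
```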
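
And as a sketch of the reservation about infinite variance in my reply above: flipping the roles in the toy example [target N(0,4), proposal N(0,1), again my own illustration] makes the chi-square divergence infinite while the KL divergence stays finite, and the unnormalised estimator of the constant one becomes unstable however large the sample:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy infinite-variance case: target N(0,4), proposal N(0,1).
# The weight p(x)/q(x) is proportional to exp(3 x^2 / 8), whose second
# moment under the proposal diverges, so the estimator has infinite variance.
def logw(x):
    # log target minus log proposal, the 2*pi constants cancelling
    return (-0.125 * x**2 - np.log(2.0)) - (-0.5 * x**2)

for N in (10**3, 10**4, 10**5):
    x = rng.standard_normal(N)   # proposal draws
    w = np.exp(logw(x))
    # Monte Carlo estimate of the constant 1 = E_q[p/q]; despite the growing
    # sample size it keeps fluctuating wildly across runs and seeds.
    print(N, w.mean())
```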
