borderline infinite variance in importance sampling

[Figure borde1: estimation sequences for E[X]=1]

As I was still musing about the posts of last week around infinite variance importance sampling and its potential corrections, and in conjunction with Aki’s post, I wondered whether or not there is a fundamental difference between “just” having a [finite] variance and “just” having none. To get a better feeling, I ran a quick experiment with Exp(1) as the target and Exp(a) as the importance distribution. When estimating E[X]=1, the above graph opposes a=1.95 to a=2.05 (variance versus no variance, bright yellow versus wheat), a=2.95 to a=3.05 (third moment versus none, bright yellow versus wheat), and a=3.95 to a=4.05 (fourth moment versus none, bright yellow versus wheat). The graph below is the same for the estimation of E[exp(X/2)]=2, whose integrand is not square integrable under the target and hence seems to require higher moments for the importance weight. It is hard to derive universal theories from those two graphs; they do suggest, however, that the extra moments offer some protection against sudden drifts in the estimation sequence. As an aside [not really!], apart from our rather confidential Confidence bands for Brownian motion and applications to Monte Carlo simulation with Wilfrid Kendall and Jean-Michel Marin, I do not know of many studies that consider the sequence of averages time-wise rather than across realisations at a given time, and I still think this is a more relevant perspective for simulation purposes.

[Figure borde2: estimation sequences for E[exp(X/2)]=2]
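
A minimal Python sketch of the first experiment, estimating E[X]=1 with target Exp(1) and importance distribution Exp(a) (illustrative only, with my own choice of sample size, seed and colours, not the code behind the graphs); the other pairs, and the E[exp(X/2)] case, follow by changing a and h:

    # Target f = Exp(1), importance distribution g = Exp(a), so the
    # weight is w(x) = f(x)/g(x) = exp((a-1)x)/a, and we track the
    # running (time-wise) importance sampling average.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    n = 10**6

    def running_estimate(a, h=lambda x: x):
        x = rng.exponential(scale=1.0 / a, size=n)       # draws from Exp(a)
        w = np.exp((a - 1.0) * x) / a                    # importance weights
        return np.cumsum(w * h(x)) / np.arange(1, n + 1)

    for a, colour in [(1.95, "gold"), (2.05, "wheat")]:
        plt.plot(running_estimate(a), color=colour, label=f"a = {a}")
    plt.axhline(1.0, lw=0.5, color="black")              # true value E[X] = 1
    plt.xlabel("iterations"); plt.ylabel("running estimate"); plt.legend()
    plt.show()

Note that the second moment of the weight under Exp(a) is finite exactly when a < 2, which is the boundary the 1.95/2.05 pair straddles.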

3 Responses to “borderline infinite variance in importance sampling”

  1. One thing that might be interesting to try is to sample from a uniform distribution and use inverse transform sampling to get samples from the importance distribution. You could create plots like those in the post that would continuously transform from one to the other as you smoothly vary the parameter of the importance distribution. You could take the parameter smoothly across the boundary of infinite variance and see exactly what happens to those large jumps.

    • Thanks, Corey: I actually used the same Exp(1) sample for all graphs, rescaling by the proper factor each time! Hence the pictures implement your proposal. Great foresight, isn’t it?!
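
      A minimal Python sketch of this coupling (illustrative, not the code behind the graphs): since -log(U)/a is an inverse-transform draw from Exp(a) when U is uniform, sharing the uniforms makes every estimation path deform continuously in a.

        import numpy as np

        # Shared uniforms implement the proposal above: as a varies, each
        # draw x_i(a) = -log(u_i)/a moves continuously, so the running
        # averages deform smoothly across the variance boundary at a = 2.
        # (-log(U)/a is a valid inverse transform for Exp(a) because U and
        # 1-U have the same uniform distribution.)
        rng = np.random.default_rng(1)
        u = rng.uniform(size=10**5)
        n = np.arange(1, u.size + 1)

        for a in np.linspace(1.90, 2.10, 5):   # sweep across the boundary
            x = -np.log(u) / a                 # common-random-number draws
            w = np.exp((a - 1.0) * x) / a      # importance weights f/g
            path = np.cumsum(w * x) / n        # running estimate of E[X]=1
            print(f"a = {a:.2f}: final estimate {path[-1]:.3f}")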

  2. Is there any way to formalise this intuition using an operator interpolation-type argument?

    The standard way that this works is that you have two Banach spaces (X [say functions with 2 moments] and Y [say L^1]) that are nicely contained as subspaces of a big underlying space Z, and you have a linear operator T: X->R and T: Y->R.

    Now interpolation allows you to define a scale of “in between” spaces (X,Y)_t (this is another Banach space nicely contained in Z) such that
    ||T||_{(X,Y)_t} \leq C ||T||_X^t ||T||_Y^{1-t},
    where those norms are operator norms and 0<t<1.

    So basically, if T is some sort of error measure such that ||T||_Y \leq C and ||T||_X \leq n^{-k}, then
    ||T||_{(X,Y)_t} \leq C n^{-tk}.

    Now, I'm not likely to find the time to work it all the way through, but it would be quite surprising to me if there wasn't a way to do this so that (X,Y)_t was the space of all functions with 2t (<2) moments.

    (This is especially true given that you can think of functions with k moments as weighted L^1 spaces, so Z = L^1 is a natural enveloping space)
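
    For what it's worth, a scalar shadow of this interpolation is already given by Lyapunov's inequality (a standard consequence of Hölder, stated here as an illustration): for 0 < t < 1 and 1/p = t/2 + (1-t),
    ||w||_p \leq ||w||_2^t ||w||_1^{1-t},
    so control of the first and second moments of the importance weight w bounds every moment of order p = 2/(2-t) in (1,2), in the same spirit as the (X,Y)_t bound above (though with a different indexing in t).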
