The main motivation for the article is that using a prespecified sequence of weights puts you in the realm of weighting unbiased and uncorrelated estimators, by a martingale argument: at each step the estimate is unbiased given all the previous ones.
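A sketch of that martingale argument, in notation of my own choosing (not the article's): write X_j for the step-j estimate of the target mu, F_{j-1} for everything generated before step j, and w_j for fixed weights summing to 1.

```latex
% Unbiasedness: the weights are fixed in advance, so by the tower property
\mathbb{E}\Big[\sum_j w_j X_j\Big]
  = \sum_j w_j\, \mathbb{E}\big[\mathbb{E}[X_j \mid \mathcal{F}_{j-1}]\big]
  = \mu \sum_j w_j = \mu.
% Uncorrelatedness: for j < k, condition on \mathcal{F}_{k-1}, which contains X_j:
\mathrm{Cov}(X_j, X_k)
  = \mathbb{E}\big[(X_j - \mu)\,\mathbb{E}[X_k - \mu \mid \mathcal{F}_{k-1}]\big] = 0.
```

Uncorrelatedness is what lets the variance of the weighted combination split into a weighted sum of per-step variances, which is what the optimization below is about.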

Most of the effort goes into showing that you don’t need anything like the unknown true optimal weights to come out ok. If the variance at step j decays like j^-y for any y between 0 and 1, you’re ok using weights proportional to j^0.5. That is a very wide range of behaviors to cover with one rule. You never raise the variance by more than a factor of 9/8 over that whole range. At y=0 the adaptation is perfectly futile. At y=1, the adaptation provides nearly quasi-Monte Carlo accuracy from plain MC estimates. For any y in between you get the optimal rate and a nearly optimal constant.
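A quick numerical check of that 9/8 claim (my sketch, not code from the paper): with uncorrelated estimators where Var(X_j) = j^-y, the variance of a convex combination with weights w_j is the sum of w_j^2 Var(X_j), and the optimal weights are proportional to 1/Var(X_j). Comparing the j^0.5 rule against that optimum:

```python
import numpy as np

def variance_ratio(n, y):
    """Ratio of the variance under weights w_j proportional to j^{1/2}
    to the optimal variance, when Var(X_j) = j^{-y} and the X_j are
    unbiased and uncorrelated."""
    j = np.arange(1, n + 1, dtype=float)
    sigma2 = j ** (-y)
    # Variance of the weighted average with normalized weights w.
    w = np.sqrt(j) / np.sqrt(j).sum()
    var_sqrt_rule = np.sum(w ** 2 * sigma2)
    # Optimal weights w_j proportional to 1/sigma2_j give variance
    # 1 / sum(1/sigma2_j).
    var_optimal = 1.0 / np.sum(1.0 / sigma2)
    return var_sqrt_rule / var_optimal

for y in (0.0, 0.25, 0.5, 0.75, 1.0):
    r = variance_ratio(10_000, y)
    # Never more than a factor 9/8 above optimal, and never below 1.
    assert 1.0 - 1e-9 <= r <= 9 / 8 + 1e-9
    print(f"y = {y:4.2f}: ratio = {r:.4f}")
```

The ratio peaks near 9/8 at the endpoints y=0 and y=1 and dips to exactly 1 at y=1/2, where the j^0.5 weights coincide with the optimal ones.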

More realistic assumptions would involve variances that decay monotonically and are bounded between j^-L and j^-H for low and high rates, L and H. Or they could be convex combinations of j^-y for various y. We have not worked out those examples; they would involve fussy convex optimizations and give no clean insights. Also in practice you wouldn’t know L or H.

I like 9/8 as the adaptive importance sampling counterpart to 23.4%. That famous number is useful as a guideline for the acceptance rate in random walk Metropolis, but it is likewise derived under strong assumptions that one doesn’t literally believe.
