## a new rule for adaptive importance sampling

**A**rt Owen and Yi Zhou have arXived a short paper on the combination of importance sampling estimators, which connects somehow with the talk about multiple estimators I gave at ESM last year in Helsinki, and with our earlier AMIS combination. The paper however makes two important assumptions to reach the optimal weighting, which is inversely proportional to the variance:

- the estimators are uncorrelated, even when dependent;
- the variance of the k-th estimator is of order a (negative) power of k.

The latter is puzzling when considering a series of estimators, in that k appears to act as a sample size (as in AMIS); the power is usually unknown, and there is moreover no reason for the power to be the same for all estimators. The authors propose ½ as a default, both because this is the standard Monte Carlo rate and because the loss in variance is then minimal, at most 12.5% larger (a factor of 9/8).
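As a quick numerical illustration of this ½ rule (my own sketch, not code from the paper), one can compare the variance of the weighted combination under the default weights k^½ against the inverse-variance optimal weights, for uncorrelated unbiased estimators whose variances decay like k^-y:

```python
import numpy as np

def combined_variance(weights, variances):
    """Variance of sum(w_k X_k) / sum(w_k) for uncorrelated
    unbiased estimators X_k with the given variances."""
    w = np.asarray(weights, dtype=float)
    v = np.asarray(variances, dtype=float)
    return np.sum(w**2 * v) / np.sum(w)**2

K = 100_000                  # number of estimators in the sequence
k = np.arange(1, K + 1)
ratios = []
for y in np.linspace(0.0, 1.0, 21):          # decay exponents in [0, 1]
    var_k = k**(-y)                          # Var(X_k) = k^(-y)
    v_default = combined_variance(k**0.5, var_k)       # the ½ rule
    v_optimal = combined_variance(1.0 / var_k, var_k)  # inverse-variance
    ratios.append(v_default / v_optimal)

print(max(ratios))   # stays below 9/8 = 1.125 over the whole range
```

The worst cases sit at the endpoints y=0 and y=1, while at y=½ the default weights coincide with the optimal ones and the ratio is exactly 1.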

As an aside, Art Owen also wrote an invited discussion, “the unreasonable effectiveness of Monte Carlo”, of “Probabilistic Integration: A Role in Statistical Computation?” by François-Xavier Briol, Chris Oates, Mark Girolami (Warwick), Michael Osborne and Dino Sejdinovic, to appear in Statistical Science, a discussion that contains a wealth of smart and enlightening remarks. Like the analogy between pseudo-random number generators [which work unreasonably well!] and true random numbers, or between Bayesian numerical integration and non-random functions. Or the role of advanced bootstrapping when assessing the variability of Monte Carlo estimates (citing a paper of his from 1992). Also pointing out an intriguing MCMC paper by Michael Lavine and Jim Hodges, to appear in The American Statistician.

March 5, 2019 at 9:30 pm

It’s very cool to be included on the ‘Og. Each time I post to arXiv I wait to see if the article gets picked up.

The main motivation for the article is that using a pre-fixed sequence of weights puts you in the realm of weighting unbiased and uncorrelated estimators by a martingale argument. At each step the estimate is unbiased given all the previous ones.

Most of the effort is about showing that you don’t need anything like the unknown true optimal weights to come out ok. If the variance at step j decays like j^-y for any y between 0 and 1, you’re ok using weights j^0.5. That is a very wide range of behaviors to cover with one rule. You never raise variance by more than 9/8 over that whole range. At y=0 the adaptation is perfectly futile. At y=1, the adaptation provides nearly quasi-Monte Carlo accuracy from plain MC estimates. For any of those you get the optimal rate and nearly optimal constant.
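The 9/8 figure can also be checked in the large-j limit (a back-of-envelope sketch of my own, not taken from the paper): with weights j^a and variances j^-y, approximating the sums by integrals gives a variance-inflation factor of (a+1)² / ((2a−y+1)(y+1)) relative to the optimal weighting.

```python
def inflation(y, a=0.5):
    """Large-K limit of the variance ratio (default weights j^a
    versus optimal weights) when Var(X_j) decays like j^(-y)."""
    return (a + 1)**2 / ((2*a - y + 1) * (y + 1))

print(inflation(0.0), inflation(0.5), inflation(1.0))
# -> 1.125 1.0 1.125: the endpoints give 9/8, and at y = 1/2
#    the weights j^0.5 are exactly optimal.
```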

More realistic assumptions would involve variances that decay monotonically and are bounded between j^-L and j^-H for low and high rates, L and H. Or they could be convex combinations of j^-y for various y. We have not worked out those examples; they would involve fussy convex optimizations and give no clean insights. Also in practice you wouldn’t know L or H.

I like 9/8 as the adaptive importance sampling counterpart to 23.4%. That famous number is useful as a guideline for the acceptance rate in random walk Metropolis, but is equally based on strong assumptions that one doesn’t believe.

March 6, 2019 at 8:19 am

And it is even cooler to get a detailed reply by you!!!

March 7, 2019 at 7:49 pm

I definitely appreciate the martingale argument! Never occurred to me before.