Archive for multiple importance methods

a new rule for adaptive importance sampling

Posted in Books, Statistics on March 5, 2019 by xi'an

Art Owen and Yi Zhou have arXived a short paper on the combination of importance sampling estimators, which connects somehow with the talk about multiple estimators I gave at ESM last year in Helsinki, and with our earlier AMIS combination. The paper however makes two important assumptions to reach the optimal weighting, which is inversely proportional to the variance:

  1. the estimators are uncorrelated if dependent;
  2. the variance of the k-th estimator is of order a (negative) power of k.

The latter is puzzling when considering a series of estimators, in that k appears to act as a sample size (as in AMIS). The power is usually unknown, and there is no reason for it to be the same for all estimators. The authors propose to use ½ as the default, both because this is the standard Monte Carlo rate and because the loss in variance is then minimal, the resulting variance being 12% larger.
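To make the weighting concrete, here is a minimal Python sketch, assuming (as in the two assumptions above) uncorrelated estimators whose variance is proportional to k^(-y) for some unknown power y. The names `combine` and `variance_ratio` are mine, not the paper's. The first function averages the estimators with weights proportional to k^½, the second compares the resulting variance with the one obtained under the (unknown) optimal weights k^y.

```python
import numpy as np

def combine(estimates, power=0.5):
    """Weighted average of estimates 1..K with weights proportional to k**power."""
    estimates = np.asarray(estimates, dtype=float)
    k = np.arange(1, len(estimates) + 1, dtype=float)
    weights = k ** power
    weights /= weights.sum()
    return float(weights @ estimates)

def variance_ratio(K, used_power, true_power):
    """Variance under weights k**used_power divided by the optimal variance.

    Assumes uncorrelated estimators with Var(estimate_k) = c * k**(-true_power);
    the constant c cancels in the ratio.
    """
    k = np.arange(1, K + 1, dtype=float)
    # Variance of the normalised weighted average with weights k**used_power
    var_used = np.sum(k ** (2 * used_power - true_power)) / np.sum(k ** used_power) ** 2
    # Variance under inverse-variance weights, i.e. weights k**true_power
    var_optimal = 1.0 / np.sum(k ** true_power)
    return float(var_used / var_optimal)

if __name__ == "__main__":
    # Loss incurred by the default power 1/2 for several true powers
    for y in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(y, variance_ratio(1000, 0.5, y))
```

Running the loop for true powers between 0 and 1 gives a quick feel for how little is lost by defaulting to ½ when the true power is unknown, at least under these stylised assumptions.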

As an aside, Art Owen also wrote an invited discussion, “the unreasonable effectiveness of Monte Carlo”, of “Probabilistic Integration: A Role in Statistical Computation?” by François-Xavier Briol, Chris Oates, Mark Girolami (Warwick), Michael Osborne and Dino Sejdinovic, to appear in Statistical Science, a discussion that contains a wealth of smart and enlightening remarks. Like the analogy between pseudo-random number generators [which work unreasonably well!] vs true random numbers and Bayesian numerical integration versus non-random functions. Or the role of advanced bootstrapping when assessing the variability of Monte Carlo estimates (citing a paper of his from 1992). Also pointing to an intriguing MCMC paper by Michael Lavine and Jim Hodges, to appear in The American Statistician.

optimal approximations for importance sampling

Posted in Mountains, pictures, Statistics, Travel on August 17, 2018 by xi'an

“…building such a zero variance estimator is most of the times not practical…”

As I was checking [while looking at Tofino inlet from my rental window] on optimal importance functions following a question on X validated, I came across this arXived note by Pantaleoni and Heitz, where they suggest using weighted sums of step functions to reach minimum variance. However, the difficulty with probability densities that are step functions is that they necessarily have a compact support, which makes them unsuitable for integrands with non-compact support, and makes the purpose of the note and the derivation of the optimal weights moot. The note points out its connection with the reference paper of Veach and Guibas (1995) as well as He and Owen (2014), a follow-up to the other reference paper by Owen and Zhou (2000).
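The compact support issue can be seen on a toy case. The sketch below is my own illustration, not the construction of the note: it takes the simplest step function, a Uniform(-a, a) proposal, to estimate E[X²] = 1 for X ~ N(0,1). Since the proposal never visits |x| > a, the importance sampling estimate converges to the truncated integral instead of 1. The function names and the choice a = 3 are assumptions made for the example.

```python
import math
import numpy as np

def truncated_is_estimate(n, half_width, rng):
    """Importance sampling estimate of E[X^2], X ~ N(0,1), from a Uniform(-a, a) proposal."""
    x = rng.uniform(-half_width, half_width, size=n)
    target = np.exp(-0.5 * x**2) / math.sqrt(2 * math.pi)
    proposal = 1.0 / (2 * half_width)  # constant density of the single step
    return float(np.mean(x**2 * target / proposal))

def truncated_second_moment(half_width):
    """Exact integral of x^2 phi(x) over [-a, a], the limit of the estimate above."""
    a = half_width
    phi_a = math.exp(-0.5 * a * a) / math.sqrt(2 * math.pi)
    # Integration by parts: (2 Phi(a) - 1) - 2 a phi(a), with 2 Phi(a) - 1 = erf(a / sqrt(2))
    return math.erf(a / math.sqrt(2)) - 2 * a * phi_a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(truncated_is_estimate(200_000, 3.0, rng), truncated_second_moment(3.0))
```

Widening the support reduces the gap but never removes it, and the same holds for any finite weighted sum of steps.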

multiple importance sampling

Posted in Books, Statistics, University life on November 20, 2015 by xi'an

“Within this unified context, it is possible to interpret that all the MIS algorithms draw samples from a equal-weighted mixture distribution obtained from the set of available proposal pdfs.”

In a very special (important?!) week for importance sampling!, Elvira et al. arXived a paper about generalized multiple importance sampling. The setting is the same as in earlier papers by Veach and Guibas (1995) or Owen and Zhou (2000) [and in our AMIS paper], namely a collection of importance functions and of simulations from those functions. However, there is no adaptivity for the construction of the importance functions and no Markov (MCMC) dependence on the generation of the simulations.

“One of the goals of this paper is to provide the practitioner with solid theoretical results about the superiority of some specific MIS schemes.”

A first part deals with the fact that a random point taken from the conjunction of those samples is distributed from the equiweighted mixture. Which was a fact I had much appreciated when reading Owen and Zhou (2000). From there, the authors discuss the various choices of importance weighting. Meaning the different degrees of Rao-Blackwellisation that can be applied to the sample. As we discovered in our population Monte Carlo research [which is well-referenced within this paper], conditioning too much leads to useless adaptivity. Again a sort of epiphany for me, in that a whole family of importance functions could be used for the same target expectation and the very same simulated value: it all depends on the degree of conditioning employed for the construction of the importance function. To get around the annoying fact that self-normalised estimators are never unbiased, the authors borrow Liu’s (2000) notion of proper importance sampling estimators, where the ratio of the expectations returns the right quantity. (Which amounts to recovering the correct normalising constant(s), I believe.) They then introduce five (5!) different possible importance weights that all produce proper estimators. However, those weights correspond to different sampling schemes, so do not apply to the same sample. In other words, they are not recycling weights as in AMIS. And do not cover the adaptive cases where the weights and parameters of the different proposals change along iterations. Unsurprisingly, the smallest variance estimator is the one based on sampling without replacement and an importance weight made of the entire mixture. But this result does not apply to the self-normalised version, whose variance remains intractable.
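As a rough illustration of how much the choice of weight matters, here is a small Python sketch, under assumptions of my own rather than the paper's setting: a N(0,1) target, five N(μ,1) proposals with fixed means, and the same number of draws from each proposal. It compares the "standard" weight, where each draw is divided by the density of the proposal that produced it, with the weight that uses the entire equal-weighted mixture in the denominator. Both are computed on the same draws, and the function and argument names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean):
    """Density of N(mean, 1) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2 * np.pi)

def mis_estimates(means, n_per_proposal, rng, h=np.ones_like):
    """Return (standard, mixture) estimates of E[h(X)] for X ~ N(0,1).

    x[j, i] is the i-th draw from the j-th proposal N(means[j], 1).
    """
    means = np.asarray(means, dtype=float)
    x = means[:, None] + rng.standard_normal((len(means), n_per_proposal))
    target = normal_pdf(x, 0.0)
    # Standard weight: denominator is the proposal that generated the draw
    own = normal_pdf(x, means[:, None])
    # Mixture weight: denominator is the equal-weighted mixture of all proposals
    mixture = np.mean([normal_pdf(x, m) for m in means], axis=0)
    standard = float(np.mean(h(x) * target / own))
    det_mix = float(np.mean(h(x) * target / mixture))
    return standard, det_mix

def replicate(n_rep=200, seed=0):
    """Repeat the experiment to compare the spread of both estimators."""
    rng = np.random.default_rng(seed)
    means = [-2.0, -1.0, 0.0, 1.0, 2.0]
    return np.array([mis_estimates(means, 200, rng) for _ in range(n_rep)])

if __name__ == "__main__":
    res = replicate()
    print("means:", res.mean(axis=0))
    print("variances:", res.var(axis=0))
```

With h identically equal to one, both versions estimate the normalising constant of the target (here 1), so the empirical variances over replications give a direct comparison of the two weighting schemes on this toy example.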

I find this survey of existing and non-existing multiple importance methods quite relevant and a must-read for my students (and beyond!). My reservations (for reservations there must be!) are that the study stops short of pushing the optimisation further. Indeed, the available importance functions are not equivalent in terms of the target and hence weighting them equally is sub-efficient. The adaptive part of the paper touches upon this issue but does not conclude.