## multiple importance sampling

Posted in Books, Statistics, University life on November 20, 2015 by xi'an

“Within this unified context, it is possible to interpret that all the MIS algorithms draw samples from a equal-weighted mixture distribution obtained from the set of available proposal pdfs.”

In a very special (important?!) week for importance sampling, Elvira et al. arXived a paper about generalized multiple importance sampling. The setting is the same as in earlier papers by Veach and Guibas (1995) or Owen and Zhou (2000) [and in our AMIS paper], namely a collection of importance functions and of simulations from those functions. However, there is no adaptivity in the construction of the importance functions and no Markov (MCMC) dependence in the generation of the simulations.

“One of the goals of this paper is to provide the practitioner with solid theoretical results about the superiority of some specific MIS schemes.”

A first part deals with the fact that a random point taken from the pooled samples is distributed from the equal-weighted mixture, a fact I had much appreciated when reading Owen and Zhou (2000). From there, the authors discuss the various choices of importance weights, meaning the different degrees of Rao-Blackwellisation that can be applied to the sample. As we discovered in our population Monte Carlo research [which is well referenced within this paper], conditioning too much leads to useless adaptivity. Again a sort of epiphany for me, in that a whole family of importance functions could be used for the same target expectation and the very same simulated value: it all depends on the degree of conditioning employed in the construction of the importance function. To get around the annoying fact that self-normalised estimators are never unbiased, the authors borrow Liu’s (2000) notion of proper importance sampling estimators, where the ratio of the expectations returns the right quantity. (Which amounts to recovering the correct normalising constant(s), I believe.) They then introduce five (5!) different possible importance weights that all produce proper estimators. However, those weights correspond to different sampling schemes, so they do not apply to the same sample. In other words, they are not recycling weights as in AMIS, and they do not cover the adaptive cases where the weights and parameters of the different proposals change along iterations. Unsurprisingly, the smallest-variance estimator is the one based on sampling without replacement and an importance weight made of the entire mixture. But this result does not extend to the self-normalised version, whose variance remains intractable.
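To make the difference between two of these weighting schemes concrete, here is a minimal NumPy sketch under made-up assumptions: a N(0,1) target, the integrand x², two Gaussian proposals with means ±1 and standard deviation 1.5, and n draws from each proposal (all names are hypothetical, not taken from the paper). It compares the standard MIS weight, where each draw is divided by the density of its own proposal, with the deterministic-mixture weight, where the denominator is the whole equal-weighted mixture. Both estimators are unbiased in this setting; the self-normalised variant is included for contrast.

```python
import numpy as np

def norm_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    z = (x - mu) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))

# Hypothetical toy problem: target N(0, 1), integrand f(x) = x^2, so E[f] = 1.
def target(x):
    return norm_pdf(x, 0.0, 1.0)

def f(x):
    return x**2

# Fixed (non-adaptive) proposals as (mean, sd); n draws are taken from each one.
PROPOSALS = [(-1.0, 1.5), (1.0, 1.5)]

def mis_estimates(n, rng):
    batches = [rng.normal(mu, sd, size=n) for mu, sd in PROPOSALS]
    x = np.concatenate(batches)
    # Standard MIS weight: each draw divided by the density of the proposal it came from.
    own_q = np.concatenate(
        [norm_pdf(b, mu, sd) for b, (mu, sd) in zip(batches, PROPOSALS)]
    )
    w_std = target(x) / own_q
    # Deterministic-mixture weight: denominator is the equal-weighted mixture density.
    mix_q = np.mean([norm_pdf(x, mu, sd) for mu, sd in PROPOSALS], axis=0)
    w_mix = target(x) / mix_q
    return {
        "standard": np.mean(w_std * f(x)),
        "mixture": np.mean(w_mix * f(x)),
        # Self-normalised variant: a ratio estimator, biased for finite n.
        "mixture_self_normalised": np.sum(w_mix * f(x)) / np.sum(w_mix),
    }

rng = np.random.default_rng(0)
reps = [mis_estimates(1_000, rng) for _ in range(200)]
for name in reps[0]:
    vals = np.array([r[name] for r in reps])
    print(f"{name:>24}: mean={vals.mean():.3f}, var={vals.var():.2e}")
```

Over repeated runs, the mixture-weighted estimator should show a clearly smaller empirical variance than the standard one in this toy setting, in line with the ordering discussed above; this is an illustration, not a reproduction of the paper's experiments.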

I find this survey of existing and non-existing multiple importance methods quite relevant and a must-read for my students (and beyond!). My reservation (for reservations there must be!) is that the study stops short of pushing the optimisation further. Indeed, the available importance functions are not equally suited to the target, and hence weighting them equally is inefficient. The adaptive part of the paper broaches this issue but does not reach a conclusion.

## optimal mixture weights in multiple importance sampling

Posted in Statistics, University life on December 12, 2014 by xi'an

Multiple importance sampling is back!!! I am always interested in this improvement upon regular importance sampling, even or especially after publishing a recent paper about our AMIS (for adaptive multiple importance sampling) algorithm, so I was quite eager to see what was in Hera He’s and Art Owen’s newly arXived paper. The paper is definitely exciting and set me off on a new round of importance sampling improvements and experiments…

Some of the most interesting developments in the paper are that (i) when using a collection of importance functions $q_i$ with the same target $p$, every ratio $q_i/p$ is a control variate function with expectation 1 [assuming each $q_i$ has a support contained in the support of $p$]; (ii) the weights of a mixture of the $q_i$’s can be chosen optimally so as to minimise the variance for a given integrand; (iii) multiple importance sampling quite naturally incorporates stratified sampling, i.e., the $q_i$’s may have disjoint supports; (iv) control variates contribute little, especially when compared with the optimisation over the weights [which does not surprise me that much, given that the control variates have little correlation with the integrands]; (v) Veach’s (1997) seminal PhD thesis remains a driving force behind those results [and helped earn Eric Veach an Academy Oscar in 2014!].
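Points (i), (ii) and (iv) can be illustrated with a short sketch in the spirit of Owen and Zhou (2000), assuming the draws come from a mixture $q_\alpha=\sum_j\alpha_j q_j$ of fixed components and that $E_p[f]$ is sought under a nominal density $p$. Here the ratios are taken against the mixture actually sampled, $q_j/q_\alpha$, which have expectation 1 under $q_\alpha$ for the same support reason as in (i), and they enter a least-squares control-variate regression. The rare-event integrand, the component means and the weights $\alpha=(0.3,0.7)$ are hypothetical choices, and the weights are not optimised.

```python
import numpy as np

def norm_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x (broadcasts over arrays)."""
    z = (x - mu) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))

# Hypothetical setting: nominal p = N(0, 1), rare-event integrand f(x) = 1{x > 2}.
MEANS = np.array([0.0, 3.0])  # q_1 = p itself (defensive component), q_2 = N(3, 1)
ALPHA = np.array([0.3, 0.7])  # mixture weights alpha_j, fixed arbitrarily here

def mixture_is_with_cv(n, rng):
    # Draw n points from the mixture q_alpha = sum_j alpha_j q_j.
    comp = rng.choice(len(ALPHA), size=n, p=ALPHA)
    x = rng.normal(MEANS[comp], 1.0)
    q = norm_pdf(x[:, None], MEANS[None, :], 1.0)  # q_j(x_i), shape (n, J)
    q_alpha = q @ ALPHA
    # Plain mixture importance sampling terms f * p / q_alpha.
    y = (x > 2.0) * norm_pdf(x, 0.0, 1.0) / q_alpha
    # Control variates q_j / q_alpha - 1, each with mean zero under q_alpha.
    # Since sum_j alpha_j q_j / q_alpha = 1 identically, the last one is redundant.
    c = q[:, :-1] / q_alpha[:, None] - 1.0
    design = np.column_stack([np.ones(n), c])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Plain estimate, and the regression intercept as control-variate estimate.
    return y.mean(), coef[0]

rng = np.random.default_rng(0)
plain, with_cv = mixture_is_with_cv(100_000, rng)
print(f"plain mixture IS: {plain:.5f}, with control variates: {with_cv:.5f}")
```

Including $p$ itself as a component (a defensive mixture) guarantees $q_\alpha \ge \alpha_1 p$, so the weights $p/q_\alpha$ never exceed $1/\alpha_1$.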

One extension that I would find of the utmost interest deals with unscaled densities, both for $p$ and the $q_i$’s. In that case, the weights do not even sum up to a known value, and I wonder how much more difficult it is to analyse this realistic case. And unscaled densities led me to imagine using geometric mixtures instead. Or even harmonic mixtures! (Maybe not.)
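When only the target is unscaled, the usual workaround is self-normalisation. Below is a minimal sketch under hypothetical choices: a N(0,1) target known only up to its constant, and the same kind of equal-weighted two-Gaussian mixture of normalised proposals. The average weight then estimates the missing normalising constant instead of being close to one, while the constant cancels in the ratio estimator. It does not address the harder case raised above, where the $q_i$’s are unscaled as well.

```python
import numpy as np

def norm_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    z = (x - mu) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))

def p_tilde(x):
    # Hypothetical unscaled target: N(0, 1) without its 1/sqrt(2*pi) factor.
    return np.exp(-0.5 * x**2)

# Normalised proposals as (mean, sd); n draws are taken from each one.
PROPOSALS = [(-1.0, 1.5), (1.0, 1.5)]

def self_normalised_mis(f, n, rng):
    x = np.concatenate([rng.normal(mu, sd, size=n) for mu, sd in PROPOSALS])
    mix_q = np.mean([norm_pdf(x, mu, sd) for mu, sd in PROPOSALS], axis=0)
    w = p_tilde(x) / mix_q
    # The mean weight estimates the unknown constant (sqrt(2*pi) here), not 1.
    z_hat = w.mean()
    # Self-normalised estimate of E_p[f]: the unknown constant cancels in the ratio.
    return np.sum(w * f(x)) / np.sum(w), z_hat

rng = np.random.default_rng(0)
estimate, z_hat = self_normalised_mis(lambda x: x**2, 50_000, rng)
print(f"E[x^2] estimate: {estimate:.3f}, constant estimate: {z_hat:.3f}")
```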

Another one is more in tune with our adaptive multiple mixture paper. The paper works with regret, but one could also work with remorse! Besides the pun, this means that one could adapt the weights along iterations and possibly even design new importance functions from past outcomes, i.e., be adaptive once again. He and Owen suggest mixing their approach with our adaptive sequential Monte Carlo model.

## workshop in Columbia [day 2]

Posted in Kids, pictures, Statistics, Travel, University life on September 26, 2011 by xi'an

The second day at the workshop was closer to my research topics and thus easier to follow, if equally enjoyable compared with yesterday: Jun Liu’s talk went over his modification of the Clifford-Fearnhead particle algorithm in great detail, Sam Kou explained how a simulated annealing algorithm could considerably improve the prediction of the 3D structure of molecules, Jeff Rosenthal showed us recent results on, and applications of, adaptive MCMC, Gareth Roberts detailed his new results on the exact simulation of diffusions, and Xiao-Li Meng went back to his 2002 Read Paper to explain how we should use likelihood principles in Monte Carlo as well, and to convince me I was “too young” to get the whole idea! (As I was a discussant of this paper.) All talks were thought-provoking and I very much enjoyed Gareth’s approach and description of the algorithm (as did the rest of the audience, to the point of asking too many questions during the talk!). However, the most revealing talk was Xiao-Li’s, in that he did succeed in convincing me of the pertinence of his “unknown measure” approach thanks to a multiple mixture example where the actual mixture importance sampler

$\dfrac{1}{n}\sum_{i=1}^n \dfrac{q(x_i)}{\sum \pi_j p_j(x_i)}$

gets dominated by the estimated mixture version

$\dfrac{1}{n}\sum_{i=1}^n \dfrac{q(x_i)}{\sum \hat\pi_j p_j(x_i)}$
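A small simulation can make this domination visible. The sketch below assumes, since it is not spelled out here, that components are selected at random with probabilities $\pi_j$ and that $\hat\pi_j=n_j/n$ is the observed proportion of draws from component $j$; the two Gaussian components and the integrand $q$ are hypothetical choices. Note that in this post $q$ denotes the integrand and the $p_j$’s the mixture components, the reverse of the notation in the posts above.

```python
import numpy as np

def norm_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x (broadcasts over arrays)."""
    z = (x - mu) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))

MEANS = np.array([-2.0, 2.0])  # hypothetical components p_1 = N(-2, 1), p_2 = N(2, 1)
PI = np.array([0.5, 0.5])      # true mixture weights pi_j

def q(x):
    # Hypothetical integrand; its integral over the real line is sqrt(2*pi).
    return np.exp(-0.5 * (x - 2.0) ** 2)

def mixture_estimates(n, rng):
    comp = rng.choice(len(PI), size=n, p=PI)      # random component labels
    x = rng.normal(MEANS[comp], 1.0)
    dens = norm_pdf(x[:, None], MEANS[None, :], 1.0)  # p_j(x_i), shape (n, J)
    pi_hat = np.bincount(comp, minlength=len(PI)) / n  # observed shares n_j / n
    actual = np.mean(q(x) / (dens @ PI))        # uses the true pi_j
    estimated = np.mean(q(x) / (dens @ pi_hat))  # uses the estimated pi_hat_j
    return actual, estimated

rng = np.random.default_rng(0)
reps = np.array([mixture_estimates(200, rng) for _ in range(300)])
print("variance with actual weights:   ", reps[:, 0].var())
print("variance with estimated weights:", reps[:, 1].var())
```

Conditionally on the counts $n_j$, the estimated-weight version is the deterministic mixture estimator and remains unbiased, while the actual-weight version inherits the binomial noise of the $n_j$’s; with well-separated components, as here, the variance gap is large.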

Even so, I remain skeptical of the group averaging perspective, for the same reason as earlier, namely that the group does not act in conjunction with the target function, hence averaging over transforms of no relevance to the target. Nonetheless, the idea of estimating the best “importance function” based on the simulated values rather than using the genuine importance function is quite a revelation, linking to an earlier question of mine (and others) on the (lack of) exploitation of the known values of the target at the simulated points. (Maybe up to a constant.) Food for thought, certainly… In memory of this discussion, here is a picture [of an ostrich] my daughter drew at the time for my final slide in London: