Archive for adaptive importance sampling

optimal mixture weights in multiple importance sampling

Posted in Statistics, University life with tags , , , , , , on December 12, 2014 by xi'an

Multiple importance sampling is back!!! I am always interested in this improvement upon regular importance sampling, even or especially after publishing a recent paper about our AMIS (for adaptive multiple importance sampling) algorithm, so I was quite eager to see what was in Hera He’s and Art Owen’s newly arXived paper. The paper is definitely exciting and set me on a new set of importance sampling improvements and experiments…

Some of the most interesting developments in the paper are that, (i) when using a collection of importance functions qi with the same target p, every ratio qi/p is a control variate function with expectation 1 [assuming each of the qi‘s has a support smaller than the support of p]; (ii) the weights of a mixture of the qi‘s can be chosen in an optimal way towards minimising the variance for a certain integrand; (iii) multiple importance sampling incorporates quite naturally stratified sampling, i.e. the qi‘s may have disjoint supports; )iv) control variates contribute little, esp. when compared with the optimisation over the weights [which does not surprise me that much, given that the control variates have little correlation with the integrands]; (v) Veach’s (1997) seminal PhD thesis remains a driving force behind those results [and in getting Eric Veach an Academy Oscar in 2014!].

One extension that I would find of the uttermost interest deals with unscaled densities, both for p and the qi‘s. In that case, the weights do not even sum up to a know value and I wonder at how much more difficult it is to analyse this realistic case. And unscaled densities led me to imagine using geometric mixtures instead. Or even harmonic mixtures! (Maybe not.)

Another one is more in tune with our adaptive multiple mixture paper. The paper works with regret, but one could also work with remorse! Besides the pun, this means that one could adapt the weights along iterations and even possible design new importance functions from the past outcome, i.e., be adaptive once again. He and Owen suggest mixing their approach with our adaptive sequential Monte Carlo model.

Another history of MCMC

Posted in Books, Statistics, University life with tags , , , , , on April 20, 2011 by xi'an

In the most recent issue of Statistical Science, the special topic is “Celebrating the EM Algorithm’s Quandunciacentennial“. It contains an historical survey by Martin Tanner and Wing Wong on the emergence of MCMC Bayesian computation in the 1980’s, This survey is more focused and more informative than our global history (also to appear in Statistical Science). In particular, it provides the authors’ analysis as to why MCMC was delayed by ten years or so (or even more when considering that a Gibbs sampler as a simulation tool appears in both Hastings’ (1970) and Besag‘s (1974) papers). They dismiss [our] concerns about computing power (I was running Monte Carlo simulations on my Apple IIe by 1986 and a single mean square error curve evaluation for a James-Stein type estimator would then take close to a weekend!) and Markov innumeracy, rather attributing the reluctance to a lack of confidence into the method. This perspective remains debatable as, apart from Tony O’Hagan who was then fighting again Monte Carlo methods as being un-Bayesian (1987, JRSS D),  I do not remember any negative attitude at the time about simulation and the immediate spread of the MCMC methods from Alan Gelfand’s and Adrian Smith’s presentations of their 1990 paper shows on the opposite that the Bayesian community was ready for the move.

Another interesting point made in this historical survey is that Metropolis’ and other Markov chain methods were first presented outside simulation sections of books like Hammersley and Handscomb (1964), Rubinstein (1981) and Ripley (1987), perpetuating the impression that such methods were mostly optimisation or niche specific methods. This is also why Besag’s earlier works (not mentioned in this survey) did not get wider recognition, until later. Something I was not aware is the appearance of iterative adaptive importance sampling (i.e. population Monte Carlo) in the Bayesian literature of the 1980’s, with proposals from Herman van Dijk, Adrian Smith, and others. The appendix about Smith et al. (1985), the 1987 special issue of JRSS D, and the computation contents of Valencia 3 (that I sadly missed for being in the Army!) is also quite informative about the perception of computational Bayesian statistics at this time.

A missing connection in this survey is Gilles Celeux and Jean Diebolt’s stochastic EM (or SEM). As early as 1981, with Michel Broniatowski, they proposed a simulated version of EM  for mixtures where the latent variable z was simulated from its conditional distribution rather than replaced with its expectation. So this was the first half of the Gibbs sampler for mixtures we completed with Jean Diebolt about ten years later. (Also found in Gelman and King, 1990.) These authors did not get much recognition from the community, though, as they focused almost exclusively on mixtures, used simulation to produce a randomness that would escape the local mode attraction, rather than targeting the posterior distribution, and did not analyse the Markovian nature of their algorithm until later with the simulated annealing EM algorithm.

AMIS revised & resubmitted

Posted in R, Statistics, University life with tags , , , on December 19, 2010 by xi'an

After a thorough revision that removed most of the theoretical attempts at improving our understanding of AMIS convergence, we have now resubmitted the AMIS paper to Scandinavian Journal of Statistics and arXived the new version as well. (I remind the reader that AMIS stands for adaptive mixture importance sampling and that it implements an adaptive version of Owen and Zhou’s (2000, JASA) stabilisation mixture technique, using this correction on the past and present importance weights, at each iteration of this iterative algorithm.) The AMIS method starts being used in population genetics, including an on-going work by Jean-Marie Cornuet and a published paper in Molecular Biology and Evolution by Sirén, Marttinen and Corander. The challenge of properly demonstrating AMIS convergence remains open!

Recent arXivals

Posted in Statistics, University life with tags , , , , , on October 11, 2010 by xi'an

Interesting entries I wish I had more time to peruse!

but I could go no further than the abstracts…


Posted in R, Statistics, University life with tags , , , , , , , , on July 30, 2010 by xi'an

A most interesting paper by Adrian Raftery and Le Bao appeared in the Early View section of Biometrics.  It aims at better predictions for HIV prevalence—in the original UNAIDS implementation, a naïve SIR procedure was used, based on the prior as importance function, which sometimes resulted in terrible degeneracy—, but its methodological input is about incremental mixture importance sampling (IMIS), thus relates to the general topic of adaptive Monte Carlo methods I am interested in. (And to some extent to our recent AMIS paper.) Actually, a less elaborate (and less related) version of the IMIS algorithm first appeared in a 2006 paper by Steele, Raftery and Edmond in JCGS in the setting of finite mixture likelihoods and I somehow managed to miss it…

Raftery and Bao propose to replace SIR with an iterative importance sampling technique developed in 2003 by Steele et al. that has some similarities with population Monte Carlo (PMC). (A negligible misrepresentation of PMC in the current paper is that our method does not use “the prior as importance function'”.) In its current format, the IMIS algorithm starts from a first guess (e.g., the prior distribution) and builds a sequence of Gaussian (or Gaussian mixture) approximations whose parameters are estimated from the current population, while all simulation are merged together at each step, using a mixture stabilising weight

\pi(\theta_i^s|x) / \omega_0 p_0(\theta_i^0)+\sum_r \omega_r \hat q_r(\theta_i^s)

where the weights \omega_r depend on the number of simulations at step r. This pattern also appears in our adaptive multiple importance sampling (AMIS) algorithm developed in this arXiv paper with Jean-Marie Cornuet, Jean-Michel Marin and Antonietta Mira, and in the original paper by Owen and Zhou (2000, JASA) that inspired us. Raftery and Bo extend the methodology to an IMIS with optimisation at the initial stage, while AMIS incorporates the natural population Monte Carlo stepwise optimisation developed in Douc et al. (2008, Annals of Statistics) that brings the proposal kernel closer to the target after each iteration. The application of the simulations to conduct model choice found in the current paper and in Steele et al. can also be paralleled with the one using population Monte Carlo we conducted for cosmological data in MNRAS.

Interestingly, Raftery and Bo (and also Steele et al.) refer to the defensive mixture paper of Hesterberg (1995, Technometrics), which has been very influential in my research on importance sampling, and (less directly) to Owen and Zhou (2000, JASA), who did propose the deterministic mixture scheme that inspired AMIS. Besides the foundational papers of Oh and Berger (1991, JASA) and West (1993, J. Royal Statistical Society Series B), they also mention a paper by Raghavan and Cox (1998, J. Statistical Simulation & Computation) I was not aware of, which introduces as well a mixture of importance proposals as a variance stabilising technique.


Get every new post delivered to your Inbox.

Join 721 other followers