**V**ivek Roy, Aixin Tan and James Flegal arXived a new paper, Estimating standard errors for importance sampling estimators with multiple Markov chains, where they obtain a central limit theorem and hence standard error estimates when using several MCMC chains to simulate from a mixture distribution as an importance sampling function. Just before I boarded my plane from Amsterdam to Calgary (a long flight that gave me the opportunity to read it completely, along with half a dozen other papers!), I first thought it was connected to our AMIS algorithm (on whose convergence Vivek spent a few frustrating weeks when he visited me at the end of his PhD), because of the mixture structure. It is actually altogether different, in that the mixture is made of unnormalised densities complex enough to act as importance samplers, and that, due to this complexity, the components can only be simulated via separate MCMC algorithms. Behind this characterisation lurks the challenging problem of estimating multiple normalising constants. The paper adopts the resolution by reverse logistic regression advocated in Charlie Geyer's famous 1994 unpublished technical report. Besides the technical difficulties in establishing a CLT in this convoluted setup, the notion of mixing importance sampling and different Markov chains is quite appealing, especially in the domain of "tall" data and of splitting the likelihood into several or even many bits, since the mixture contains most of the information provided by the true posterior and can be corrected by an importance sampling step. In this very setting, I also think more adaptive schemes could be found to determine (estimate?!) the optimal weights of the mixture components.
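The correction step can be illustrated with a minimal self-normalised importance sampling sketch (a toy of mine, not the estimator of the paper): the two mixture components are sampled directly here, standing in for separate MCMC chains, and self-normalisation absorbs the unknown normalising constant of the target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalised target: a standard normal, missing its constant.
def p_unnorm(x):
    return np.exp(-0.5 * x**2)

# Two-component normal mixture playing the importance sampler; in the
# paper's setting each component would be simulated by its own MCMC
# chain, here we draw from the components directly for illustration.
weights = np.array([0.5, 0.5])
means = np.array([-1.0, 1.0])

n = 100_000
comp = rng.choice(2, size=n, p=weights)
x = rng.normal(means[comp], 1.0)

def q_mix(x):
    # density of the two-component mixture
    return sum(w * np.exp(-0.5 * (x - m)**2) / np.sqrt(2 * np.pi)
               for w, m in zip(weights, means))

# Self-normalised importance sampling: the unknown normalising
# constant of the target cancels in the ratio.
w_is = p_unnorm(x) / q_mix(x)
est = np.sum(w_is * x**2) / np.sum(w_is)   # estimates E_p[X^2] = 1
```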

## Archive for AMIS

## importance sampling with multiple MCMC sequences

Posted in Mountains, pictures, Statistics, Travel, University life with tags adaptive MCMC, Ames, AMIS, Amsterdam, Charlie Geyer, importance sampling, Iowa, MCMC, Monte Carlo Statistical Methods, normalising constant, splitting data on October 2, 2015 by xi'an

## optimal mixture weights in multiple importance sampling

Posted in Statistics, University life with tags adaptive importance sampling, AMIS, Art Owen, control variates, multiple mixtures, Oscars (Academy Awards), population Monte Carlo on December 12, 2014 by xi'an

**M**ultiple importance sampling is back!!! I am always interested in this improvement upon regular importance sampling, even or especially after publishing a recent paper about our AMIS (for adaptive multiple importance sampling) algorithm, so I was quite eager to see what was in Hera He's and Art Owen's newly arXived paper. The paper is definitely exciting and set me on a new set of importance sampling improvements and experiments…

Some of the most interesting developments in the paper are that (i) when using a collection of importance functions q_{i} with the same target p, every ratio q_{i}/p is a *control variate* function with expectation 1 [assuming each of the q_{i}'s has a support smaller than the support of p]; (ii) the weights of a mixture of the q_{i}'s can be chosen in an optimal way towards minimising the variance for a certain integrand; (iii) multiple importance sampling incorporates quite naturally stratified sampling, i.e. the q_{i}'s may have disjoint supports; (iv) control variates contribute little, esp. when compared with the optimisation over the weights [which does not surprise me that much, given that the control variates have little correlation with the integrands]; (v) Veach's (1997) seminal PhD thesis remains a driving force behind those results [and in getting Eric Veach an Academy Oscar in 2014!].
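Point (ii) is easy to reproduce on a toy problem; the following sketch (mine, with a crude grid search standing in for He and Owen's optimisation) picks the mixture weight minimising the empirical variance of a tail-probability estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

# target p = N(0,1) and two proposals with different coverage
def p(x):  return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
def q1(x): return np.exp(-0.5 * (x / 2)**2) / (2 * np.sqrt(2 * np.pi))  # N(0,4)
def q2(x): return np.exp(-0.5 * (x - 2)**2) / np.sqrt(2 * np.pi)        # N(2,1)

def h(x):  return (x > 1.5).astype(float)   # tail-probability integrand

def is_variance(a, n=50_000):
    """Empirical variance of the mixture-IS terms for mixture weight a."""
    k = rng.random(n) < a
    x = np.where(k, rng.normal(0, 2, n), rng.normal(2, 1, n))
    return np.var(h(x) * p(x) / (a * q1(x) + (1 - a) * q2(x)))

grid = np.linspace(0.05, 0.95, 19)
best = grid[np.argmin([is_variance(a) for a in grid])]

# final estimate with the selected weight; unbiased for any fixed a
k = rng.random(50_000) < best
x = np.where(k, rng.normal(0, 2, 50_000), rng.normal(2, 1, 50_000))
est = np.mean(h(x) * p(x) / (best * q1(x) + (1 - best) * q2(x)))
# est approximates P(X > 1.5) ≈ 0.0668 under N(0,1)
```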

One extension that I would find of the utmost interest deals with unscaled densities, both for p and the q_{i}'s. In that case, the weights do not even sum up to a known value and I wonder how much more difficult it is to analyse this realistic case. And unscaled densities led me to imagine using geometric mixtures instead. Or even harmonic mixtures! (Maybe not.)

Another one is more in tune with our adaptive multiple mixture paper. The paper works with *regret*, but one could also work with *remorse*! Besides the pun, this means that one could adapt the weights along iterations and possibly even design new importance functions from past outcomes, i.e., be adaptive once again. He and Owen suggest mixing their approach with our adaptive sequential Monte Carlo model.

## PMC for combinatoric spaces

Posted in Statistics, University life with tags AMIS, CUNY, importance sampling, Monte Carlo Statistical Methods, PMC, population Monte Carlo, simulation, unbiasedness on July 28, 2014 by xi'an

**I** received this interesting [edited] email from Xiannian Fan at CUNY:

I am trying to use PMC to solve Bayesian network structure learning problem (which is in a combinatorial space, not continuous space).

In PMC, the proposal distributions q_{i,t} can be very flexible, even specific to each iteration and each instance. My problem occurs due to the combinatorial space. For importance sampling, the requirement on a proposal distribution q is

support(p) ⊂ support(q) (*)

For PMC, what is the support of the proposal distribution in iteration t? Is it

support(p) ⊂ ∪_{i} support(q_{i,t}) (**)

or does (*) apply to every q_{i,t}? For continuous problems, this is not a big issue: we can use a Normal random walk to make local moves satisfying (*). But for combinatorial search, a local move only reaches finitely many states, hence does not satisfy (*). For example, for the permutation (1,3,2,4), a random swap only has choose(4,2)=6 neighbouring states.
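The reader's counting argument is easy to check directly; a quick sketch (assuming neighbours are generated by single transpositions):

```python
from itertools import combinations, permutations

def swap_neighbours(perm):
    """All states reachable from perm by one transposition (swap)."""
    out = set()
    for i, j in combinations(range(len(perm)), 2):
        q = list(perm)
        q[i], q[j] = q[j], q[i]
        out.add(tuple(q))
    return out

start = (1, 3, 2, 4)
nbrs = swap_neighbours(start)
# choose(4,2) = 6 neighbours, against 4! = 24 permutations in the
# support of p: a single-swap proposal cannot satisfy (*).
n_nbrs = len(nbrs)
n_perms = len(set(permutations(start)))
```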

**F**airly interesting question about population Monte Carlo (PMC), a sequential version of importance sampling we worked on with French colleagues in the early 2000's. (The name population Monte Carlo comes from Iba, 2000.) While MCMC samplers do not have to cover the whole support of p at each iteration, it is much harder for importance samplers, as their core justification is to provide an unbiased estimator for all integrals of interest. Thus, when using the PMC estimate,

1/n ∑_{i,t} {p(x_{i,t})/q_{i,t}(x_{i,t})} h(x_{i,t}),   x_{i,t} ~ q_{i,t}(x)

this estimator is only unbiased when the support of every q_{i,t} contains the support of p. The only other cases I can think of are

- associating the q_{i,t}'s with a partition S_{i,t} of the support of p and using instead
  ∑_{i,t} {p(x_{i,t})/q_{i,t}(x_{i,t})} h(x_{i,t}) I_{S_{i,t}}(x_{i,t}),   x_{i,t} ~ q_{i,t}(x)
- resorting to AMIS under the assumption (**) and using instead
  1/n ∑_{i,t} {p(x_{i,t})/∑_{j,t} q_{j,t}(x_{i,t})} h(x_{i,t}),   x_{i,t} ~ q_{i,t}(x)

but I am open to further suggestions!
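The support condition is not a mere formality; a small sketch of mine makes the bias visible when a proposal misses half the support of the target:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

def p(x): return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)   # target N(0,1)
def h(x): return x**2                                        # E_p[h] = 1

# (a) proposal N(0,4) covering the whole support of p
x = rng.normal(0, 2, n)
q = np.exp(-0.5 * (x / 2)**2) / (2 * np.sqrt(2 * np.pi))
est_full = np.mean(p(x) / q * h(x))          # unbiased, close to 1

# (b) proposal |N(0,4)| supported on the positive half-line only
x = np.abs(rng.normal(0, 2, n))
q_half = 2 * np.exp(-0.5 * (x / 2)**2) / (2 * np.sqrt(2 * np.pi))
est_half = np.mean(p(x) / q_half * h(x))     # misses x < 0: close to 1/2
```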

## my week at War[wick]

Posted in pictures, Running, Statistics, Travel, Uncategorized with tags ABC, AMIS, Bayesian asymptotics, COLT2014, empirical Bayes methods, empirical likelihood, MASDOC, University of Warwick, Warwickshire, Zeeman building on February 1, 2014 by xi'an

**T**his was a most busy and profitable week in Warwick as, in addition to meeting with local researchers and students on a wide range of questions and projects, giving an extended seminar to MASDOC students, attending as many seminars as humanly possible (!), and preparing a 5k race by running in the Warwickshire countryside (in the dark and in the rain), I received the visits of Kerrie Mengersen, Judith Rousseau and Jean-Michel Marin, with whom I made some progress on papers we are writing together. In particular, Jean-Michel and I wrote the skeleton of a paper we (still) plan to submit to COLT 2014 next week. And Judith, Kerrie and I drafted new if paradoxical connections between empirical likelihood and model selection. Jean-Michel and Judith also gave talks at the CRiSM seminar, Jean-Michel presenting the latest developments on the convergence of our AMIS algorithm, Judith summarising several papers on the analysis of empirical Bayes methods in non-parametric settings.

## MCMSki IV [day 2.5]

Posted in Mountains, pictures, Statistics, University life with tags ABC, AMIS, extremes, parallelisation, poster session, raclette, SNPs, sticky Metropolis, synthetic likelihood, warhammer on January 8, 2014 by xi'an

**D**espite a good rest during the ski break, my cold did not go away (no magic left in this world!) and I thus had a low attention span during the *Bayesian statistics and Population genetics* session: while Jukka Corander mentioned the improvement brought by our AMIS algorithm, I had difficulties getting the nature of the model, if only because he used a blackboard-like font that made math symbols too tiny to read. (Nice fonts, otherwise!) Daniel Lawson (of vomiting Warhammer fame!) talked about the alluring notion of a statistical emulator, and Barbara Engelhardt talked about variable selection in a SNP setting. I did not get a feeling for how handling ten million SNPs towards a variable selection goal was possible. My final session of the day was actually "my" invited session on ABC methods, where Richard Everitt presented a way of mixing exact approximation with ABC and synthetic likelihood (Wood, *Nature*) approximations. The resulting MAVIS algorithm is not out yet. The second speaker was Ollie Ratmann, who spoke on his accurate ABC that I have discussed many times here. And Jean-Michel Marin managed to drive from Montpellier just in time to deliver his talk on our various explorations of the ABC model choice problem.

**A**fter a quick raclette at “home”, we headed back to the second poster session, where I had enough of a clear mind and not too much of a headache (!) to have several interesting discussions, incl. a new parallelisation suggested by Ben Calderhead, the sticky Metropolis algorithm of Luca Martino, the airport management video of Jegar Pitchforth, the mixture of Dirichlet distributions for extremes by Anne Sabourin, not mentioning posters from Warwick or Paris. At the end of the evening I walked back to my apartment with the Blossom skis we had brought in the morning to attract registrations for the ski race: not enough to make up for the amount charged by the ski school. Too bad, especially given Anto’s efforts to get this amazing sponsoring!

## Initializing adaptive importance sampling with Markov chains

Posted in Statistics with tags AMIS, arXiv, cosmoPMC, evidence, Kullback, marginal likelihood, Multinest, nested sampling, PMC, population Monte Carlo, sequential Monte Carlo, simulation on May 6, 2013 by xi'an

**A**nother paper recently arXived by Beaujean and Caldwell elaborated on our population Monte Carlo papers (Cappé et al., 2005, Douc et al., 2007, Wraith et al., 2010) to design a more thorough starting distribution. Interestingly, the authors mention the fact that PMC is an EM-type algorithm to emphasize the importance of the starting distribution, as with “poor proposal, PMC fails as proposal updates lead to a consecutively poorer approximation of the target” (p.2). I had not thought of this possible feature of PMC, which indeed proceeds along integrated EM steps, and thus could converge to a local optimum (if not poorer than the start as the Kullback-Leibler divergence decreases).

**T**he solution proposed in this paper is similar to the one we developed in our AMIS paper. An important part of the simulation is dedicated to the construction of the starting distribution, which is a mixture deduced from multiple Metropolis-Hastings runs. I find the method spends an unnecessarily long time on refining this mixture by culling the number of components: off-the-shelf clustering techniques should be sufficient, esp. if one considers that *the value of the target is available at every simulated point*. This has been my pet (if idle) theory for a long while: we do not take (enough) advantage of this informative feature in our simulation methods… I also find the *Student's t versus Gaussian kernel* debate (p.6) somehow superfluous: as we showed in Douc et al., 2007, we can process Student's *t* distributions, so we may as well work with those. And rather worry about the homogeneity assumption this choice implies: working with any elliptically symmetric kernel assumes a local Euclidean structure on the parameter space, for all components, and does not model highly curved spaces properly. Another pet theory of mine. As for picking the necessary number of simulations at each PMC iteration, I would add to the ESS and the survival rate of the components a measure of the Kullback-Leibler divergence, as it *should decrease* at each iteration (with an infinite number of particles).
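On monitoring: the ESS and a normalised perplexity are both cheap to compute from the importance weights, and the latter can be read as an estimate of exp(-KL) between target and proposal, so it can serve as the Kullback-Leibler monitor mentioned above. A sketch of mine, with frozen Gaussian proposals standing in for the adapted ones:

```python
import numpy as np

rng = np.random.default_rng(3)

def diagnostics(w):
    """ESS and normalised perplexity of importance weights w."""
    wn = w / w.sum()
    ess = 1.0 / np.sum(wn**2)
    perp = np.exp(-np.sum(wn * np.log(wn))) / len(w)
    return ess, perp

# toy check: target N(0,1), proposal N(0,s^2) improving across "iterations";
# the normalised perplexity climbs towards 1 as the proposal nears the target
perps = []
for s in (3.0, 1.5, 1.0):
    x = rng.normal(0, s, 10_000)
    w = s * np.exp(-0.5 * x**2 + 0.5 * (x / s)**2)   # p/q up to a constant
    ess, perp = diagnostics(w)
    perps.append(perp)
```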

**A**nother interesting feature is the comparison with Multinest, the current version of nested sampling, developed by Farhan Feroz. This is the second time I have read a paper involving nested sampling in the past two days. While this PMC implementation does better than nested sampling on the examples processed in the paper, the Multinest outcome remains relevant, particularly because it handles multi-modality fairly well. The authors seem to think parallelisation is an issue with nested sampling, while I do not see why: at the most naïve stage, several nested samplers can be run in parallel and the outcomes pulled together.

## AMIS convergence, at last!

Posted in Statistics, University life with tags AMIS, Big'MC, convergence, importance sampling, PMC, seminar, unbiasedness on November 19, 2012 by xi'an

**T**his afternoon, Jean-Michel Marin gave his talk at the Big'MC seminar. As already posted, it was about a convergence proof for AMIS, which gave me the opportunity to simultaneously read the paper and listen to the author. The core idea for adapting AMIS towards a manageable version is to update the proposal parameter based on the current sample rather than on the whole past. This facilitates the task of establishing convergence to the optimal (pseudo-true) value of the parameter, under the assumption that the optimal value is a known moment of the target. From there, convergence of the weighted mean is somehow natural as the number of simulations grows to infinity. (Note the special asymptotics of AMIS, though, which are that the number of steps goes to infinity while the number of simulations per step grows a wee faster than linearly. In this respect, it is the opposite of PMC, where convergence is of a more traditional nature, pushing the number of simulations per step to infinity.) The second part of the convergence proof is more intricate, as it establishes that the multiple mixture estimator based on the “forward-backward” reweighting of all simulations since step zero does converge to the proper posterior moment. This relies on rather complex assumptions, but remains a magnificent *tour de force*. During the talk, I wondered whether, given the Markovian nature of the algorithm (since reweighting only occurs once simulation is over), an alternative estimator based on the optimal value of the simulation parameter would not be better than the original multiple mixture estimator: the proof is based on the equivalence between both versions…
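For readers who have not met it, the multiple mixture estimator is easy to write down; here is a stripped-down sketch of mine, with frozen rather than adapted proposal parameters, so it only illustrates the "forward-backward" reweighting, not the adaptation:

```python
import numpy as np

rng = np.random.default_rng(4)

def p_unnorm(x):                  # unnormalised target, N(0,1) here
    return np.exp(-0.5 * x**2)

def q_pdf(x, m, s):               # Gaussian proposal density
    return np.exp(-0.5 * ((x - m) / s)**2) / (s * np.sqrt(2 * np.pi))

# proposal parameters over T = 3 steps (frozen here; AMIS would update
# them from the current sample at each step)
params = [(-2.0, 3.0), (-0.5, 2.0), (0.0, 1.2)]
samples = [rng.normal(m, s, 5_000) for m, s in params]

# multiple mixture reweighting: every draw since step zero is weighted
# against the average of all proposals used so far
x = np.concatenate(samples)
qmix = np.mean([q_pdf(x, m, s) for m, s in params], axis=0)
w = p_unnorm(x) / qmix
est = np.sum(w * x) / np.sum(w)   # self-normalised estimate of E_p[X] = 0
```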