Archive for AMIS

PMC for combinatoric spaces

Posted in Statistics, University life with tags , , , , , , , on July 28, 2014 by xi'an

I received this interesting [edited] email from Xiannian Fan at CUNY:

I am trying to use PMC to solve Bayesian network structure learning problem (which is in a combinatorial space, not continuous space).

In PMC, the proposal distributions qi,t can be very flexible, even specific to each iteration and each instance. My problem occurs due to the combinatorial space.

For importance sampling, the requirement for proposal distribution, q, is:

support (p) ⊂ support (q)             (*)

For PMC, what is the support of the proposal distribution in iteration t? is it

support (p) ⊂ U support(qi,t)    (**)

or does (*) apply to every qi,t?

For continuous problem, this is not a big issue. We can use random walk of Normal distribution to do local move satisfying (*). But for combination search, local moving only result in finite states choice, just not satisfying (*). For example for a permutation (1,3,2,4), random swap has only choose(4,2)=6 neighbor states.

Fairly interesting question about population Monte Carlo (PMC), a sequential version of importance sampling we worked on with French colleagues in the early 2000’s.  (The name population Monte Carlo comes from Iba, 2000.)  While MCMC samplers do not have to cover the whole support of p at each iteration, it is much harder for importance samplers as their core justification is to provide an unbiased estimator to for all integrals of interest. Thus, when using the PMC estimate,

1/n ∑i,t {p(xi,t)/qi,t(xi,t)}h(qi,t),  xi,t~qi,t(x)

this estimator is only unbiased when the supports of the qi,t “s are all containing the support of p. The only other cases I can think of are

  1. associating the qi,t “s with a partition Si,t of the support of p and using instead

    i,t {p(xi,t)/qi,t(xi,t)}h(qi,t), xi,t~qi,t(x)

  2. resorting to AMIS under the assumption (**) and using instead

    1/n ∑i,t {p(xi,t)/∑j,t qj,t(xi,t)}h(qi,t), xi,t~qi,t(x)

but I am open to further suggestions!

my week at War[wick]

Posted in pictures, Running, Statistics, Travel, Uncategorized with tags , , , , , , , , , on February 1, 2014 by xi'an

This was a most busy and profitable week in Warwick as, in addition to meeting with local researchers and students on a wide range of questions and projects, giving an extended seminar to MASDOC students, attending as many seminars as humanly possible (!), and preparing a 5k race by running in the Warwickshire countryside (in the dark and in the rain), I received the visits of Kerrie Mengersen, Judith Rousseau and Jean-Michel Marin, with whom I made some progress on papers we are writing together. In particular, Jean-Michel and I wrote the skeleton of a paper we (still) plan to submit to COLT 2014 next week. And Judith, Kerrie and I drafted new if paradoxical aconnections between empirical likelihood and model selection. Jean-Michel and Judith also gave talks at the CRiSM seminar, Jean-Michel presenting the latest developments on the convergence of our AMIS algorithm, Judith summarising several papers on the analysis of empirical Bayes methods in non-parametric settings.

MCMSki IV [day 2.5]

Posted in Mountains, pictures, Statistics, University life with tags , , , , , , , , , on January 8, 2014 by xi'an

ridge4Despite a good rest during the ski break, my cold did not get away (no magic left in this world!) and I thus had a low attention span to attend the Bayesian statistics and Population genetics session: while Jukka Corander mentioned the improvement brought by our AMIS algorithm, I had difficulties getting the nature of the model, if only because he used a blackboard-like font that made math symbols too tiny to read. (Nice fonts, otherwise!), Daniel Lawson (of vomiting Warhammer fame!) talked about the alluring notion of a statistical emulator, and Barbara Engelhardt talked about variable selection in a SNP setting. I did not get a feeling on how handling ten millions of SNPs was possible in towards a variable selection goal.  My final session of the day was actually “my” invited session on ABC methods, where Richard Everitt presented a way of mixing exact approximation with ABC and synthetic likelihood (Wood, Nature) approximations. The resulting MAVIS algorithm is  not out yet. The second speaker was Ollie Ratman, who spoke on his accurate ABC that I have discussed many times here. And Jean-Michel Marin managed to drive from Montpelier, just in time to deliver his talk on our various explorations of the ABC model choice problem.

After a quick raclette at “home”, we headed back to the second poster session, where I had enough of a clear mind and not too much of a headache (!) to have several interesting discussions, incl. a new parallelisation suggested  by Ben Calderhead, the sticky Metropolis algorithm of Luca Martino, the airport management video of Jegar Pitchforth, the mixture of Dirichlet distributions for extremes by Anne Sabourin, not mentioning posters from Warwick or Paris. At the end of the evening  I walked back to my apartment with the Blossom skis we had brought in the morning to attract registrations for the ski race: not enough to make up for the amount charged by the ski school. Too bad, especially given Anto’s efforts to get this amazing sponsoring!

Initializing adaptive importance sampling with Markov chains

Posted in Statistics with tags , , , , , , , , , , , on May 6, 2013 by xi'an

Another paper recently arXived by Beaujean and Caldwell elaborated on our population Monte Carlo papers (Cappé et al., 2005, Douc et al., 2007, Wraith et al., 2010) to design a more thorough starting distribution. Interestingly, the authors mention the fact that PMC is an EM-type algorithm to emphasize the importance of the starting distribution, as with “poor proposal, PMC fails as proposal updates lead to a consecutively poorer approximation of the target” (p.2). I had not thought of this possible feature of PMC, which indeed proceeds along integrated EM steps, and thus could converge to a local optimum (if not poorer than the start as the Kullback-Leibler divergence decreases).

The solution proposed in this paper is similar to the one we developed in our AMIS paper. An important part of the simulation is dedicated to the construction of the starting distribution, which is a mixture deduced from multiple Metropolis-Hastings runs. I find the method spends an unnecessary long time on refining this mixture by culling the number of components: down-the-shelf clustering techniques should be sufficient, esp. if one considers that the value of the target is available at every simulated point. This has been my pet (if idle) theory for a long while: we do not take (enough) advantage of this informative feature in our simulation methods… I also find the Student’s t versus Gaussian kernel debate (p.6) somehow superfluous: as we shown in Douc et al., 2007, we can process Student’s t distributions so we can as well work with those. And rather worry about the homogeneity assumption this choice implies: working with any elliptically symmetric kernel assumes a local Euclidean structure on the parameter space, for all components, and does not model properly highly curved spaces. Another pet theory of mine’s. As for picking the necessary number of simulations at each PMC iteration, I would add to the ESS and the survival rate of the components a measure of the Kullback-Leibler divergence, as it should decrease at each iteration (with an infinite number of particles).

Another interesting feature is in the comparison with Multinest, the current version of nested sampling, developed by Farhan Feroz. This is the second time I read a paper involving nested sampling in the past two days. While this PMC implementation does better than nested sampling on the examples processed in the paper, the Multinest outcome remains relevant, particularly because it handles multi-modality fairly well. The authors seem to think parallelisation is an issue with nested sampling, while I do see why: at the most naïve stage, several nested samplers can be run in parallel and the outcomes pulled together.

AMIS convergence, at last!

Posted in Statistics, University life with tags , , , , , , on November 19, 2012 by xi'an

This afternoon, Jean-Michel Marin gave his talk at the big’MC seminar. As already posted, it was about a convergence proof for AMIS, which gave me the opportunity to simultaneously read the paper and listen to the author. The core idea for adapting AMIS towards a manageable version is to update the proposal parameter based on the current sample rather than on the whole past. This facilitates the task of establishing convergence to the optimal (pseudo-true) value of the parameter, under an assumption that the optimal value is a know moment of the target. From there, convergence of the weighted mean is somehow natural when the number of simulations grows to infinity. (Note the special asymptotics of AMIS, though, which are that the number of steps goes to infinity while the number of simulations per step grows a wee faster than linearly. In this respect, it is the opposite of PMC, where convergence is of a more traditional nature, pushing the number of simulations per step to infinity.) The second part of the convergence proof is more intricate, as it establishes that the multiple mixture estimator based on the “forward-backward” reweighting of all simulations since step zero does converge to the proper posterior moment. This relies on rather complex assumptions, but remains a magnificent tour de force. During the talk, I wondered if, given the Markovian nature of the algorithm (since reweighting only occurs once simulation is over), an alternative estimator based on the optimal value of the simulation parameter would not be better than the original multiple mixture estimator: the proof is based on the equivalence between both versions….


Get every new post delivered to your Inbox.

Join 603 other followers