Archive for adaptivity

Bayesian adaptive sampling

Posted in R, Statistics on December 6, 2010 by xi'an

In the continuation of my earlier post on computing evidence, I read a very interesting paper by Merlise Clyde, Joyee Ghosh and Michael Littman, to appear in JCGS, called Bayesian adaptive sampling for variable selection and model averaging. The sound idea at the basis of the paper is that, when one is doing variable selection (i.e. exploring a finite, if large, state space) in setups where the individual probabilities of the models are known (up to a constant), it is not necessary to return to models that have already been visited. Hence the move to sample models without replacement, called BAS (for Bayesian adaptive sampling) in the paper. The first part discusses the way to sample without replacement from a state space whose elements and probabilities are defined by a binary tree, i.e.

f(\mathbf{\gamma})=\prod_{j=1}^k \rho_{j|<j}^{\gamma_j}(1-\rho_{j|<j})^{1-\gamma_j}

(The connection with variable selection is that each level of the tree corresponds to the binary choice between including and excluding one of the variables. The tree thus has 2^k endpoints/leaves for k potential variables in the model.) The cost of updating the probabilities is actually O(k), if k is the number of levels, instead of O(2^k), because most of the branches of the tree are unaffected by setting one final branch to probability zero. The second part deals with the adaptive and approximative issues.
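For illustration, here is a minimal Python sketch of sampling without replacement down such a binary tree, with the O(k) per-draw update done lazily through a dictionary of removed masses. It is only a sketch under a simplifying assumption: the inclusion probabilities are fixed marginals rho_j rather than the sequentially updated \rho_{j|<j} of the paper, and the function name and bookkeeping are mine.

```python
import random

def bas_sample(rho, n_draws, seed=1):
    """Draw n_draws distinct inclusion vectors gamma in {0,1}^k from
    f(gamma) = prod_j rho_j^gamma_j (1-rho_j)^(1-gamma_j), without
    replacement. Only the O(k) nodes on each sampled path are touched."""
    rng = random.Random(seed)
    k = len(rho)
    removed = {}                       # prefix tuple -> mass already sampled in its subtree
    draws = []
    for _ in range(n_draws):
        prefix, mass = (), 1.0         # mass = original probability of the current prefix
        for j in range(k):
            m1 = mass * rho[j]         # original mass of the gamma_j = 1 subtree
            m0 = mass - m1
            r1 = m1 - removed.get(prefix + (1,), 0.0)   # remaining (unsampled) mass
            r0 = m0 - removed.get(prefix + (0,), 0.0)
            bit = 1 if rng.random() * (r1 + r0) < r1 else 0
            prefix, mass = prefix + (bit,), (m1 if bit else m0)
        for j in range(k + 1):         # O(k) update: discount the leaf along its path
            removed[prefix[:j]] = removed.get(prefix[:j], 0.0) + mass
        draws.append(prefix)
    return draws
```

Exhausting the tree with k = 3 and uniform probabilities indeed returns all eight distinct models, each exactly once.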


IMIS & AMIS

Posted in R, Statistics, University life on July 30, 2010 by xi'an

A most interesting paper by Adrian Raftery and Le Bao appeared in the Early View section of Biometrics. It aims at better predictions for HIV prevalence—in the original UNAIDS implementation, a naïve SIR procedure was used, based on the prior as importance function, which sometimes resulted in terrible degeneracy—but its methodological input is about incremental mixture importance sampling (IMIS), and thus relates to the general topic of adaptive Monte Carlo methods I am interested in. (And to some extent to our recent AMIS paper.) Actually, a less elaborate (and less related) version of the IMIS algorithm first appeared in a 2006 paper by Steele, Raftery and Emond in JCGS, in the setting of finite mixture likelihoods, and I somehow managed to miss it…

Raftery and Bao propose to replace SIR with an iterative importance sampling technique developed by Steele et al. that has some similarities with population Monte Carlo (PMC). (A negligible misrepresentation of PMC in the current paper is that our method does not use “the prior as importance function”.) In its current format, the IMIS algorithm starts from a first guess (e.g., the prior distribution) and builds a sequence of Gaussian (or Gaussian mixture) approximations whose parameters are estimated from the current population, while all simulations are merged together at each step, using the mixture stabilising weight

\dfrac{\pi(\theta_i^s|x)}{\omega_0 p_0(\theta_i^s)+\sum_r \omega_r \hat q_r(\theta_i^s)}

where the weights \omega_r depend on the number of simulations at step r. This pattern also appears in our adaptive multiple importance sampling (AMIS) algorithm developed in this arXiv paper with Jean-Marie Cornuet, Jean-Michel Marin and Antonietta Mira, and in the original paper by Owen and Zhou (2000, JASA) that inspired us. Raftery and Bao extend the methodology to an IMIS with optimisation at the initial stage, while AMIS incorporates the natural population Monte Carlo stepwise optimisation developed in Douc et al. (2008, Annals of Statistics) that brings the proposal kernel closer to the target after each iteration. The application of the simulations to conduct model choice found in the current paper and in Steele et al. can also be paralleled with the one using population Monte Carlo we conducted for cosmological data in MNRAS.
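The mixture-stabilised weighting shared by IMIS and AMIS is simple to code once the stage proposals are in hand. Here is a hedged Python sketch (function names and the pure-Python Gaussian density are mine; a real implementation would also fit each \hat q_r to the current population, which is omitted here):

```python
import math

def normal_pdf(x, mu, sigma):
    """Scalar Gaussian density, used as a stand-in for p_0 and the q_r."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_weights(samples_by_stage, target_pdf, stage_pdfs):
    """Deterministic-mixture (IMIS/AMIS-style) weights: every point, whatever
    stage it was simulated at, is weighted against the full mixture of all
    proposals, with omega_r = N_r / N proportional to the stage sample size."""
    N = sum(len(s) for s in samples_by_stage)
    omegas = [len(s) / N for s in samples_by_stage]
    weights = []
    for s in samples_by_stage:
        for x in s:
            denom = sum(w * q(x) for w, q in zip(omegas, stage_pdfs))
            weights.append(target_pdf(x) / denom)
    return weights
```

A sanity check on the scheme: with a single stage whose proposal equals the target, every weight is exactly one.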

Interestingly, Raftery and Bao (and also Steele et al.) refer to the defensive mixture paper of Hesterberg (1995, Technometrics), which has been very influential in my research on importance sampling, and (less directly) to Owen and Zhou (2000, JASA), who did propose the deterministic mixture scheme that inspired AMIS. Besides the foundational papers of Oh and Berger (1991, JASA) and West (1993, J. Royal Statistical Society Series B), they also mention a paper by Raghavan and Cox (1998, J. Statistical Computation & Simulation) I was not aware of, which also introduces a mixture of importance proposals as a variance stabilising technique.

Another ABC paper

Posted in Statistics on July 24, 2010 by xi'an

“One aim is to extend the approach of Sisson et al. (2007) to provide an algorithm that is robust to implement.”

C.C. Drovandi & A.N. Pettitt

A paper by Drovandi and Pettitt appeared in the Early View section of Biometrics. It uses a combination of particles and of MCMC moves to adapt to the true target, with an acceptance probability

\min\left\{1,\dfrac{\pi(\theta^*)q(\theta_c|\theta^*)}{\pi(\theta_c)q(\theta^*|\theta_c)}\right\}

where \theta^* is the proposed value and \theta_c is the current value (picked at random from the particle population), while q is the proposal kernel used to simulate the proposed value. The algorithm is adaptive in that the previous population of particles is used to make the choice of the proposal q, as well as of the tolerance level \epsilon_t. Although the method is valid as a particle system applied in the ABC setting, I have difficulties gauging the level of novelty of the method (then applied to a model of Riley et al., 2003, J. Theoretical Biology). Learning from previous particle populations to build a better kernel q is indeed a constant feature in SMC methods, from Sisson et al.’s ABC-PRC (2007)—note that Drovandi and Pettitt mistakenly believe the ABC-PRC method to include partial rejection control, as argued in this earlier post—to Beaumont et al.’s ABC-PMC (2009). The paper also advances the idea of adapting the tolerance on-line as an \alpha quantile of the previous particle population, but this is the same idea as in Del Moral et al.’s ABC-SMC. The only strong methodological difference, as far as I can tell, is that the MCMC steps are repeated “numerous times” in the current paper, instead of once as in the earlier papers. This however partly cancels the appeal of an O(N) order method versus the O(N²) order PMC and SMC methods. An interesting remark made in the paper is that more advances are needed in cases when simulating the pseudo-observations is highly costly, as in Ising models. However, replacing exact simulation [as we did in the model choice paper] with a Gibbs sampler cannot be that detrimental.
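To fix ideas, here is a hedged Python sketch of one such MCMC move applied to each particle, with the random-walk scale tuned to the spread of the current population. All names are illustrative; resampling, weighting, and the \alpha-quantile update of \epsilon_t are left out, and the prior is assumed positive at the current values.

```python
import random
import statistics

def abc_mcmc_move(particles, eps, simulate, distance, prior_pdf, rng):
    """One ABC-MCMC move per particle: a symmetric Gaussian random walk whose
    scale is tuned to the current population, accepted with the prior ratio
    provided the pseudo-data stays within tolerance eps (a sketch only)."""
    tau = 2.0 * statistics.pstdev(particles) or 1.0   # kernel scale from the population
    out = []
    for theta in particles:
        prop = rng.gauss(theta, tau)                  # proposal from kernel q
        within = distance(simulate(prop, rng)) <= eps # ABC acceptance of the pseudo-data
        if within and rng.random() < min(1.0, prior_pdf(prop) / prior_pdf(theta)):
            theta = prop
        out.append(theta)
    return out
```

On a toy model where the pseudo-data is theta plus Gaussian noise and the distance is the gap to an observed value, the step returns an updated population of the same size.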

Bayesian model comparison in cosmology with population Monte Carlo

Posted in Statistics on December 11, 2009 by xi'an

The paper on evidence approximation by population Monte Carlo that I mentioned in a previous post is now arXived. (This is the second paper jointly produced by the members of the 2005-2009 ANR Ecosstat.) The full abstract is

“We use Bayesian model selection techniques to test extensions of the standard flat ΛCDM paradigm. Dark-energy and curvature scenarios, and primordial perturbation models are considered. To that end, we calculate the Bayesian evidence in favour of each model using Population Monte Carlo (PMC), a new adaptive sampling technique which was recently applied in a cosmological context. In contrast to the case of other sampling-based inference techniques such as Markov chain Monte Carlo (MCMC), the Bayesian evidence is immediately available from the PMC sample used for parameter estimation without further computational effort, and it comes with an associated error evaluation. Besides, it provides an unbiased estimator of the evidence after any fixed number of iterations and it is naturally parallelisable, in contrast with MCMC and nested sampling methods. By comparison with analytical predictions for simulated data, we show that our results obtained with PMC are reliable and robust. The variability in the evidence evaluation and the stability for various cases are estimated both from simulations and from data. For the cases we consider, the log-evidence is calculated with a precision of better than 0.08.

Using a combined set of recent CMB, SNIa and BAO data, we find inconclusive evidence between flat ΛCDM and simple dark-energy models. A curved Universe is moderately to strongly disfavoured with respect to a flat cosmology. Using physically well-motivated priors within the slow-roll approximation of inflation, we find a weak preference for a running spectral index. A Harrison-Zel’dovich spectrum is weakly disfavoured. With the current data, tensor modes are not detected; the large prior volume on the tensor-to-scalar ratio r results in moderate evidence in favour of r = 0.”

and we run an evaluation of the PMC-based approximation to the evidence that shows that this particular importance sampling technique allows for an associated assessment of the uncertainty surrounding this approximation. A posteriori, I think we should have included a comparison with nested sampling, since this technique is now very much in use in astronomy, but this will have to wait for another paper.

Note that the paper details the fact that the variance of the evidence approximation

\dfrac{\mathfrak{E}^2}{N} d(\pi,q)

is connected with the chi-square distance d(\pi,q), which can indeed be estimated from the simulation output as

\dfrac{N}{\text{ESS}_N} -1

where \text{ESS}_N denotes the effective sample size. In the cosmology section, a point worth stressing for non-cosmologists is that the likelihood computation takes ages, and therefore that the complete parallelisation offered by importance sampling is quite appreciable.
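In practice this error evaluation is a few lines on the importance weights. A minimal Python sketch (the function name is mine, and the weights are assumed to be the unnormalised \pi/q ratios of an importance sample):

```python
import math

def evidence_with_error(weights):
    """Importance-sampling evidence estimate with an ESS-based standard error:
    var(E_hat) ~ E_hat^2 / N * d(pi, q), where the chi-square distance d is
    estimated by N / ESS_N - 1 and ESS_N = (sum w)^2 / sum w^2."""
    N = len(weights)
    e_hat = sum(weights) / N                            # evidence estimate
    ess = sum(weights) ** 2 / sum(w * w for w in weights)
    d_hat = N / ess - 1.0                               # estimated chi-square distance
    std_err = math.sqrt(e_hat ** 2 / N * d_hat)
    return e_hat, std_err, ess
```

As a check of the formula, equal weights (a proposal matching the target) give a zero estimated distance and hence a zero standard error.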

Adaptive scaling Metropolis

Posted in Statistics on March 27, 2009 by xi'an

Matti Vihola from the University of Jyväskylä posted a paper on arXiv yesterday on a convergence result for an adaptive scheme related to the basic random walk Metropolis-Hastings algorithm. The scale \theta used in the random walk is adapted using the Robbins-Monro stochastic approximation schedule

\log \theta_{t+1} = \log\theta_t + Ct^{-\gamma}(\alpha_t-\alpha^\star)

where \alpha^\star is the gold standard for acceptance and \alpha_t is the empirical acceptance rate. (This was one of the examples in our 2001 paper with Christophe Andrieu, Controlled MCMC for Optimal Sampling, which never got published but can nonetheless boast about its 43 citations!) The constraints imposed upon the target density \pi and on the integrand f are fairly harsh, including f being bounded (those constraints seem to be more restrictive than in Roberts and Rosenthal, 2007, J.A.P., although of course I did not check the correspondence), and there is a surprising restriction on \alpha^\star, namely \alpha^\star<1/2. I do not have any intuitive explanation for this hard boundary on the acceptance rate: staying away from 0 and from 1 makes sense, obviously, but 1/2? Looking at the reason, this seems to be related to a lower bound on the average acceptance rate found in the proof of Proposition 12, which is itself related to the convergence of the Robbins-Monro sequence.
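As a toy illustration of the schedule, here is a Python sketch of a random-walk Metropolis sampler with the Robbins-Monro scale adaptation, using the per-iteration acceptance probability as \alpha_t. The values C=1, \gamma=0.7 and \alpha^\star=0.44 are illustrative choices of mine (the latter satisfying the \alpha^\star<1/2 constraint), not values from the paper.

```python
import math
import random

def adaptive_scaling_mh(logpi, x0, n_iter, alpha_star=0.44, C=1.0, gamma=0.7, seed=0):
    """Random-walk Metropolis with Robbins-Monro scale adaptation:
    log theta_{t+1} = log theta_t + C t^{-gamma} (alpha_t - alpha_star)."""
    rng = random.Random(seed)
    x, log_theta, chain = x0, 0.0, []
    for t in range(1, n_iter + 1):
        prop = x + math.exp(log_theta) * rng.gauss(0.0, 1.0)   # random-walk proposal
        alpha_t = min(1.0, math.exp(logpi(prop) - logpi(x)))   # acceptance probability
        if rng.random() < alpha_t:
            x = prop
        log_theta += C * t ** (-gamma) * (alpha_t - alpha_star)  # Robbins-Monro update
        chain.append(x)
    return chain, math.exp(log_theta)
```

Run on a standard normal target, the scale settles at a moderate value rather than drifting off, as the schedule pushes the empirical acceptance rate towards \alpha^\star.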

Ps. When checking for the role of \alpha, I came across the possibility to search for \alpha in a pdf file with Acrobat Reader, by simply typing \alpha in the Find box. Neat!
