Archive for spectral gap

inefficiency of data augmentation for large samples

Posted in Books, pictures, Running, Statistics, Travel, University life with tags convergence of Gibbs samplers, Data augmentation, Gibbs sampling, Hamiltonian Monte Carlo, importance sampling, logit model, MCMC, Monte Carlo Statistical Methods, probit model, simulation, spectral gap on May 31, 2016 by xi'an

On Monday, James Johndrow, Aaron Smith, Natesh Pillai, and David Dunson arXived a paper on the diminishing benefits of using data augmentation for large and highly imbalanced categorical data. They reconsider the data augmentation scheme of Tanner and Wong (1987), surprisingly not mentioned, which underlies the first occurrences of the Gibbs sampler, like Albert and Chib's (1993) or our mixture estimation paper with Jean Diebolt (1990). The central difficulty with data augmentation is that the completed distribution to be simulated operates on a space of dimension O(n), even when the original distribution bears on a single parameter. As illustrated by the coalescent in population genetics (and the subsequent intrusion of the ABC methodology), there are well-known cases when the completion is close to impossible and clearly inefficient (as again illustrated by the failure of importance sampling strategies on the coalescent). The paper provides spectral gaps for the logistic and probit regression completions, which are of order a power of log(n) divided by √n, when all observations are equal to one. In a somewhat related paper with Jim Hobert and Vivek Roy, we studied the spectral gap for mixtures with a small number of observations: I wonder at the existence of a similar result in this setting, when all observations stem from a single component of the mixture. The result in this paper is theoretically appealing, all the more because the posteriors associated with such models are highly regular and very close to Gaussian (and hence not that challenging, as argued by Chopin and Ridgway), and because the data augmentation algorithm is uniformly ergodic in this setting (as we established with Jean Diebolt and later explored with Richard Tweedie). As demonstrated by the experiments in the paper, comparisons with HMC and Metropolis-Hastings (under the same computing time?) show that both produce much higher effective sample sizes.
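To make the O(n) completion concrete, here is a minimal sketch, mine and not from the paper, of the Albert and Chib (1993) data augmentation Gibbs sampler for probit regression, assuming a flat prior on β for simplicity (function name and interface are my own):

```python
import numpy as np
from scipy.stats import truncnorm

def probit_da_gibbs(X, y, n_iter=1000, rng=None):
    """Albert and Chib (1993) data augmentation for probit regression,
    with a flat prior on beta: each sweep draws one truncated-normal
    latent z_i per observation, so the completed space is O(n)."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)      # posterior covariance of beta | z
    chol = np.linalg.cholesky(XtX_inv)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        mu = X @ beta
        # z_i | beta, y_i ~ N(mu_i, 1) truncated to (0, inf) if y_i = 1
        # and to (-inf, 0) if y_i = 0 (bounds standardised around mu_i)
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, random_state=rng)
        # beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1})
        beta = XtX_inv @ (X.T @ z) + chol @ rng.standard_normal(p)
        draws[t] = beta
    return draws

# highly imbalanced toy data: almost all responses equal to one
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.normal(size=n) < X @ np.array([2.0, 1.0])).astype(int)
draws = probit_da_gibbs(X, y, n_iter=2000)
```

The whole n-vector z must be refreshed at every sweep, and it is the mixing of this completion that degrades in the large, imbalanced regime studied in the paper.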
JSM 2015 [day #4]

Posted in pictures, Running, Statistics, Travel, University life with tags ASA, bag of little bootstraps, consistency, harmonic mean estimator, JSM 2015, Langevin diffusion, Langevin MCMC algorithm, latent variable, marginal likelihood, MCMC, Monte Carlo Statistical Methods, MrBayes, phylogenetic trees, R.A. Fisher, Seattle, spectral clustering, spectral gap, STAN, University of Warwick on August 13, 2015 by xi'an

My first session today was Markov Chain Monte Carlo for Contemporary Statistical Applications, with a heap of interesting directions in MCMC research! Now, without any possible bias (!), I would definitely nominate Murray Pollock (incidentally from Warwick) as the winner for best slides, funniest presentation, and most enjoyable accent! More seriously, the scalable Langevin algorithm he developed with Paul Fearnhead, Adam Johansen, and Gareth Roberts is quite impressive in avoiding the computation of costly likelihoods, with of course caveats on which targets it applies to. Murali Haran showed a new proposal to handle high-dimensional random effect models by a projection trick that reduces the dimension. Natesh Pillai introduced us (or at least me!) to a spectral clustering technique that allows for an automated partition of the target space, itself the starting point of his parallel MCMC algorithm. Quite exciting, even though I do not perceive partitions as an ideal solution to this problem. The final talk in the session was Galin Jones' presentation of consistency results and conditions for multivariate quantities, a surprisingly unexplored domain. MCMC is still alive and running!
The second MCMC session of the morning, Monte Carlo Methods Facing New Challenges in Statistics and Science, was equally diverse, with Lynn Kuo's talk on the HAWK approach, where we discovered that harmonic mean estimators are still in use, e.g., in the MrBayes software employed in phylogenetic inference. The proposal to replace this awful estimator that should never be seen again (!) was rather closely related to an earlier solution of ours for marginal likelihood approximation, based there on a partition of the whole space rather than on an HPD region in our case… Then, Michael Betancourt brilliantly acted as a proxy for Andrew to present the STAN language, with a flashy trailer he most recently designed, featuring Andrew as the sole actor. And with great arguments for using it, including the potential to run expectation propagation (as a way of life). In fine, Faming Liang proposed a bootstrap subsampling version of the Metropolis-Hastings algorithm, where the likelihood is evaluated on bootstrap subsamples, while acknowledging the resulting bias in the limiting distribution.
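To see why the harmonic mean estimator deserves its reputation, here is a small toy sketch of my own, not from the talk: it contrasts the raw harmonic mean with a Gelfand and Dey (1994)-type identity using an instrumental density concentrated in a high posterior density region, in the spirit of the solution alluded to above (the toy model and all names are mine):

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

# Toy conjugate model: y_i ~ N(theta, 1), theta ~ N(0, 1), so the exact
# marginal likelihood is available and both estimators can be checked.
rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=20)
n, S = len(y), 10_000
post_var = 1.0 / (n + 1.0)
post_mean = y.sum() * post_var
theta = rng.normal(post_mean, np.sqrt(post_var), size=S)  # exact posterior draws

loglik = stats.norm.logpdf(y[None, :], theta[:, None], 1.0).sum(axis=1)
logprior = stats.norm.logpdf(theta, 0.0, 1.0)

# (a) raw harmonic mean: 1 / E_post[1 / L(theta)], infamous for infinite variance
log_hm = -(logsumexp(-loglik) - np.log(S))

# (b) Gelfand-Dey variant: 1 / E_post[phi(theta) / (L(theta) pi(theta))], with
# phi a density concentrated inside a high posterior density region
logphi = stats.norm.logpdf(theta, post_mean, np.sqrt(post_var) / 2.0)
log_gd = -(logsumexp(logphi - loglik - logprior) - np.log(S))

# exact log marginal likelihood, for comparison
exact = stats.multivariate_normal.logpdf(y, np.zeros(n), np.eye(n) + np.ones((n, n)))
print(f"harmonic mean: {log_hm:.3f}  Gelfand-Dey: {log_gd:.3f}  exact: {exact:.3f}")
```

The raw version is driven by the rare draws with smallest likelihood, while the second version keeps the integrand bounded; taking φ as a (renormalised) indicator of an HPD region gives the variant we proposed.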
My first afternoon session was another entry on Statistical Phylogenetics, somewhat continued from yesterday's session. Making me realise I had not seen a single talk on ABC for the entire meeting! The issues discussed in the session were linked with aligning sequences and comparing many trees, again in settings where likelihoods can be computed more or less explicitly. Without any expertise in the matter, I wondered at a construction that would turn all trees into realizations of a continuous model, for instance by growing one branch at a time while removing the MRCA root… And maybe using a particle-like method to grow trees. As an aside, Vladimir Minin told me last night about genetic mutations that could switch phenotypes on and off repeatedly across generations, for instance the ability to glow in the dark for species of deep-sea fish.
When stating that I did not see a single talk about ABC, I omitted Steve Fienberg's Fisher Lecture, R.A. Fisher and the Statistical ABCs, keeping the morceau de choix for the end! Even though of course Steve did not mention the algorithm! A was for asymptotics, or ancillarity, B for Bayesian (or biducial??), C for causation (or cuffiency???)… Among other gems, I appreciated that Steve mentioned my great-grand-father Darmois in connection with exponential families! And the connection with Jon Wellner's Le Cam Lecture from a few days ago. And reminding us that Savage was a Fisher lecturer himself. And that Fisher introduced fiducial distributions quite early. And for defending the Bayesian perspective. Steve also set some challenges, like asymptotics for networks, Bayesian model assessment (I liked the notion of stepping out of the model), and randomization when experimenting with networks. And big data issues. And personalized medicine, building on his own cancer treatment. No trace of the ABC algorithm, obviously, but a wonderful Fisher Lecture, also most obviously!! Bravo, Steve, keep thriving!!!
How quickly does randomness appear?
Posted in Statistics, University life with tags convergence, geometric ergodicity, La Recherche, Metropolis-Hastings algorithms, Nice, randomness, spectral gap, total variation, uniform ergodicity on November 10, 2011 by xi'an

This was the [slightly off-key] title of the math column in the November issue of La Recherche, in any case intriguing enough for me to buy this general-public science magazine on the metro platform and to read it immediately while waiting for an uncertain train, thanks to the nth strike of the year on my train line… But this was the occasion for an exposition of the Metropolis algorithm in a general-public journal! The column actually originated from a recently published paper by Persi Diaconis, Gilles Lebeau, and Laurent Michel, Geometric analysis for the Metropolis algorithm on Lipschitz domains, in Inventiones Mathematicae [one of the top pure math journals]. The column in La Recherche described the Metropolis algorithm (labelled there as a random walk on Markov chains!), alluded to the use of MCMC methods in statistics, told the genesis of the paper [namely the long-term invitation of Persi Diaconis to Nice a few years ago], and briefly explained the convergence result, namely the convergence of the Metropolis algorithm to the stationary measure at a geometric rate, with an application to the non-overlapping balls problem.
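For readers curious about the algorithm in this exact setting, here is a minimal sketch, my own illustration and not code from the paper, of random-walk Metropolis for the uniform distribution on a bounded domain: proposals are uniform in a small ball and moves leaving the domain are rejected (the `in_domain` interface is an assumption of the sketch):

```python
import numpy as np

def ball_walk_metropolis(in_domain, x0, radius=0.1, n_iter=10_000, rng=None):
    """Random-walk Metropolis for the uniform distribution on a bounded
    domain: propose a uniform move in a ball of given radius and reject
    any move leaving the domain. `in_domain(x)` returns True iff x lies
    in the domain (a hypothetical interface, not one from the paper)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    d = x.size
    chain = np.empty((n_iter, d))
    for t in range(n_iter):
        # uniform proposal in the ball of radius `radius` around x:
        # random direction, scaled by radius * U^(1/d)
        u = rng.standard_normal(d)
        u *= radius * rng.uniform() ** (1.0 / d) / np.linalg.norm(u)
        prop = x + u
        if in_domain(prop):   # uniform target: accept iff still in the domain
            x = prop
        chain[t] = x
    return chain

# e.g. the uniform distribution on the unit disc
chain = ball_walk_metropolis(lambda x: x @ x < 1.0, x0=[0.0, 0.0])
```

With a uniform target and a symmetric proposal, the Metropolis ratio is one inside the domain and zero outside, so the accept step reduces to the membership test.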
If you take a look at the paper, you will see it is a beautiful piece of mathematics, establishing a spectral gap for the Markov operator associated with the Metropolis algorithm and deducing a uniformly geometric convergence [in total variation] for most regular distributions with bounded support. A far from trivial and fairly general result. La Recherche however fails to mention the whole corpus of MCMC convergence results obtained in the 1990s and 2000s by many authors, including Richard Tweedie, Gareth Roberts, Jeff Rosenthal, Eric Moulines, Gersende Fort, Randal Douc, Kerrie Mengersen, and others…
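For completeness, the standard link between the two notions, stated informally in my own paraphrase rather than as the paper's exact theorem: a spectral gap for a reversible Metropolis operator yields geometric convergence in total variation,

```latex
% If the Metropolis operator P is reversible with respect to \pi and admits
% a spectral gap \delta > 0 on L^2(\pi), then for \pi-almost every start x,
\left\| P^n(x,\cdot) - \pi \right\|_{\mathrm{TV}}
  \;\le\; C(x)\,(1-\delta)^{n}, \qquad n \ge 1,
% i.e. geometric convergence at rate 1-\delta; the uniform version in the
% paper further controls the constant over the starting point x.
```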