inefficiency of data augmentation for large samples

On Monday, James Johndrow, Aaron Smith, Natesh Pillai, and David Dunson arXived a paper on the diminishing benefits of using data augmentation for large and highly imbalanced categorical data. They reconsider the data augmentation scheme of Tanner and Wong (1987), surprisingly not mentioned, used in the first occurrences of the Gibbs sampler like Albert and Chib’s (1993) or our mixture estimation paper with Jean Diebolt (1990). The central difficulty with data augmentation is that the distribution to be simulated operates on a space that is of order O(n), even when the original distribution covers a single parameter. As illustrated by the coalescent in population genetics (and the subsequent intrusion of the ABC methodology), there are well-known cases when the completion is near to impossible and clearly inefficient (as again illustrated by the failure of importance sampling strategies on the coalescent). The paper provides spectral gaps for the logistic and probit regression completions, which are of order a power of log(n) divided by √n, when all observations are equal to one. In a somewhat related paper with Jim Hobert and Vivek Roy, we studied the spectral gap for mixtures with a small number of observations: I wonder at the existence of a similar result in this setting, when all observations stem from one component of the mixture, when all observations are one. The result in this paper is theoretically appealing, the more because the posteriors associated with such models are highly regular and very close to Gaussian (and hence not that challenging as argued by Chopin and Ridgway). And because the data augmentation algorithm is uniformly ergodic in this setting (as we established with Jean Diebolt and later explored with Richard Tweedie). As demonstrated in the experiment produced in the paper, when comparing with HMC and Metropolis-Hastings (same computing times?), which produce much higher effective sample sizes.

This entry was posted on May 31, 2016 at 12:16 am and is filed under Books, pictures, Running, Statistics, Travel, University life with tags convergence of Gibbs samplers, Data augmentation, Gibbs sampling, Hamiltonian Monte Carlo, importance sampling, logit model, MCMC, Monte Carlo Statistical Methods, probit model, simulation, spectral gap. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

4 Responses to “inefficiency of data augmentation for large samples”

Julia Says:
October 20, 2018 at 10:25 am

Can you please cite the reference(s) for your statement on coalescent models:

“As illustrated by the coalescent in population genetics (and the subsequent intrusion of the ABC methodology), there are well-known cases when the completion is near to impossible and clearly inefficient (as again illustrated by the failure of importance sampling strategies on the coalescent)” ?

Reply
- xi'an Says:
  October 21, 2018 at 6:45 am
  
  I am referring to the difficulties met by sophisticated importance schemes as those of Stephens and Donnelly (2000) in all but simple population evolution scenarios where weeks of simulation are necessary.
  
  Reply
jamesjohndrow Says:
May 31, 2016 at 9:28 pm

Xi’an, our thanks for the references and for your insightful comments about the paper. We will be sure to include these references when we post a new version. Radford, we agree that the intuition we give for the result is quite similar to the discussion in your slice sampling paper, and will add a mention and reference to that effect. Thanks again to you both.

Xi’an, in response to your question about computation time per scan, we didn’t formally consider that, though qualitatively it is similar for HMC, RWMH, and Gibbs in the simple models. For the hierarchical application, HMC is quite slow, but I think a lot of this has to do with the implementation (Stan) rather than anything fundamental about the algorithm. RWMH has similar comparable computational complexity per iteration to Gibbs in this case.

Reply
Radford Neal Says:
May 31, 2016 at 4:59 am

A related discussion is in my slice sampling paper from 2003, at http://projecteuclid.org/download/pdf_1/euclid.aos/1056562461

There I discuss the multiple-auxiliary-variable algorithm of Damien, Wakefield, and Walker, explaining why it will often be very slow compared to the use of a single auxiliary variable in slice sampling. See pages 711 and 761-763.

Reply

Xi'an's Og