Archive for horseshoe prior

GPU-accelerated Gibbs sampling

Posted in Statistics, Travel, University life on August 18, 2016 by xi'an

Alex Terenin told me during the welcoming reception of MCqMC 2016 that he, along with Shawfeng Dong and David Draper, had arXived a paper on a GPU implementation of the Gibbs sampler, and thanked me profusely for my accept-reject algorithm for the truncated normal distribution, which he reprogrammed in CUDA. The paper is mostly a review of the specifics of GPU programming and of its constraints when compared with CPUs. The type of models considered therein allows for a GPU implementation because of a very large number of latent variables that are independent conditional on the parameter θ. Like, e.g., the horseshoe probit regression model, which is how my sampler enters the picture. Accept-reject algorithms are not ideally suited for GPUs because of the while not_accepted loop in the code, but I did not get [from our discussion] why it is more efficient to wait for the while loop to exit than to run more proposals and subset the accepted ones later. Presumably because guaranteeing that at least one proposal per latent variable gets accepted becomes too costly. The paper also mentions the issue of keeping random generators valid when stretched across many threads, advocating block skips as discussed in an earlier (or even ancient) 'Og post. In line with earlier comparison tests, a proper GPU implementation of the Gibbs sampler in this setting leads to improvements of an order of magnitude in speed. Nonetheless, I wonder about the universality of the comparison, in that GPUs lack the programming interface that is now available for CPUs. Some authors, like the current ones, have been putting some effort into constructing random generators in CUDA, but the entry cost for newbies like me still sounds overwhelming.
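To make the trade-off concrete, here is a minimal NumPy sketch of mine (not the authors' CUDA code) contrasting the per-draw while loop with the batched propose-and-subset alternative, for a standard normal truncated to [a, ∞). It uses a plain rejection step rather than my exponential-proposal scheme, and the oversampling factor is purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def trunc_norm_while(a, n):
        # Sequential accept-reject: each draw loops until acceptance,
        # which on a GPU leaves threads waiting on the slowest loop.
        out = np.empty(n)
        for i in range(n):
            x = rng.standard_normal()
            while x < a:                 # naive rejection against the truncation
                x = rng.standard_normal()
            out[i] = x
        return out

    def trunc_norm_batched(a, n, oversample=4):
        # GPU-style alternative: propose in bulk and keep the accepted ones,
        # wasting proposals but avoiding per-draw control flow.
        accepted = np.empty(0)
        while accepted.size < n:
            x = rng.standard_normal(oversample * n)
            accepted = np.concatenate([accepted, x[x >= a]])
        return accepted[:n]

The batched version discards most of its proposals when the acceptance rate is low, which is presumably where the cost of "ensuring at least one is accepted" bites.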

projective covariate selection

Posted in Mountains, pictures, Statistics, Travel, University life on October 28, 2014 by xi'an

While I was in Warwick, Dan Simpson [newly arrived from Norway on a postdoc position] mentioned to me that he had attended a talk by Aki Vehtari in Norway where my early work with Jérôme Dupuis on projective priors was used. He gave me the link to this paper by Peltola, Havulinna, Salomaa and Vehtari that indeed refers to the idea that a prior on a given Euclidean space defines priors by projection on all subspaces, despite the zero measure of those subspaces. (This notion first appeared in a joint paper with my friend Costas Goutis, who alas died in a diving accident a few months later.) The projection further allowed for a simple expression of the Kullback-Leibler deviance between the corresponding models and for a Pythagorean theorem on the additivity of the deviances between embedded models. The weakest spot of this approach of ours was, in my opinion and unsurprisingly, deciding when a submodel stands too far from the full model: the loss of explanatory power introduced therein has no absolute scale, and later discussions led me to think that the bound should depend on the sample size to ensure consistency. (The recent paper by Nott and Leng that expanded on this projection has now appeared in CSDA.)
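For readers unfamiliar with the construction, a quick sketch in my own notation (not the paper's): the projection of the full-model parameter θ onto a restricted family indexed by η minimises the Kullback-Leibler divergence between the corresponding sampling distributions,

    \theta_\perp(\theta) = \arg\min_{\eta}\; \mathrm{KL}\big( p(y \mid \theta) \,\|\, p(y \mid \eta) \big),

and the Pythagorean property alluded to above means that, for nested models M_2 \subset M_1 \subset M, the deviances add up:

    d(M, M_2) = d(M, M_1) + d(M_1, M_2).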

“Specifically, the models with subsets of covariates are found by maximizing the similarity of their predictions to this reference as proposed by Dupuis and Robert [12]. Notably, this approach does not require specifying priors for the submodels and one can instead focus on building a good reference model. Dupuis and Robert (2003) suggest choosing the size of the covariate subset based on an acceptable loss of explanatory power compared to the reference model. We examine using cross-validation based estimates of predictive performance as an alternative.” T. Peltola et al.

The paper also connects with the Bayesian Lasso literature, concluding that the horseshoe prior is more informative than the Laplace prior. It applies the selection approach to identifying biomarkers with predictive power in a study of diabetic patients. The authors rank models according to their (log) predictive density at the observed data, using cross-validation to avoid exploiting the data twice. On the MCMC front, the paper implements the NUTS version of HMC via Stan.
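As a rough illustration of that ranking criterion (my own sketch, not the paper's code), here fit_posterior and log_pred_density are hypothetical stand-ins for a Stan/NUTS fit and its held-out posterior predictive evaluation:

    import numpy as np

    def cv_log_pred_density(fit_posterior, log_pred_density, X, y, K=10, seed=0):
        # Mean held-out log predictive density over K folds: the model is
        # refitted on each training split and scored on the left-out fold.
        rng = np.random.default_rng(seed)
        folds = rng.permutation(len(y)) % K
        scores = []
        for k in range(K):
            train, test = folds != k, folds == k
            post = fit_posterior(X[train], y[train])    # e.g. NUTS draws
            scores.append(log_pred_density(post, X[test], y[test]))
        return float(np.mean(scores))

Candidate covariate subsets would then be ranked by this score, a higher mean log predictive density indicating a better submodel, without the observed data entering both the fit and the evaluation.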