**A**lex Terenin told me during the welcoming reception of MCqMC 2016 that he, along with Shawfeng Dong and David Draper, had arXived a paper on GPU implementation of the Gibbs sampler and thanked me profusely for my accept-reject algorithm of the truncated normal distribution. Algorithm that he reprogrammed in CUDA. The paper is mostly a review on the specifics of GPU programming and of the constraints when compared with CPUs. The type of models considered therein allows for GPU implementation because of a very large number of latent variables that are independent conditional on the parameter θ. Like, e.g., the horseshoe probit regression model, which is how my sampler enters the picture. Accept-reject algorithms are not ideally suited for GPUs because of the *while not_accepted* in the code, but I did not get [from our discussion] why it is more efficient to wait for the while loop to exit when compared with running more proposals and subset the accepted ones later. Presumably because this is too costly when ensuring at least one is accepted. The paper also mentions the issue of ensuring random generators remain valid when stretched across many threads, advocating block skips as discussed in an earlier (or even ancient) ‘Og post. In line with earlier comparison tests, the proper GPU implementation of the Gibbs sampler in this setting leads to improvements that are order of magnitude faster. Nonetheless, I wonder at the universality of the comparison in that GPUs lack the programming interface that is now available for CPUs. Some authors, like the current ones, have been putting some effort in constructing random generators in CUDA, but the entry cost for newbies like me still sounds overwhelming.

## Archive for pseudo-random generator

## GPU-accelerated Gibbs sampling

Posted in Statistics, Travel, University life with tags CPU, CUDA, GPU, horseshoe prior, parallel MCMC, probit model, pseudo-random generator on August 18, 2016 by xi'an## automatic variational ABC

Posted in pictures, Statistics with tags ABC, Amsterdam, beta distribution, bias, Kumaraswamy distribution, likelihood function estimator, likelihood-free methods, pseudo-random generator, qbeta, R, variational Bayes methods on July 8, 2016 by xi'an

“Stochastic Variational inference is an appealing alternative to the inefficient sampling approaches commonly used in ABC.”

Moreno et al. [including Ted Meeds and Max Welling] recently arXived a paper merging variational inference and ABC. The argument for turning variational is computational speedup. The traditional (in variational inference) divergence decomposition of the log-marginal likelihood is replaced by an ABC version, parameterised in terms of intrinsic generators (i.e., generators that do not depend on cyber-parameters, like the U(0,1) or the N(0,1) generators). Or simulation code in the authors’ terms. Which leads to the automatic aspect of the approach. In the paper the derivation of the gradient is indeed automated.

“One issue is that even assuming that the ABC likelihood is an unbiased estimator of the true likelihood (which it is not), taking the log introduces a bias, so that we now have a biased estimate of the lower bound and thus biased gradients.”

I wonder how much of an issue this is, since we consider the variational lower bound. To be optimised in terms of the parameters of the variational posterior. Indeed, the endpoint of the analysis is to provide an optimal variational approximation, which remains an approximation whether or not the likelihood estimator is unbiased. A more “severe” limitation may be in the inversion constraint, since it seems to eliminate Beta or Gamma distributions. (Even though calling qbeta(runif(1),a,b) definitely is achievable… And not rejected by a Kolmogorov-Smirnov test.)

Incidentally, I discovered through the paper the existence of the Kumaraswamy distribution, which main appeal seems to be the ability to produce a closed-form quantile function, while bearing some resemblance with the Beta distribution. (Another arXival by Baltasar Trancón y Widemann studies some connections between those, but does not tell how to select the parameters to optimise the similarity.)

## exam question

Posted in Kids, Statistics, University life with tags Bernoulli, Bernoulli factory, biased coin, cross validated, final exam, log-normal distribution, Monte Carlo methods, pseudo-random generator, random simulation, Université Paris Dauphine on June 24, 2016 by xi'an**A** question for my third year statistics exam that I borrowed from Cross Validated: no student even attempted to solve this question…!

And another one borrowed from the highly popular post on the random variable [almost] always smaller than its mean!

## ISBA 2016 [#4]

Posted in pictures, Running, Statistics, Travel with tags ABC, Arnold Zellner, Bayes factors, Bayesian model choice, consistency, Italy, Monte Carlo Statistical Methods, pseudo-random generator, random forests, Santa Margherita di Pula, Sardinia on June 17, 2016 by xi'an**A**s an organiser of the ABC session (along with Paul Fearnhead), I was already aware of most results behind the talks, but nonetheless got some new perspectives from the presentations. Ewan Cameron discussed a two-stage ABC where the first step is actually an indirect inference inference, which leads to a more efficient ABC step. With applications to epidemiology. Lu presented extensions of his work with Paul Fearnhead, incorporating regression correction à la Beaumont to demonstrate consistency and using defensive sampling to control importance sampling variance. (While we are working on a similar approach, I do not want to comment on the consistency part, but I missed how defensive sampling can operate in complex ABC settings, as it requires advanced knowledge on the target to be effective.) And Ted Meeds spoke about two directions for automatising ABC (as in the ABcruise), from incorporating the pseudo-random generator into the representation of the ABC target, to calling for deep learning advances. The inclusion of random generators in the transform is great, provided they can remain black boxes as otherwise they require recoding. (This differs from quasi-Monte Carlo ABC, which aims at reducing the variability due to sheer noise.) It took me a little while, but I eventually understood why Jan Haning saw this inclusion as a return to fiducial inference!

Merlise Clyde gave a wide-ranging plenary talk on (linear) model selection that looked at a large range of priors under the hat of generalised confluent hypergeometric priors over the mixing scale in Zellner’s g-prior. Some were consistent under one or both models, maybe even for misspecified models. Some parts paralleled my own talk on the foundations of Bayesian tests, no wonder since I mostly give a review before launching into a criticism of the Bayes factor. Since I think this may be a more productive perspective than trying to over-come the shortcomings of Bayes factors in weakly informative settings. Some comments at the end of Merlise’s talk were loosely connected to this view in that they called for an unitarian perspective [rather than adapting a prior to a specific inference problem] with decision-theoretic backup. Conveniently the next session was about priors and testing, obviously connected!, with Leo Knorr-Held considering g-priors for the Cox model, Kerrie Mengersen discussing priors for over-fitted mixtures and HMMs, and Dan Simpson entertaining us with his quest of a prior for a point process, eventually reaching PC priors.

## precision in MCMC

Posted in Books, R, Statistics, University life with tags aperiodicity, CNRS, dynamical systems, Images des Mathématiques, MCMC algorithms, Metropolis-Hastings algorithm, Monte Carlo Statistical Methods, pseudo-random generator, round-off error, slice sampler on January 14, 2016 by xi'anWhile browsing Images des Mathématiques, I came across this article *[in French]* that studies the impact of round-off errors on number representations in a dynamical system and checked how much this was the case for MCMC algorithms like the slice sampler (recycling some R code from Monte Carlo Statistical Methods). By simply adding a few signif(…,dig=n) in the original R code. And letting the precision n vary.

“…si on simule des trajectoires pendant des intervalles de temps très longs, trop longs par rapport à la précision numérique choisie, alors bien souvent, les résultats des simulations seront complètement différents de ce qui se passe en réalité…”

Rather unsurprisingly (!), using a small enough precision (like two digits on the first row) has a visible impact on the simulation of a truncated normal. Moving to three digits seems to be sufficient in this example… One thing this tiny experiment reminds me of is the lumpability property of Kemeny and Snell. A restriction on Markov chains for aggregated (or discretised) versions to be ergodic or even Markov. Also, in 2000, Laird Breyer, Gareth Roberts and Jeff Rosenthal wrote a Statistics and Probability Letters paper on the impact of round-off errors on geometric ergodicity. However, I presume [maybe foolishly!] that the result stated in the original paper, namely that there exists an infinite number of precision digits for which the dynamical system degenerates into a small region of the space does not hold for MCMC. Maybe foolishly so because the above statement means that running a dynamical system for “too” long given the chosen precision kills the intended stationary properties of the system. Which I interpret as getting non-ergodic behaviour when exceeding the period of the uniform generator. More or less.

## quantile functions: mileage may vary

Posted in Books, R, Statistics with tags Charlie Geyer, execution time, pseudo-random generator, R, random simulation, standard quantile functions, system.time on May 12, 2015 by xi'an**W**hen experimenting with various quantiles functions in R, I was shocked *[ok this is a bit excessive, let us say surprised]* by how widely the execution times would vary. To the point of blaming a completely different feature of R. Borrowing from Charlie Geyer’s webpage on the topic of probability distributions in R, here is a table for some standard distributions: I ran

u=runif(1e7) system.time(x<-qcauchy(u))

choosing an arbitrary parameter whenever needed.

Distribution | Function | Time |
---|---|---|

Cauchy | `qcauchy` |
2.2 |

Chi-Square | `qchisq` |
43.8 |

Exponential | `qexp` |
0.95 |

F | `qf` |
34.2 |

Gamma | `qgamma` |
37.2 |

Logistic | `qlogis` |
1.7 |

Log Normal | `qlnorm` |
2.2 |

Normal | `qnorm` |
1.4 |

Student t | `qt` |
31.7 |

Uniform | `qunif` |
0.86 |

Weibull | `qweibull` |
2.9 |

Of course, it does not mean much in that all the slow distributions (except for Weibull) are parameterised. Nonetheless, that a chi-square inversion take 50 times longer than a uniform inversion remains puzzling as to why it is not coded more efficiently. In particular, I was wondering why the chi-square inversion was slower than the Gamma inversion. Rerunning both inversions showed that they are equivalent:

> u=runif(1e7) > system.time(x<-qgamma(u,sha=1.5)) utilisateur système écoulé 21.534 0.016 21.532 > system.time(x<-qchisq(u,df=3)) utilisateur système écoulé 21.372 0.008 21.361

Which also shows how variable *system.time* can be.