Archive for GPU

Foundations of Statistical Algorithms [book review]

Posted in Books, Linux, R, Statistics, University life on February 28, 2014 by xi'an

There is computational statistics and there is statistical computing. And then there is statistical algorithmics. Not the same thing, by far. This 2014 book by Weihs, Mersmann and Ligges, from TU Dortmund, the latter also being a member of the R Core team, stands at one end of this wide spectrum of techniques required by modern statistical analysis. In short, it provides the necessary skills to construct statistical algorithms and hence to contribute to statistical computing. And I wish I had the luxury to teach from Foundations of Statistical Algorithms to my graduate students, if only we could afford an extra yearly course…

“Our aim is to enable the reader (…) to quickly understand the main ideas of modern numerical algorithms [rather] than having to memorize the current, and soon to be outdated, set of popular algorithms from computational statistics.” (p.1)

The book is built around the above aim: first, presenting the reasons why computers can produce answers different from what we want, using least squares as a means to check for (in)stability; second, establishing the ground for Monte Carlo methods (à la Fishman) by discussing (pseudo-)random generation, including MCMC algorithms; third, moving to bootstrap and resampling techniques; and concluding with parallelisation and scalability. The text is highly structured, with frequent summaries, a division of chapters all the way down to sub-sub-sub-sections, an R implementation section in each chapter, and a few exercises.
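To make the (in)stability point concrete, here is a minimal R sketch of my own (not taken from the book): solving least squares through the normal equations squares the condition number of the design matrix, while the QR decomposition used by lm() does not. All settings are illustrative.

set.seed(42)
n <- 100
x <- seq(1, 2, length.out = n)
X <- cbind(1, x, x^2, x^3, x^4, x^5)   ## nearly collinear polynomial design
beta <- rep(1, 6)
y <- X %*% beta + rnorm(n, sd = 0.01)

kappa(crossprod(X))   ## huge condition number for t(X) %*% X
## normal equations (if solve() complains of near-singularity, that is the point!)
beta_ne <- drop(solve(crossprod(X), crossprod(X, y)))
## QR-based solution, numerically stable
beta_qr <- drop(qr.solve(X, y))
cbind(normal.eq = beta_ne - beta, QR = beta_qr - beta)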

resampling and [GPU] parallelism

Posted in Statistics, University life on March 13, 2012 by xi'an

In a recent note posted on arXiv, Lawrence Murray compares the implementation of resampling schemes for parallel systems like GPUs. Given a system of P weighted particles (x1,ω1),…,(xP,ωP), there are several ways of drawing a sample according to those weights:

  1. regular multinomial resampling, where each point in the (new) sample is one of the xi's, drawn with probability ωi, meaning a uniform is generated for each point;
  2. stratified resampling, where the weights are cumulated, the resulting unit interval divided into P equal pieces, and one uniform sampled on each piece, which means that points with large weights are sampled at least once and those with small weights at most once;
  3. systematic resampling, which is the same as the above except that the same uniform is used for each piece;
  4. Metropolis resampling, where a Markov chain converges to the distribution (ω1,…,ωP) on {1,…,P}.

The first three resamplers are common in the particle system literature (incl. Nicolas Chopin’s PhD thesis), but difficult to adapt to GPUs (and I always feel uncomfortable with the fact that systematic resampling uses a single uniform!), while the last one is more unusual, but actually well-suited to a parallel implementation, as sketched below. While Lawrence Murray suggests using Raftery and Lewis’ (1992) assessment of the required number of Metropolis iterations to “achieve convergence”, I would instead suggest taking advantage of the toric nature of the index space to run a random walk and wait for the equivalent of a complete cycle. In any case, this is a cool illustration of the new challenges posed by parallel implementations (like the development of proper random generators).
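For concreteness, here are minimal R sketches of my own for two of the schemes (illustrative code, not Murray’s GPU implementation), assuming the weights w are normalised to sum to one:

systematic_resample <- function(w) {
  P <- length(w)
  u <- (runif(1) + 0:(P - 1)) / P   ## one single uniform, shifted over the P pieces
  findInterval(u, cumsum(w)) + 1    ## indices of the resampled particles
}

## Metropolis resampling: each offspring index is produced by B steps of an
## independence Metropolis chain on {1,…,P} with target (ω1,…,ωP); no
## cumulated sum of the weights is needed, which is what fits GPUs so well
metropolis_resample <- function(w, B = 100) {
  P <- length(w)
  idx <- seq_len(P)                            ## one chain per offspring index
  for (b in 1:B) {
    prop <- sample.int(P, P, replace = TRUE)   ## uniform proposals on {1,…,P}
    accept <- runif(P) < w[prop] / w[idx]      ## Metropolis acceptance ratio
    idx[accept] <- prop[accept]
  }
  idx
}

w <- c(0.5, 0.3, 0.1, 0.05, 0.05)
table(systematic_resample(w))
table(metropolis_resample(w))

(The number B of Metropolis steps is precisely what Raftery and Lewis’ diagnostic, or the cycle argument above, would aim at calibrating.)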

GPUs in computational statistics

Posted in pictures, Statistics, Travel on January 27, 2012 by xi'an

The workshop in Warwick yesterday went by very quickly! The room was packed. The first three talks were by myself, Christophe and Pierre, so less about GPUs and more about simulation techniques which could benefit from or even require implementation on GPUs. (I did manage to have complete slides this time… More seriously, Christophe’s talk set me thinking on the issue of estimating the likelihood function in ways different (?) from the one used in ABC.) The second half was more novel for me, in that the three talks went into the computing and computer aspects of GPUs, with Chris Holmes doing sparse [Lasso-like] regression on a scale otherwise [i.e. w/o GPUs] impossible, Chris [fourth Chris in the list of speakers!] Barnes explaining ABC for molecular biology and design (a point I plan to discuss in a later post), with even more details about the architecture and programming of GPUs, and Michael Stumpf delivering a grand finale, with essentially three talks in one: network analysis (incl. terrific movie bits incorporated within the beamer slides!), GPUs vs. CPUs and older alternatives, and random generators on GPUs, commenting on a recent paper by Salmon et al. (SC, 2011) and showing that true gains in efficiency from using GPUs require a deep engagement with the hardware structure… A very exciting day, followed by a Stilton cheese degustation and haggis (if not poems) to celebrate Burns’ night!

Some hae meat and canna eat,
And some wad eat that want it;
But we hae meat, and we can eat,
And sae let the Lord be thankit.

English trip (1)

Posted in Statistics, Travel, University life on January 25, 2012 by xi'an

Today, I am attending a workshop on the use of graphics processing units in Statistics in Warwick, supported by CRiSM, presenting our recent works with Randal Douc, Pierre Jacob and Murray Smith. (I will use the same slides as in Telecom two months ago, hopefully avoiding the loss of integral and summation signs this time!) Pierre Jacob will talk about Wang-Landau.

Then, tomorrow, I am off to Cambridge to talk about ABC and model choice on Friday afternoon. (Presumably using the same slides as in Provo.)

The (1) in the title is in anticipation of a second trip to Oxford next month and another one to Bristol two months after! (The trip to Edinburgh does not count of course, since it is in Scotland!)

GPUs in Computational Statistics [Warwick, Jan. 25]

Posted in Statistics, Travel, University life on January 6, 2012 by xi'an

Next January 25, I will take part in a workshop at the University of Warwick (organised by CRiSM and CSC) on the theme of GPUs in Computational Statistics. Even though I have not directly worked on GPUs, I will talk about our joint work with Pierre Jacob and Murray Smith, while Pierre will talk about parallel Wang-Landau. From there I will travel to Cambridge for a seminar on ABC model choice the following Friday.

Parallel processing of independent Metropolis-Hastings algorithms

Posted in R, Statistics, University life on October 12, 2010 by xi'an

With Pierre Jacob, my PhD student, and Murray Smith, from the National Institute of Water and Atmospheric Research, Wellington, who actually started us on this project at the last and latest Valencia meeting, we have completed a paper on using parallel computing in independent Metropolis-Hastings algorithms. The paper is arXived and the abstract goes as follows:

In this paper, we consider the implications of the fact that parallel raw-power can be exploited by a generic Metropolis–Hastings algorithm if the proposed values are independent. In particular, we present improvements to the independent Metropolis–Hastings algorithm that significantly decrease the variance of any estimator derived from the MCMC output, for a null computing cost since those improvements are based on a fixed number of target density evaluations. Furthermore, the techniques developed in this paper do not jeopardize the Markovian convergence properties of the algorithm, since they are based on the Rao–Blackwell principles of Gelfand and Smith (1990), already exploited in Casella and Robert (1996), Atchadé and Perron (2005) and Douc and Robert (2010). We illustrate those improvements both on a toy normal example and on a classical probit regression model but insist on the fact that they are universally applicable.

I am quite excited about the results in this paper, which took advantage of (a) older works of mine on Rao-Blackwellisation, (b) Murray’s interests in costly likelihoods, and (c) our mutual excitement when hearing about GPU parallel possibilities from Chris Holmes’ talk in Valencia. (As well as directions drafted in an exciting session in Vancouver!) The (free) gains over standard independent Metropolis-Hastings estimates are equivalent to importance sampling gains, while keeping the Markov structure of the original chain. Given that 100 or more parallel threads can be exploited on current GPU cards, this is clearly a field with much potential! [A graph in the paper] gives the variance improvements brought by three Rao-Blackwell estimates taking advantage of parallelisation over the initial MCMC estimate (first entry), with the importance sampling estimate (last entry), using only 10 parallel threads.
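As a toy illustration of where such gains come from, here is a minimal R sketch of my own (not the exact estimators of the paper): an independent Metropolis–Hastings run on a normal target where the accept/reject uniform is integrated out at no extra cost in target density evaluations, the most basic Rao–Blackwell step.

set.seed(1)
niter <- 1e4
target <- function(x) dnorm(x)            ## N(0,1) target
prop_d <- function(x) dnorm(x, sd = 2)    ## N(0,4) independent proposal
x <- numeric(niter)
rb <- numeric(niter)                      ## Rao-Blackwellised terms for f(x) = x^2
for (t in 2:niter) {
  y <- rnorm(1, sd = 2)
  ## IMH acceptance probability, a ratio of importance weights
  a <- min(1, (target(y) / prop_d(y)) / (target(x[t-1]) / prop_d(x[t-1])))
  ## E[f(X_t) | x_{t-1}, y_t] = a f(y_t) + (1 - a) f(x_{t-1})
  rb[t] <- a * y^2 + (1 - a) * x[t-1]^2
  x[t] <- if (runif(1) < a) y else x[t-1]
}
mean(x[-1]^2)  ## standard MCMC estimate of E[X^2] = 1
mean(rb[-1])   ## Rao-Blackwellised estimate, smaller variance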

JSM 2010 [day 1]

Posted in R, Statistics, University life on August 2, 2010 by xi'an

The first day at JSM is always a bit sluggish, as people slowly drift in and get their bearings. Similar to last year in Washington D.C., the meeting takes place in a huge conference centre and thus there is no feeling of overcrowding [so far]. It may also be that the peripheral and foreign location of the meeting put some regular attendees off (not to mention the expensive living costs!).

Nonetheless, the Sunday afternoon sessions started with a highly interesting How Fast Can We Compute? How Fast Will We Compute? session organised by Mike West and featuring Steve Scott, Marc Suchard and Quanli Wang. The topic was parallel processing, either via multiple processors or via GPUs, the latter relating to the exciting talk Chris Holmes gave at the Valencia meeting. Steve showed us some code in order to explain how feasible the jump to parallel programming was—a point demonstrated by Julien Cornebise and Pierre Jacob after they returned from Valencia—while stressing the fact that a lot of the processing in MCMC runs is open to parallelisation. For instance, data augmentation schemes can allocate the missing data in a parallel way in most problems, and the same holds for independent data likelihood computations (as sketched below). Marc Suchard focussed on GPUs and phylogenetic trees, both of high interest to me!, and he stressed the huge gains—of the order of hundreds in the decrease of computing time—made possible by the exploitation of laptop [Macbook] GPUs. (If I got his example correctly, he seemed to be doing an exact computation of the phylogeny likelihood, not an ABC approximation… Which is quite interesting, if potentially killing one of my main areas of research!) Quanli Wang linked both previous talks with the example of mixtures with a huge number of components. Plenty of food for thought.
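As a minimal sketch of that last point (my own toy example, using the base parallel package rather than a GPU): for independent observations, the log-likelihood splits into chunk sums that can be evaluated on separate cores.

library(parallel)  ## base R package for multicore computation

set.seed(7)
x <- rnorm(1e6, mean = 1.5)   ## toy data
theta <- 1.5                  ## parameter value at which to evaluate
ncores <- 4

## split the data into one chunk per core and sum the chunk log-likelihoods
chunks <- split(x, rep(1:ncores, length.out = length(x)))
ll <- Reduce(`+`, mclapply(chunks,
                           function(ch) sum(dnorm(ch, mean = theta, log = TRUE)),
                           mc.cores = ncores))
ll  ## matches the single-core sum(dnorm(x, mean = theta, log = TRUE))
## (mclapply relies on forking, hence parLapply should be used on Windows)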

I completed the afternoon with the Student Paper Competition: Bayesian Nonparametric and Semiparametric Methods session, which was discouragingly short of participants, with two of the five speakers missing and fewer than twenty people in the room. (I did not get the point of the competition as to who was ranking those papers. Not the participants, apparently!)
