Archive for parallelisation

Advances in scalable Bayesian computation [day #4]

Posted in Books, Mountains, pictures, R, Statistics, University life with tags , , , , , , , , , , , , , , , , , on March 7, 2014 by xi'an

polyptych painting within the TransCanada Pipeline Pavilion, Banff Centre, Banff, March 21, 2012Final day of our workshop Advances in Scalable Bayesian Computation already, since tomorrow morning is an open research time ½ day! Another “perfect day in paradise”, with the Banff Centre campus covered by a fine snow blanket, still falling…, and making work in an office of BIRS a dream-like moment.

Still looking for a daily theme, parallelisation could be the right candidate, even though other talks this week went into parallelisation issues, incl. Steve’s talk yesterday. Indeed, Anthony Lee gave a talk this morning on interactive sequential Monte Carlo, where he motivated the setting by a formal parallel structure. Then, Darren Wilkinson surveyed the parallelisation issues in Monte Carlo, MCMC, SMC and ABC settings, before arguing in favour of a functional language called Scala. (Neat entries to those topics can be found on Darren’s blog.) And in the afternoon session, Sylvia Frühwirth-Schnatter exposed her approach to the (embarrassingly) parallel problem, in the spirit of Steve’s , David Dunson’s and Scott’s (a paper posted on the day I arrived in Chamonix and hence I missed!). There was plenty to learn from that talk (do not miss the Yin-Yang moment at 25 mn!), but it also helped me to break a difficulty I had with the consensus Bayes representation for two weeks (more on that later!). And, even though Marc Suchard mostly talked about flu and trees in a very pleasant and broad talk, he also had a slide on parallelisation to fit the theme! Although unrelated with parallelism,  Nicolas Chopin’s talk was on sequential quasi-Monte Carlo algorithms: while I had heard previous versions of this talk in Chamonix and BigMC, I found it full of exciting stuff. And it clearly got the room truly puzzled by this possibility, in a positive way! Similarly, Alex Lenkoski spoke about extreme rain events in Norway with no trace of parallelism, but the general idea behind the examples was to question the notion of the calibrated Bayesian (with possible connections with the cut models).

This has been a wonderful week and I am sure the participants got as much as I did from the talks and the informal exchanges. Thanks to BIRS for the sponsorship and the superb organisation of the week (and to the Banff Centre for providing such a paradisical environment). I feel very privileged to have benefited from this support, even though I deadly hope to be back in Banff within a few years.

Foundations of Statistical Algorithms [book review]

Posted in Books, Linux, R, Statistics, University life with tags , , , , , , , , , , , , , on February 28, 2014 by xi'an

There is computational statistics and there is statistical computing. And then there is statistical algorithmic. Not the same thing, by far. This 2014 book by Weihs, Mersman and Ligges, from TU Dortmund, the later being also a member of the R Core team, stands at one end of this wide spectrum of techniques required by modern statistical analysis. In short, it provides the necessary skills to construct statistical algorithms and hence to contribute to statistical computing. And I wish I had the luxury to teach from Foundations of Statistical Algorithms to my graduate students, if only we could afford an extra yearly course…

“Our aim is to enable the reader (…) to quickly understand the main ideas of modern numerical algorithms [rather] than having to memorize the current, and soon to be outdated, set of popular algorithms from computational statistics.”(p.1)

The book is built around the above aim, first presenting the reasons why computers can produce answers different from what we want, using least squares as a mean to check for (in)stability, then second establishing the ground forFishman Monte Carlo methods by discussing (pseudo-)random generation, including MCMC algorithms, before moving in third to bootstrap and resampling techniques, and  concluding with parallelisation and scalability. The text is highly structured, with frequent summaries, a division of chapters all the way down to sub-sub-sub-sections, an R implementation section in each chapter, and a few exercises. Continue reading

MCMSki IV [day 2.5]

Posted in Mountains, pictures, Statistics, University life with tags , , , , , , , , , on January 8, 2014 by xi'an

ridge4Despite a good rest during the ski break, my cold did not get away (no magic left in this world!) and I thus had a low attention span to attend the Bayesian statistics and Population genetics session: while Jukka Corander mentioned the improvement brought by our AMIS algorithm, I had difficulties getting the nature of the model, if only because he used a blackboard-like font that made math symbols too tiny to read. (Nice fonts, otherwise!), Daniel Lawson (of vomiting Warhammer fame!) talked about the alluring notion of a statistical emulator, and Barbara Engelhardt talked about variable selection in a SNP setting. I did not get a feeling on how handling ten millions of SNPs was possible in towards a variable selection goal.  My final session of the day was actually “my” invited session on ABC methods, where Richard Everitt presented a way of mixing exact approximation with ABC and synthetic likelihood (Wood, Nature) approximations. The resulting MAVIS algorithm is  not out yet. The second speaker was Ollie Ratman, who spoke on his accurate ABC that I have discussed many times here. And Jean-Michel Marin managed to drive from Montpelier, just in time to deliver his talk on our various explorations of the ABC model choice problem.

After a quick raclette at “home”, we headed back to the second poster session, where I had enough of a clear mind and not too much of a headache (!) to have several interesting discussions, incl. a new parallelisation suggested  by Ben Calderhead, the sticky Metropolis algorithm of Luca Martino, the airport management video of Jegar Pitchforth, the mixture of Dirichlet distributions for extremes by Anne Sabourin, not mentioning posters from Warwick or Paris. At the end of the evening  I walked back to my apartment with the Blossom skis we had brought in the morning to attract registrations for the ski race: not enough to make up for the amount charged by the ski school. Too bad, especially given Anto’s efforts to get this amazing sponsoring!

O’Bayes 2013 [#3]

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , , on December 23, 2013 by xi'an

IMG_2223A final day for this O’Bayes 2013 conference, where I missed the final session for travelling reasons. Several talks had highly attractive features (for me), from David Dunson’s on his recently arXived paper on parallel MCMC, that provides an alternative to the embarrassingly parallel algorithm I discussed a few weeks ago, to be discussed further in a future post, to Marty Wells hindered by poor weather and delivering by phone a talk on L1 shrinkage estimators (a bit of a paradox since, as discussed by Yuzo Maruyama, most MAP estimators cannot be minimax and, more broadly, since they cannot be expressed as resolutions of loss minimisation), to Malay Ghosh revisiting g-priors from an almost frequentist viewpoint,  to Gonzalo Garci-Donato presenting criteria for objective Bayesian model choice in a vision that was clearly the closest to my own perspective on the topic. Overall, when reflecting upon the diversity and high quality of the talks at this O’Bayes meeting, and also as the incoming chair-elect of the corresponding section of ISBA, I think that what emerges most significantly from those talks is an ongoing pondering on the nature of (objective Bayesian) testing, not only in the works extending the g-priors in various directions, but also in the whole debate between Bayes factors and information criteria, model averaging versus model selection. During the discussion on Gonzalo’s talk, David Draper objected to the search for an automated approach to the comparison of models, but I strongly lean towards Gonzalo’s perspective as we need to provide a reference solution able to tackle less formal and more realistic problems. I do hope to see more of those realistic problems tackled at O’Bayes 2015 (which location is not yet settled). In the meanwhile, a strong thank you! to the local organising committee and most specifically to Jim Berger!

Asymptotically Exact, Embarrassingly Parallel MCMC

Posted in Books, Statistics, University life with tags , , , , , , on November 26, 2013 by xi'an

Departamento de Matemática, Universidad Complutense de Madrid, 11/11/11Willie Neiswanger, Chong Wang, and Eric Xing (from CMU) recently arXived a paper entitled as above. The “embarrassing” in the title refers to the “embarrassingly simple” solution proposed therein, namely to solve the difficulty in handling very large datasets by running completely independent parallel MCMC samplers on parallel threads or computers and using the outcomes of those samplers as density estimates, pulled together as a product towards an approximation of the true posterior density. In other words, the idea is to break the posterior as

p(\theta|x) = \prod_{i=1}^m p_i(\theta|x)

and to use the estimate

\hat p(\theta|x) = \prod_{i=1}^m \hat p_i(\theta|x)

where the individual estimates are obtained by, say, non-parametric estimates. The method is then “asymptotically exact” in the weak (and unsurprising) sense of being converging in the number of MCMC iterations. Still, there is a theoretical justification that is not found in previous parallel methods that mixed all resulting samples without accounting for the subsampling. And I also appreciate the point that, in many cases, running MCMC samplers with subsamples produces faster convergence.

In the paper, the division of p into its components is done by partitioning the iid data into m subsets. And taking a power 1/m of the prior in each case. (Which may induce improperness issues.) However, the subdivision is arbitrary and can thus be implemented in other cases than the fairly restrictive iid setting. Because each (subsample)  non-parametric estimate involves T terms, the resulting overall estimate contains Tm terms and the authors suggest using an independent Metropolis-within-Gibbs sampler to handle this complexity. Which is necessary [took me a while to realise this!] for producing a final sample from the (approximate) true posterior distribution. As an aside, I wonder why the bandwidths are all equal across all subsamples, as they should depend on those. And as it would not make much of a difference. It would also be interesting to build a typology of cases where subsampling leads to subposteriors that are close to orthogonal, preventing the implementation of the method.

As it happened, I read this paper on the very day Nial Friel (University College Dublin) gave a seminar at the Big’MC seminar on the convergence of approximations to ergodic transition kernels, based on the recent results of Mitrophanov on the stability of Markov chains, where he introduced the topic by the case of datasets large enough to prevent the computation of the likelihood function.

Special Issue of ACM TOMACS on Monte Carlo Methods in Statistics

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , , on December 10, 2012 by xi'an

As posted here a long, long while ago, following a suggestion from the editor (and North America Cycling Champion!) Pierre Lécuyer (Université de Montréal), Arnaud Doucet (University of Oxford) and myself acted as guest editors for a special issue of ACM TOMACS on Monte Carlo Methods in Statistics. (Coincidentally, I am attending a board meeting for TOMACS tonight in Berlin!) The issue is now ready for publication (next February unless I am confused!) and made of the following papers:

* Massive parallelization of serial inference algorithms for a complex generalized linear model
MARC A. SUCHARD, IVAN ZORYCH, PATRICK RYAN, DAVID MADIGAN
*Convergence of a Particle-based Approximation of the Block Online Expectation Maximization Algorithm
SYLVAIN LE CORFF and GERSENDE FORT
* Efficient MCMC for Binomial Logit Models
AGNES FUSSL, SYLVIA FRÜHWIRTH-SCHNATTER, RUDOLF FRÜHWIRTH
* Adaptive Equi-Energy Sampler: Convergence and Illustration
AMANDINE SCHRECK and GERSENDE FORT and ERIC MOULINES
* Particle algorithms for optimization on binary spaces
CHRISTIAN SCHÄFER
* Posterior expectation of regularly paved random histograms
RAAZESH SAINUDIIN, GLORIA TENG, JENNIFER HARLOW, and DOMINIC LEE
* Small variance estimators for rare event probabilities
MICHEL BRONIATOWSKI and VIRGILE CARON
* Self-Avoiding Random Dynamics on Integer Complex Systems
FIRAS HAMZE, ZIYU WANG, and NANDO DE FREITAS
* Bayesian learning of noisy Markov decision processes
SUMEETPAL S. SINGH, NICOLAS CHOPIN, and NICK WHITELEY

Here is the draft of the editorial that will appear at the beginning of this special issue. (All faults are mine, of course!) Continue reading

multiple try/point Metropolis algorithm

Posted in Statistics, Travel with tags , , , , , on January 23, 2012 by xi'an

Among the arXiv documents I printed at the turn of the year in order to get a better look at them (in the métro if nowhere else!), there were two papers by Luca Martino and co-authors from Universidad Carlos III, Madrid, A multi-point Metropolis scheme with generic weight functions and Different acceptance functions for multiple try Metropolis schemes. The multiple-try algorithm sounds like another version of the delayed sampling algorithm of Tierney and Mira (1999) and Green and Mira (2001). I somehow missed it, even though it was introduced in Liu et al. (2000) and Quin and Liu (2001). Multiple-try Metropolis builds upon the idea that, instead of making one proposal at a time, it is feasible to build a sequence of proposals and to pick one among those, presumably rather likely and hence more open to being accepted. The sequence of proposals may depend upon the past propositions as well as on the current value, lending some degree of adaptability to the scheme. In the current implementation, the algorithm remains rather clumsy [in my opinion] in that (a) a fixed horizon N need be fixed and (b) an additional series of backward simulations need be produced simply to keep the balance equation happy… Hence a total number of simulations O(N) for one possible acceptance. The first note slightly extends Quin and Liu (2001) by using a fairly general weighting scheme. The second paper studies some particular choices for the weights in a much less adaptive scheme (where parallelisation would be an appropriate alternative, since each proposal in the multiple try only depends on the current value of the chain). But it does not demonstrate a more efficient behaviour than when using a cycle or a mixture of Metropolis-Hastings algorithms. The method seems to regain popularity, though, as Roberto Casarin, Radu Craiu and Fabrizio Leisen (also from Carlos III)  arXived a paper on a multiple-try algorithm, connected with population Monte Carlo, and more recently published it in Statistics and Computing.

Follow

Get every new post delivered to your Inbox.

Join 698 other followers