## Advances in scalable Bayesian computation [day #4]

Posted in Books, Mountains, pictures, R, Statistics, University life on March 7, 2014 by xi'an

Final day of our workshop Advances in Scalable Bayesian Computation already, since tomorrow morning is an open research time ½ day! Another “perfect day in paradise”, with the Banff Centre campus covered by a fine snow blanket, still falling…, and making work in an office of BIRS a dream-like moment.

Still looking for a daily theme, parallelisation could be the right candidate, even though other talks this week went into parallelisation issues, incl. Steve’s talk yesterday. Indeed, Anthony Lee gave a talk this morning on interactive sequential Monte Carlo, where he motivated the setting by a formal parallel structure. Then, Darren Wilkinson surveyed the parallelisation issues in Monte Carlo, MCMC, SMC and ABC settings, before arguing in favour of a functional language called Scala. (Neat entries to those topics can be found on Darren’s blog.) And in the afternoon session, Sylvia Frühwirth-Schnatter presented her approach to the (embarrassingly) parallel problem, in the spirit of Steve’s, David Dunson’s and Scott’s (a paper posted on the day I arrived in Chamonix and which I hence missed!). There was plenty to learn from that talk (do not miss the Yin-Yang moment at 25 min!), but it also helped me to resolve a difficulty I had had for two weeks with the consensus Bayes representation (more on that later!). And, even though Marc Suchard mostly talked about flu and trees in a very pleasant and broad talk, he also had a slide on parallelisation to fit the theme! Although unrelated to parallelism, Nicolas Chopin’s talk was on sequential quasi-Monte Carlo algorithms: while I had heard previous versions of this talk in Chamonix and BigMC, I found it full of exciting stuff. And it clearly got the room truly puzzled by this possibility, in a positive way! Similarly, Alex Lenkoski spoke about extreme rain events in Norway with no trace of parallelism, but the general idea behind the examples was to question the notion of the calibrated Bayesian (with possible connections with the cut models).

This has been a wonderful week and I am sure the participants got as much as I did from the talks and the informal exchanges. Thanks to BIRS for the sponsorship and the superb organisation of the week (and to the Banff Centre for providing such a paradisiacal environment). I feel very privileged to have benefited from this support, and I dearly hope to be back in Banff within a few years.

## Foundations of Statistical Algorithms [book review]

Posted in Books, Linux, R, Statistics, University life on February 28, 2014 by xi'an

There is computational statistics and there is statistical computing. And then there is statistical algorithmics. Not the same thing, by far. This 2014 book by Weihs, Mersmann and Ligges, from TU Dortmund, the latter being also a member of the R Core team, stands at one end of this wide spectrum of techniques required by modern statistical analysis. In short, it provides the necessary skills to construct statistical algorithms and hence to contribute to statistical computing. And I wish I had the luxury to teach from Foundations of Statistical Algorithms to my graduate students, if only we could afford an extra yearly course…

“Our aim is to enable the reader (…) to quickly understand the main ideas of modern numerical algorithms [rather] than having to memorize the current, and soon to be outdated, set of popular algorithms from computational statistics.”(p.1)

The book is built around the above aim, first presenting the reasons why computers can produce answers different from what we want, using least squares as a means to check for (in)stability, then second establishing the ground for Monte Carlo methods by discussing (pseudo-)random generation, including MCMC algorithms, before moving in third to bootstrap and resampling techniques, and concluding with parallelisation and scalability. The text is highly structured, with frequent summaries, a division of chapters all the way down to sub-sub-sub-sections, an R implementation section in each chapter, and a few exercises.

## MCMSki IV [day 2.5]

Posted in Mountains, pictures, Statistics, University life on January 8, 2014 by xi'an

Despite a good rest during the ski break, my cold did not get away (no magic left in this world!) and I thus had a low attention span to attend the Bayesian statistics and Population genetics session: while Jukka Corander mentioned the improvement brought by our AMIS algorithm, I had difficulties getting the nature of the model, if only because he used a blackboard-like font that made math symbols too tiny to read. (Nice fonts, otherwise!) Daniel Lawson (of vomiting Warhammer fame!) talked about the alluring notion of a statistical emulator, and Barbara Engelhardt talked about variable selection in a SNP setting. I did not get a feeling for how handling ten million SNPs was possible towards a variable selection goal. My final session of the day was actually “my” invited session on ABC methods, where Richard Everitt presented a way of mixing exact approximation with ABC and synthetic likelihood (Wood, Nature) approximations. The resulting MAVIS algorithm is not out yet. The second speaker was Ollie Ratmann, who spoke on his accurate ABC that I have discussed many times here. And Jean-Michel Marin managed to drive from Montpellier, just in time to deliver his talk on our various explorations of the ABC model choice problem.

After a quick raclette at “home”, we headed back to the second poster session, where I had enough of a clear mind and not too much of a headache (!) to have several interesting discussions, incl. a new parallelisation suggested by Ben Calderhead, the sticky Metropolis algorithm of Luca Martino, the airport management video of Jegar Pitchforth, the mixture of Dirichlet distributions for extremes by Anne Sabourin, not to mention posters from Warwick or Paris. At the end of the evening, I walked back to my apartment with the Blossom skis we had brought in the morning to attract registrations for the ski race: not enough to make up for the amount charged by the ski school. Too bad, especially given Anto’s efforts to get this amazing sponsorship!

## O’Bayes 2013 [#3]

Posted in pictures, Running, Statistics, Travel, University life on December 23, 2013 by xi'an

A final day for this O’Bayes 2013 conference, where I missed the final session for travelling reasons. Several talks had highly attractive features (for me), from David Dunson’s on his recently arXived paper on parallel MCMC, that provides an alternative to the embarrassingly parallel algorithm I discussed a few weeks ago, to be discussed further in a future post, to Marty Wells hindered by poor weather and delivering by phone a talk on L1 shrinkage estimators (a bit of a paradox since, as discussed by Yuzo Maruyama, most MAP estimators cannot be minimax and, more broadly, since they cannot be expressed as solutions of a loss minimisation problem), to Malay Ghosh revisiting g-priors from an almost frequentist viewpoint, to Gonzalo García-Donato presenting criteria for objective Bayesian model choice in a vision that was clearly the closest to my own perspective on the topic. Overall, when reflecting upon the diversity and high quality of the talks at this O’Bayes meeting, and also as the incoming chair-elect of the corresponding section of ISBA, I think that what emerges most significantly from those talks is an ongoing pondering on the nature of (objective Bayesian) testing, not only in the works extending the g-priors in various directions, but also in the whole debate between Bayes factors and information criteria, model averaging versus model selection. During the discussion on Gonzalo’s talk, David Draper objected to the search for an automated approach to the comparison of models, but I strongly lean towards Gonzalo’s perspective as we need to provide a reference solution able to tackle less formal and more realistic problems. I do hope to see more of those realistic problems tackled at O’Bayes 2015 (whose location is not yet settled). In the meanwhile, a strong thank you! to the local organising committee and most specifically to Jim Berger!

## Asymptotically Exact, Embarrassingly Parallel MCMC

Posted in Books, Statistics, University life on November 26, 2013 by xi'an

Willie Neiswanger, Chong Wang, and Eric Xing (from CMU) recently arXived a paper entitled as above. The “embarrassing” in the title refers to the “embarrassingly simple” solution proposed therein, namely to solve the difficulty in handling very large datasets by running completely independent parallel MCMC samplers on parallel threads or computers and using the outcomes of those samplers as density estimates, pulled together as a product towards an approximation of the true posterior density. In other words, the idea is to break the posterior as

$p(\theta|x) = \prod_{i=1}^m p_i(\theta|x)$

and to use the estimate

$\hat p(\theta|x) = \prod_{i=1}^m \hat p_i(\theta|x)$

where the individual estimates are obtained by, say, non-parametric estimates. The method is then “asymptotically exact” in the weak (and unsurprising) sense of converging as the number of MCMC iterations grows. Still, there is a theoretical justification that is not found in previous parallel methods that mixed all resulting samples without accounting for the subsampling. And I also appreciate the point that, in many cases, running MCMC samplers with subsamples produces faster convergence.
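The factorisation itself can be checked in closed form on a toy example. The sketch below is an illustration only, not the estimator of the paper: it assumes a conjugate Normal model with known variance (my choice, not the authors'), splits the data into m shards, raises the prior to the power 1/m within each shard, and multiplies the resulting Gaussian subposteriors. The function names and numerical settings are hypothetical.

```python
import numpy as np

def gaussian_posterior(x, prior_mean, prior_var, noise_var, prior_power=1.0):
    """Posterior (mean, variance) of theta when x_j ~ N(theta, noise_var)
    and the N(prior_mean, prior_var) prior is raised to `prior_power`."""
    # Raising a Gaussian density to the power p multiplies its precision by p.
    prior_prec = prior_power / prior_var
    post_prec = prior_prec + len(x) / noise_var
    post_mean = (prior_prec * prior_mean + np.sum(x) / noise_var) / post_prec
    return post_mean, 1.0 / post_prec

def combine_gaussians(means, variances):
    """Mean and variance of the (renormalised) product of Gaussian densities."""
    precs = 1.0 / np.asarray(variances)
    var = 1.0 / precs.sum()
    mean = var * np.sum(precs * np.asarray(means))
    return mean, var

# Toy data and settings, chosen arbitrarily for the illustration.
rng = np.random.default_rng(0)
x = rng.normal(1.5, 2.0, size=1000)
m = 4
shards = np.array_split(x, m)

# One subposterior per shard, each with the prior to the power 1/m.
subs = [gaussian_posterior(s, 0.0, 10.0, 4.0, prior_power=1.0 / m) for s in shards]
combined = combine_gaussians(*zip(*subs))

# Posterior based on the whole dataset and the untempered prior.
full = gaussian_posterior(x, 0.0, 10.0, 4.0)
print(combined, full)
```

In this conjugate case the two printed pairs agree up to floating-point error, since the precisions and the precision-weighted means simply add up across shards. With real MCMC output the subposteriors are only available through samples, which is where the density estimates come in.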

In the paper, the division of p into its components is done by partitioning the iid data into m subsets. And taking a power 1/m of the prior in each case. (Which may induce improperness issues.) However, the subdivision is arbitrary and can thus be implemented in other cases than the fairly restrictive iid setting. Because each (subsample) non-parametric estimate involves T terms, the resulting overall estimate contains $T^m$ terms and the authors suggest using an independent Metropolis-within-Gibbs sampler to handle this complexity. Which is necessary [took me a while to realise this!] for producing a final sample from the (approximate) true posterior distribution. As an aside, I wonder why the bandwidths are all equal across all subsamples, as they should depend on those. And as it would not make much of a difference. It would also be interesting to build a typology of cases where subsampling leads to subposteriors that are close to orthogonal, preventing the implementation of the method.
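To see where the combinatorial explosion comes from, here is a minimal sketch in one dimension, assuming Gaussian kernels with a common bandwidth h. It expands the product of m kernel estimates with T components each into its T^m Gaussian components by brute-force enumeration and checks the expansion against the direct product on a grid. Enumeration is only feasible for tiny T and m, and it stands in for the Metropolis-within-Gibbs sampler over component indices rather than reproducing it. The fake "subposterior draws" and all names are hypothetical.

```python
import itertools
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def kde(grid, draws, h):
    """Gaussian kernel density estimate with bandwidth h, evaluated on grid."""
    return normal_pdf(grid[:, None], draws[None, :], h).mean(axis=1)

def product_mixture(sub_draws, h):
    """Enumerate the T**m Gaussian components of the product of m KDEs.

    Picking one draw per subsample gives one component: it is centred at
    the average of the picked draws, has standard deviation h / sqrt(m),
    and its unnormalised weight is the product of N(draw | centre, h^2).
    """
    m = len(sub_draws)
    means, weights = [], []
    for combo in itertools.product(*sub_draws):
        picked = np.array(combo)
        centre = picked.mean()
        means.append(centre)
        weights.append(np.prod(normal_pdf(picked, centre, h)))
    return np.array(means), np.array(weights), h / np.sqrt(m)

# Stand-ins for MCMC output from m subposteriors (arbitrary toy values).
rng = np.random.default_rng(1)
m, T, h = 3, 5, 0.4
sub_draws = [rng.normal(0.2 * i, 1.0, size=T) for i in range(m)]
grid = np.linspace(-4.0, 4.0, 201)

# Direct product of the m kernel estimates.
direct = np.prod([kde(grid, d, h) for d in sub_draws], axis=0)

# Same function written as a weighted mixture of T**m Gaussians.
means, weights, sd = product_mixture(sub_draws, h)
mixture = (weights[None, :] * normal_pdf(grid[:, None], means[None, :], sd)).sum(axis=1)
const = np.sqrt(2.0 * np.pi) * sd / T**m  # constant that does not depend on the component

print(len(means), np.allclose(direct, const * mixture))
```

Even with T = 5 and m = 3 there are already 125 components, which makes it clear why enumeration is hopeless for realistic chain lengths and why a sampler moving over the indices one subsample at a time is needed to produce draws from the product.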

As it happened, I read this paper on the very day Nial Friel (University College Dublin) gave a talk at the Big’MC seminar on the convergence of approximations to ergodic transition kernels, based on the recent results of Mitrophanov on the stability of Markov chains, where he introduced the topic by the case of datasets large enough to prevent the computation of the likelihood function.