Archive for Chamonix-Mont-Blanc
Scott Schmidler and his Ph.D. student Douglas VanDerwerken have arXived a paper on parallel MCMC the very day I left for Chamonix, prior to MCMSki IV, so it is no wonder I missed it at the time. This work is somewhat in the spirit of the recent parallelization papers: Scott et al.’s consensus Bayes, Neiswanger et al.’s embarrassingly parallel MCMC, Wang and Dunson’s Weierstrassed MCMC (and even White et al.’s parallel ABC), namely that the computation of the likelihood can be broken into batches and MCMC run over those batches independently. In their short survey of previous work on parallelization, VanDerwerken and Schmidler overlooked our neat (!) JCGS Rao-Blackwellisation with Pierre Jacob and Murray Smith, maybe because it sounds more like post-processing than genuine parallelization (in that it does not speed up the convergence of the chain but rather improves the Monte Carlo uses one can make of this chain), maybe because they did not know of it.
“This approach has two shortcomings: first, it requires a number of independent simulations, and thus processors, equal to the size of the partition; this may grow exponentially in dim(Θ). Second, the rejection often needed for the restriction doesn’t permit easy evaluation of transition kernel densities, required below. In addition, estimating the relative weights wi with which they should be combined requires care.” (p.3)
The idea of the authors is to replace an exploration of the whole space operated via a single Markov chain (or by parallel chains acting independently, which all have to “converge”) with parallel and independent explorations of parts of the space by separate Markov chains. “Small is beautiful”: it takes less time to explore each set of the partition, hence to converge, and, more importantly, each chain can work in parallel with the others. More specifically, given a partition of the space into sets Ai with posterior weights wi, parallel chains are associated with targets equal to the original target restricted to those Ai‘s. This is therefore an MCMC version of partitioned sampling. With regard to the shortcomings listed in the quote above, the authors argue that there does not need to be a bijection between the partition sets and the chains, in that a chain can move across partitions and thus contribute to several integral evaluations simultaneously. I am a bit worried about this argument since it amounts to getting a random number of simulations within each partition set Ai. In my (maybe biased) perception of partitioned sampling, this sounds somewhat counter-productive, as it increases the variance of the overall estimator. (Of course, not restricting a chain to a given partition set Ai has the benefit of avoiding a possibly massive amount of rejection steps. It is however unclear (a) whether or not this impacts ergodicity (it all depends on the way the chain is constructed, i.e. against which target(s)…), as it could lead to an over-representation of some boundaries, and (b) whether or not it improves the overall convergence properties of the chain(s).)
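To make the mechanism concrete, here is a toy sketch of the partitioned idea (my own illustration, not the authors' implementation, and with the posterior weights wi assumed known by symmetry rather than estimated):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bimodal target: equal mixture of N(-3,1) and N(3,1)
def log_target(x):
    return np.logaddexp(-0.5 * (x + 3) ** 2, -0.5 * (x - 3) ** 2)

def restricted_mh(in_set, x0, n_iter=5000, scale=1.0):
    """Metropolis chain targeting the posterior restricted to one partition set:
    proposals falling outside the set are rejected outright."""
    x, chain = x0, []
    for _ in range(n_iter):
        prop = x + scale * rng.normal()
        if in_set(prop) and np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        chain.append(x)
    return np.array(chain)

# Partition A1 = (-inf, 0), A2 = [0, +inf); the two chains are independent
c1 = restricted_mh(lambda x: x < 0, x0=-3.0)
c2 = restricted_mh(lambda x: x >= 0, x0=3.0)

# Combine within-set estimates of E[h(X)] with the posterior weights w_i;
# here w_1 = w_2 = 1/2 by symmetry, but estimating them is the delicate step
h = lambda x: x ** 2
estimate = 0.5 * h(c1).mean() + 0.5 * h(c2).mean()
```

In the actual scheme the two restricted chains would run on separate processors, and the weights wi would themselves have to be estimated, which is the delicate step flagged in the quote above.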
“The approach presented here represents a solution to this problem which can completely remove the waiting times for crossing between modes, leaving only the relatively short within-mode equilibration times.” (p.4)
A more delicate issue with the partitioned MCMC approach (in my opinion!) lies with the partitioning itself. Indeed, in a complex and high-dimensional model, the construction of an appropriate partition is a challenge in itself, as we often have no prior idea where the modal areas are. Waiting for a correct exploration of the modes is indeed faster than waiting for crossings between modes, provided all modes are represented and the chain for each partition set Ai has enough energy to explore this set. It actually sounds (slightly?) unlikely that a target with huge gaps between modes will see a considerable improvement from the partitioned version when the partition sets Ai are selected on the go, because some of the boundaries between the partition sets may be hard to reach with an off-the-shelf proposal. (Obviously, the second part of the method, on the adaptive construction of partitions, is still being written, and I am looking forward to its arXival!)
Furthermore, as noted by Pierre Jacob (of Statisfaction fame!), the adaptive construction of the partition has a lot in common with Wang-Landau schemes, whose goal is to produce a flat occupancy histogram from the current exploration of the state space. Connections with Atchadé and Liu’s (2010, Statistica Sinica) extension of the original Wang-Landau algorithm could have been spelled out, especially as the Voronoï tessellation construct seems quite innovative in this respect.
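For readers who have not met the scheme, here is a bare-bones Wang-Landau sketch (my own toy version, with a fixed binning of the real line rather than an adaptive Voronoï partition): each visit to a bin inflates a bias term that pushes the chain away from it, driving the occupancy histogram towards flatness.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy bimodal target (equal mixture of N(-3,1) and N(3,1))
log_pi = lambda x: np.logaddexp(-0.5 * (x + 3) ** 2, -0.5 * (x - 3) ** 2)

edges = np.linspace(-8, 8, 17)      # 16 fixed bins standing in for a partition
log_theta = np.zeros(16)            # running log-bias per bin
counts = np.zeros(16)               # occupancy histogram

def bin_of(x):
    return int(np.clip(np.searchsorted(edges, x) - 1, 0, 15))

x = -3.0
for it in range(20000):
    gamma = 1.0 / (1 + it / 1000)   # decreasing adaptation, one simple schedule
    prop = x + rng.normal()
    bx, bp = bin_of(x), bin_of(prop)
    # Metropolis step on the biased target pi(x) / theta(bin(x))
    if np.log(rng.uniform()) < (log_pi(prop) - log_theta[bp]) - (log_pi(x) - log_theta[bx]):
        x, bx = prop, bp
    log_theta[bx] += gamma          # penalise the bin just visited
    counts[bx] += 1
```

The bias log_theta pushes the chain out of already-visited bins, which is precisely what lets it cross between the two modes without waiting for a lucky random-walk excursion.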
[Here is a call from the BayesComp Board for proposals for MCMSki 5, renamed as below to fit the BayesComp section. The earlier poll on the 'Og helped shape the proposal, with the year, 2016 vs. 2017, remaining open. I just added town to resort below as it did not sound from the poll that people were terribly interested in resorts.]
The Bayesian Computation Section of ISBA is soliciting proposals to host its flagship conference:
Bayesian Computing at MCMSki
The expectation is that the meeting will be held in January 2016, but the committee will consider proposals for other times through January 2017.
This meeting will be the next incarnation of the popular MCMSki series that addresses recent advances in the theory and application of Bayesian computational methods such as MCMC, all in the context of a world-class ski resort/town. While past meetings have taken place in the Alps and the Rocky Mountains, we encourage applications from any venue that could support MCMSki. A three-day meeting is planned, perhaps with an additional day or two of satellite meetings and/or short courses.
One page proposals should address feasibility of hosting the meeting including
1. Proposed dates.
2. Transportation for international participants (both the proximity of international airports and transportation to/from the venue).
3. The conference facilities.
4. The availability and cost of hotels, including low cost options.
5. The proposed local organizing committee and their collective experience organizing international meetings.
6. Expected or promised contributions from the host organization, host country, or industrial partners towards the cost of running the meetings.
Proposals should be submitted to David van Dyk (dvandyk, BayesComp Program Chair) at imperial.ac.uk no later than May 31, 2014.
The Board of Bayesian Computing Section will evaluate the proposals, choose a venue, and appoint the Program Committee for Bayesian Computing at MCMSki.
Following the exciting and innovative talks, posters and discussions at MCMski IV, the editor of Statistics and Computing, Mark Girolami (who also happens to be the new president-elect of the BayesComp section of ISBA, which is taking over the management of future MCMski meetings), kindly proposed to publish a special issue of the journal open to all participants in the meeting. Not only to speakers, mind, but to all participants.
So if you are interested in submitting a paper to this special issue of a computational statistics journal that is very close to our MCMski themes, I encourage you to do so. (Especially if you missed the COLT 2014 deadline!) The deadline for submissions is set on March 15 (a wee bit tight but we would dearly like to publish the issue in 2014, namely the same year as the meeting.) Submissions are to be made through the Statistics and Computing portal, with a mention that they are intended for the special issue.
An editorial committee chaired by Antonietta Mira and composed of Christophe Andrieu, Brad Carlin, Nicolas Chopin, Jukka Corander, Colin Fox, Nial Friel, Chris Holmes, Gareth Jones, Peter Müller, Geoff Nicholls, Gareth Roberts, Håvard Rue, Robin Ryder, and myself, will examine the submissions and get back to the authors within a few weeks. In a spirit similar to the JRSS Read Paper procedure, submissions will first be examined collectively, before being sent to referees. We plan to publish the reviews as well, in order to include a global set of comments on the accepted papers. We intend to do this in The Economist style, i.e. as a set of edited anonymous comments. The usual instructions for Statistics and Computing apply, with the additional requirements that the paper should be around 10 pages long and include at least one author who took part in MCMski IV.
Along with other members of BayesComp, who launched a brainstorming session for the next MCMSki meeting before the snow had completely melted from our skis, we discussed the following topics about the future meeting:
1. Should we keep the brandname MCMSki for the incoming meetings? The argument for changing the name is that the community is broader than MCMC, as already shown by the program of MCMSki 4. I have no strong feeling about this name, even though I find it catchy and sexy! I would thus rather keep MCMSki because it is already a brandname. Else, we could switch to M(CM)Ski, MCMSki with friends (and foes?), Snowtistics and Compuskis, or to any other short name with or without ski in it, as long as the filiation from the previous meetings is clear in the mind of the participants.
2. Should we move the frequency to two years? While the current meeting was highly popular and attracted a record number of 223 participants, and while the period right after the Winter break is not so heavily packed with meetings, we were several at a banquet table last week to object to a planned move from three to two years. I understand the appeal of meetings with great speakers in a terrific mountainous setting taking place as often as possible… However, what struck me about the meeting last week is that, despite the large number of parallel sessions, I overwhelmingly heard novel stuff, compared with previous meetings. And would have heard even more, had I been gifted with ubiquity. Moving to two years could dull this feeling. And induce “meeting fatigue”. Furthermore, I fear that the increase in ISBA sections and the natural increase in meeting entropy pushes the percentage of meetings one can attend down and down. Sticking to a three-year period would keep MCMSki significantly more attractive, in that refusing an invitation would mean postponing for three years, &tc. So I personally oppose a move to two years.
3. Should we seek further financial support? The financial support behind a conference is obviously crucial. When planning MCMski 4, I decided against contacting companies as I have no skills in the matter, but finding ways to support conference rooms, travel for young researchers, the ski race, poster prizes and the banquet would be more-than-nice. Anto’s initiative to bring a pair of skis offered by a ski company was a great one, and one feat that I hope can be duplicated in the future. (During my spare week in Chamonix, I contacted ski rentals and the skipass company for a rebate, to no avail.) Travel support from ISBA and SBSS towards the travel costs of around 20 young researchers was much appreciated but is not necessarily to be found on every occasion… Note that, despite the lack of corporate support, MCMski 4 is going to provide a positive financial return to ISBA (and BayesComp), and I strongly suggest we keep a tradition of minimalist services for future meetings in order to fight outrageous conference fees. I think the fees should cover the conference rooms and possibly a cuppa or two a day, but nothing more. In particular, the banquet should remain optional. And so should any other paying social event. (We can also do without goodies and conference material.)
4. Where should the next meeting take place? The call is on for potential organisers in either 2016 or 2017, early January. Between the Alps and the Rockies, there are plenty of possible locations, but more exotic places in the Northern Hemisphere could be suggested as well, from Lapland to Hokkaido… A question raised by Christophe Andrieu that I’d like to second is whether the preference should go to places that qualify as villages or as resort. Bormio and Chamonix are villages, while Park City is not. (I definitely prefer villages!)
Richard Wilkinson arXived a paper on accelerated ABC during MCMSki 4, a paper that I almost missed when quickly perusing the daily list. This is another illustration of the “invasion of Gaussian processes” into ABC settings, maybe under the influence of machine learning.
The paper starts with a link to the synthetic likelihood approximation of Wood (2010, Nature), as in Richard Everitt’s talk last week. Richard (W.) presents the generalised ABC as a kernel-based acceptance probability, using a kernel π(y|x), where y is the observed data and x=x(θ) the simulated one. He proposes a Gaussian process model for the log-likelihood (at the observed data y), with a quadratic (in θ) mean and a Matérn covariance matrix. Hence the connection with Wood’s synthetic likelihood. Another connection is with Nicolas’ talk on QMC(MC): the θ’s are chosen following a Sobol sequence “in order to minimize the number of design points”. Which requires a reparameterisation to [0,1]p… I find this “uniform” exploration of the whole parameter space delicate to envision in complex parameter spaces and realistic problems, since the likelihood is highly concentrated on a tiny subregion of the original [0,1]p. Not to mention the issue of the spurious mass on the boundaries of the hypercube possibly induced by the change of variable. Richard’s sequential algorithm also attempts to eliminate implausible zones of the parameter space, i.e. zones where the likelihood is essentially zero. My worries with this interesting notion are that (a) the early Gaussian process approximations may be poor and hence exclude zones they should not; (b) all Gaussian process approximations at all iterations must be saved; (c) the Sobol sequences apply to the whole [0,1]p at each iteration, while the non-implausible region shrinks at each iteration, which induces a growing inefficiency in the algorithm. The Sobol sequence should instead be restricted to the previous non-implausible zone.
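As a toy sketch of the construction (my own illustration, with a cheap quadratic function standing in for the simulator-based log-likelihood, and a zero GP mean instead of the paper's quadratic one):

```python
import numpy as np
from scipy.stats import qmc

# Toy 2-d setting (illustrative, not the paper's example): a smooth quadratic
# log-likelihood stands in for an expensive simulator-based estimate.
def log_lik(theta):
    return -5.0 * np.sum((theta - 0.5) ** 2, axis=-1)

p = 2
# Design points from a (scrambled) Sobol sequence on [0,1]^p
sobol = qmc.Sobol(d=p, scramble=True, seed=1)
design = sobol.random_base2(m=6)   # 2^6 = 64 design points
y = log_lik(design)

# Zero-mean GP regression with a Matern-3/2 covariance (the paper uses a
# quadratic mean function in theta, omitted here for brevity)
def matern32(a, b, ell=0.3, sig2=10.0):
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))
    r = np.sqrt(3.0) * d / ell
    return sig2 * (1.0 + r) * np.exp(-r)

K = matern32(design, design) + 1e-8 * np.eye(len(design))
alpha = np.linalg.solve(K, y)

def gp_mean(theta_new):
    # posterior mean of the GP surrogate for the log-likelihood
    return matern32(theta_new, design) @ alpha
```

The fitted surrogate gp_mean can then replace the expensive log-likelihood inside an MCMC sweep, at the cost of the interpolation error away from the design points.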
Overall, an interesting proposal that would need more prodding to understand whether or not it is robust to poor initialisation and complex structures. And a proposal belonging to the estimated-likelihood branch of ABC, which makes use of the final Gaussian process approximation to run an MCMC algorithm, without returning to pseudo-data simulation, replacing it instead with log-likelihood simulation.
“These algorithms sample space randomly and naively and do not learn from previous simulations”
The above criticism is moderated in a footnote about ABC-SMC using the “current parameter value to determine which move to make next [but] parameters visited in previous iterations are not taken into account”. I still find it excessive, in that SMC algorithms, and in particular ABC-SMC algorithms, are completely free to use the whole past to build the new proposal. This was clearly enunciated in our earlier population Monte Carlo papers. For instance, the complete collection of past particles can be recycled, with weights computed through our AMIS algorithm, as illustrated by Jukka Corander in one genetics application.
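As a toy sketch of this recycling (my own simplified rendering of the AMIS mechanism, on a one-dimensional normal target), every past particle is re-weighted at each iteration against the equal-weight mixture of all proposals used so far:

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(2)

# Target: standard normal (a stand-in posterior), known up to a constant
log_pi = lambda x: -0.5 * x ** 2

def gauss_logpdf(x, mu, sig):
    return -0.5 * ((x - mu) / sig) ** 2 - np.log(sig) - 0.5 * np.log(2 * np.pi)

def mixture_logpdf(x, mus, sigs):
    # equal-weight mixture of all proposals used so far (500 draws from each)
    comp = np.stack([gauss_logpdf(x, m, s) for m, s in zip(mus, sigs)])
    return logsumexp(comp, axis=0) - np.log(len(mus))

mus, sigs = [0.0], [3.0]            # deliberately over-dispersed first proposal
xs = rng.normal(mus[0], sigs[0], size=500)
for _ in range(4):
    # AMIS step: ALL past particles are re-weighted against the full mixture
    logw = log_pi(xs) - mixture_logpdf(xs, mus, sigs)
    w = np.exp(logw - logw.max()); w /= w.sum()
    # adapt the next proposal from the weighted past, then add new particles
    mu = np.sum(w * xs)
    sig = max(np.sqrt(np.sum(w * (xs - mu) ** 2)), 0.5)
    mus.append(mu); sigs.append(sig)
    xs = np.concatenate([xs, rng.normal(mu, sig, size=500)])

# final recycled estimate of E[X] under the target, using every particle
logw = log_pi(xs) - mixture_logpdf(xs, mus, sigs)
w = np.exp(logw - logw.max()); w /= w.sum()
est_mean = np.sum(w * xs)
```

Nothing from earlier iterations is discarded: the final estimate uses all 2,500 particles, which is precisely the point made above against the footnote's claim.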