Today, I am leaving Paris for a 8 day stay in Iceland! This is quite exciting, for many reasons: first, I missed the AISTATS 2013 last year as I was still in the hospital; second, I am giving a short short tutorial on ABC methods which will be more like a long (two hours) talk; third, it gives me the fantastic opportunity to visit Iceland for a few days, a place that was top of my wish list of countries to visit. The weather forecast is rather bleak but I am carrying enough waterproof layers to withstand a wee bit of snow and rain… The conference proper starts next Tuesday, April 22, with the tutorials taking place next Friday, April 25. Hence leaving me three completely free days for exploring the area near Reykjavik.
Archive for the Mountains Category
Daniel Simpson gave a seminar at CREST yesterday on his recently arXived paper, “Penalising model component complexity: A principled, practical approach to constructing priors” written with Thiago Martins, Andrea Riebler, Håvard Rue, and Sigrunn Sørbye. Paper that he should also have given in Banff last month had he not lost his passport in København airport… I have already commented at length on this exciting paper, hopefully to become a discussion paper in a top journal!, so I am just pointing out two things that came to my mind during the energetic talk delivered by Dan to our group. The first thing is that those penalised complexity (PC) priors of theirs rely on some choices in the ordering of the relevance, complexity, nuisance level, &tc. of the parameters, just like reference priors. While Dan already wrote a paper on Russian roulette, there is also a Russian doll principle at work behind (or within) PC priors. Each shell of the Russian doll corresponds to a further level of complexity whose order need be decided by the modeller… Not very realistic in a hierarchical model with several types of parameters having only local meaning.
My second point is that the construction of those “politically correct” (PC) priors reflects another Russian doll structure, namely one of embedded models, hence would and should lead to a natural multiple testing methodology. Except that Dan rejected this notion during his talk, by being opposed to testing per se. (A good topic for one of my summer projects, if nothing more, then!)
It has been exactly a year since my climbing accident and the loss of my right thumb. Time for a quick recap (for anyone still interested!): Looking back over that thumbless year, I cannot see a significant impact over my daily life: I can essentially operate the same way as before, from climbing to cooking, from writing to biking, from driving to skiing, and the few inconveniences I experience are quite minor. Not large enough to rely on the prosthesis I received a few months ago. I do not particularly suffer from the right hand in cold (or hot) weather and my ice-climbing trip in Banff last month showed I could stand temperatures of -30⁰C with no difference from the past. I never experience “phantom thumb” sensations and rarely notice people taking stock of my missing thumb. Hence, while it has been an annoying accident that has disrupted our lives for a few weeks, the long term consequences are clearly minimal.
It all started in Jim Berger’s basement, drinking with the uttermost reverence an otherworldly Turley Zinfandel during the great party Ann and Jim Berger hosted for the O’Bayes’13 workshop in Duke. I then mentioned to Angela Bitto and Alexandra Posekany, from WU Wien, that I was going to be in Austria next September for a seminar in Linz, at the Johannes Kepler Universität, and, as it happened to take place the day before BAYSM ’14, the second conference of the young Bayesian statisticians, in connection with the j-ISBA section, they most kindly invited me to the meeting! As a senior Bayesian, most obviously! This is quite exciting, all the more because I never visited Vienna before. (Contrary to other parts of Austria, like the Großglockner, where I briefly met Peter Habeler. Trivia: the cover picture of the ‘Og is actually taken from the Großglockner.)
Scott Schmidler and his Ph.D. student Douglas VanDerwerken have arXived a paper on parallel MCMC the very day I left for Chamonix, prior to MCMSki IV, so it is no wonder I missed it at the time. This work is somewhat in the spirit of the parallel papers Scott et al.’s consensus Bayes, Neiswanger et al.’s embarrassingly parallel MCMC, Wang and Dunson’s Weierstrassed MCMC (and even White et al.’s parallel ABC), namely that the computation of the likelihood can be broken into batches and MCMC run over those batches independently. In their short survey of previous works on parallelization, VanDerwerken and Schmidler overlooked our neat (!) JCGS Rao-Blackwellisation with Pierre Jacob and Murray Smith, maybe because it sounds more like post-processing than genuine parallelization (in that it does not speed up the convergence of the chain but rather improves the Monte Carlo usages one can make of this chain), maybe because they did not know of it.
“This approach has two shortcomings: first, it requires a number of independent simulations, and thus processors, equal to the size of the partition; this may grow exponentially in dim(Θ). Second, the rejection often needed for the restriction doesn’t permit easy evaluation of transition kernel densities, required below. In addition, estimating the relative weights wi with which they should be combined requires care.” (p.3)
The idea of the authors is to replace an exploration of the whole space operated via a single Markov chain (or by parallel chains acting independently which all have to “converge”) with parallel and independent explorations of parts of the space by separate Markov chains. “Small is beautiful”: it takes a shorter while to explore each set of the partition, hence to converge, and, more importantly, each chain can work in parallel to the others. More specifically, given a partition of the space, into sets Ai with posterior weights wi, parallel chains are associated with targets equal to the original target restricted to those Ai‘s. This is therefore an MCMC version of partitioned sampling. With regard to the shortcomings listed in the quote above, the authors consider that there does not need to be a bijection between the partition sets and the chains, in that a chain can move across partitions and thus contribute to several integral evaluations simultaneously. I am a bit worried about this argument since it amounts to getting a random number of simulations within each partition set Ai. In my (maybe biased) perception of partitioned sampling, this sounds somewhat counter-productive, as it increases the variance of the overall estimator. (Of course, not restricting a chain to a given partition set Ai has the incentive of avoiding a possibly massive amount of rejection steps. It is however unclear (a) whether or not it impacts ergodicity (it all depends on the way the chain is constructed, i.e. against which target(s)…) as it could lead to an over-representation of some boundaries and (b) whether or not it improves the overall convergence properties of the chain(s).)
“The approach presented here represents a solution to this problem which can completely remove the waiting times for crossing between modes, leaving only the relatively short within-mode equilibration times.” (p.4)
A more delicate issue with the partitioned MCMC approach (in my opinion!) stands with the partitioning. Indeed, in a complex and high-dimension model, the construction of the appropriate partition is a challenge in itself as we often have no prior idea where the modal areas are. Waiting for a correct exploration of the modes is indeed faster than waiting for crossing between modes, provided all modes are represented and the chain for each partition set Ai has enough energy to explore this set. It actually sounds (slightly?) unlikely that a target with huge gaps between modes will see a considerable improvement from the partioned version when the partition sets Ai are selected on the go, because some of the boundaries between the partition sets may be hard to reach with a off-the-shelf proposal. (Obviously, the second part of the method on the adaptive construction of partitions is yet in the writing and I am looking forward its aXival!)
Furthermore, as noted by Pierre Jacob (of Statisfaction fame!), the adaptive construction of the partition has a lot in common with Wang-Landau schemes. Which goal is to produce a flat histogram proposal from the current exploration of the state space. Connections with Atchadé’s and Liu’s (2010, Statistical Sinica) extension of the original Wang-Landau algorithm could have been spelled out. Esp. as the Voronoï tessellation construct seems quite innovative in this respect.