Archive for Voronoi tessellation

lords of the rings

Posted in Books, pictures, Statistics, University life on February 9, 2017 by xi'an

In the 19 Jan 2017 issue of Nature [that I received two weeks later], a paper by Tarnita et al. discusses regular vegetation patterns like fairy circles. While this would seem like an ideal setting for point-process modelling, the article does not go in that direction, debating instead between ecological models, the retained one combining vegetal self-organisation with subterranean insect competition. Since the paper seems to validate its model by simulation alone, without producing a single equation, I went and checked the supplementary material attached to it. What I gathered from this material is that the system of differential equations used to build the model seems to be calibrated by seeking parameter values “consistent with what is known” rather than estimated as in a statistical model. Given the extreme complexity of the resulting five-page model, I am surprised at the low level of validation of the construct, with no visible proof of stationarity of the (stochastic) model thus constructed, and no model assessment in a statistical sense. Of course, a major disclaimer applies: (a) this area does not even border my domains of (relative) expertise and (b) I have not spent much time poring over the published paper and the attached supplementary material. (Note: this issue of Nature also contains a fascinating review paper by Nielsen et al. on a detailed scenario of human evolutionary history, based on the sequencing of genomes of extinct hominids.)

[more] parallel MCMC

Posted in Books, Mountains on April 3, 2014 by xi'an

Scott Schmidler and his Ph.D. student Douglas VanDerwerken have arXived a paper on parallel MCMC on the very day I left for Chamonix, prior to MCMSki IV, so it is no wonder I missed it at the time. This work is somewhat in the spirit of earlier parallelisation papers, Scott et al.’s consensus Bayes, Neiswanger et al.’s embarrassingly parallel MCMC, Wang and Dunson’s Weierstrassed MCMC (and even White et al.’s parallel ABC), namely that the computation of the likelihood can be broken into batches and MCMC run over those batches independently. In their short survey of previous works on parallelization, VanDerwerken and Schmidler overlooked our neat (!) JCGS Rao-Blackwellisation with Pierre Jacob and Murray Smith, maybe because it sounds more like post-processing than genuine parallelization (in that it does not speed up the convergence of the chain but rather improves the Monte Carlo uses one can make of this chain), or maybe because they did not know of it.

“This approach has two shortcomings: first, it requires a number of independent simulations, and thus processors, equal to the size of the partition; this may grow exponentially in dim(Θ). Second, the rejection often needed for the restriction doesn’t permit easy evaluation of transition kernel densities, required below. In addition, estimating the relative weights wi with which they should be combined requires care.” (p.3)

The idea of the authors is to replace the exploration of the whole space operated via a single Markov chain (or by parallel chains acting independently, which all have to “converge”) with parallel and independent explorations of parts of the space by separate Markov chains. “Small is beautiful”: it takes less time to explore each set of the partition, hence to converge, and, more importantly, each chain can work in parallel with the others. More specifically, given a partition of the space into sets Ai with posterior weights wi, parallel chains are associated with targets equal to the original target restricted to those Ai‘s. This is therefore an MCMC version of partitioned sampling. With regard to the shortcomings listed in the quote above, the authors consider that there does not need to be a bijection between the partition sets and the chains, in that a chain can move across partitions and thus contribute to several integral evaluations simultaneously. I am a bit worried about this argument since it amounts to getting a random number of simulations within each partition set Ai. In my (maybe biased) perception of partitioned sampling, this sounds somewhat counter-productive, as it increases the variance of the overall estimator. (Of course, not restricting a chain to a given partition set Ai has the advantage of avoiding a possibly massive amount of rejection steps. It is however unclear (a) whether or not it impacts ergodicity (it all depends on the way the chain is constructed, i.e. against which target(s)…) as it could lead to an over-representation of some boundaries and (b) whether or not it improves the overall convergence properties of the chain(s).)
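To make the mechanics of this partitioned version more concrete, here is a minimal sketch on a toy bimodal target, assuming (unrealistically) that both the partition {Ai} and the weights wi are known; in the paper the wi‘s obviously have to be estimated, which is the delicate part. Each restricted chain simply rejects proposals that leave its own set, and the within-set averages are recombined with the wi‘s. This is only an illustration of the principle, not the authors’ algorithm.

```python
import numpy as np

# Toy target: mixture of two well-separated Gaussians (unnormalised density).
def log_target(x):
    return np.logaddexp(-0.5 * (x + 5.0) ** 2, -0.5 * (x - 5.0) ** 2)

# Given partition A_1 = (-inf, 0), A_2 = [0, inf) with weights assumed known here.
partition = [lambda x: x < 0.0, lambda x: x >= 0.0]
weights = [0.5, 0.5]   # posterior masses w_i of the A_i's (assumed, not estimated)

def restricted_mh(in_set, start, n_iter=20_000, scale=1.0, rng=None):
    """Random-walk Metropolis on the target restricted to one partition set:
    proposals falling outside the set are rejected outright."""
    rng = np.random.default_rng() if rng is None else rng
    x, chain = start, np.empty(n_iter)
    for t in range(n_iter):
        prop = x + scale * rng.normal()
        if in_set(prop) and np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        chain[t] = x
    return chain

rng = np.random.default_rng(0)
chains = [restricted_mh(partition[0], start=-5.0, rng=rng),
          restricted_mh(partition[1], start=5.0, rng=rng)]

# Combine within-set estimates of E[h(X)] with the weights w_i, here h(x) = x^2.
h = lambda x: x ** 2
estimate = sum(w * np.mean(h(c)) for w, c in zip(weights, chains))
print(estimate)   # should be close to 26 (= 25 + 1) for this mixture
```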

“The approach presented here represents a solution to this problem which can completely remove the waiting times for crossing between modes, leaving only the relatively short within-mode equilibration times.” (p.4)

A more delicate issue with the partitioned MCMC approach (in my opinion!) stands with the partitioning itself. Indeed, in a complex and high-dimensional model, the construction of an appropriate partition is a challenge in itself, as we often have no prior idea where the modal areas are. Waiting for a correct exploration of the modes is indeed faster than waiting for crossings between modes, provided all modes are represented and the chain for each partition set Ai has enough energy to explore this set. It actually sounds (slightly?) unlikely that a target with huge gaps between modes will see a considerable improvement from the partitioned version when the partition sets Ai are selected on the go, because some of the boundaries between the partition sets may be hard to reach with an off-the-shelf proposal. (Obviously, the second part of the method, on the adaptive construction of partitions, is still in the writing and I am looking forward to its arXival!)

Furthermore, as noted by Pierre Jacob (of Statisfaction fame!), the adaptive construction of the partition has a lot in common with Wang-Landau schemes, whose goal is to produce a flat histogram from the current exploration of the state space. Connections with Atchadé and Liu’s (2010, Statistica Sinica) extension of the original Wang-Landau algorithm could have been spelled out, especially as the Voronoï tessellation construct seems quite innovative in this respect.
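For readers unfamiliar with Wang-Landau, here is a bare-bones sketch of the flat-histogram idea on the same kind of bimodal toy target, with a fixed binning standing in for the adaptive partition; the bin edges, flatness criterion, and target are all made up for illustration and are not taken from the paper (nor from Atchadé and Liu):

```python
import numpy as np

# Bimodal (unnormalised) target on [-10, 10] with a deep valley between modes.
def log_target(x):
    return np.logaddexp(-0.5 * (x + 5.0) ** 2, -0.5 * (x - 5.0) ** 2)

lo, hi, n_bins = -10.0, 10.0, 20
edges = np.linspace(lo, hi, n_bins + 1)
bin_of = lambda x: min(np.searchsorted(edges, x) - 1, n_bins - 1)

rng = np.random.default_rng(1)
log_bias = np.zeros(n_bins)   # running estimates of the log bin masses
hist = np.zeros(n_bins)       # occupation counts since the last flatness check
gamma = 1.0                   # Wang-Landau modification factor
x = -5.0

for t in range(200_000):
    prop = x + rng.normal()
    if lo < prop < hi:
        # Metropolis step on the biased target f(x) * exp(-log_bias[bin(x)]).
        log_ratio = (log_target(prop) - log_bias[bin_of(prop)]
                     - log_target(x) + log_bias[bin_of(x)])
        if np.log(rng.uniform()) < log_ratio:
            x = prop
    k = bin_of(x)
    log_bias[k] += gamma      # penalise the currently occupied bin
    hist[k] += 1
    # When the occupation histogram is roughly flat, shrink the adaptation.
    if hist.min() > 0 and hist.min() > 0.8 * hist.mean():
        gamma /= 2.0
        hist[:] = 0.0

# log_bias now approximates the log bin masses up to an additive constant, so the
# biased chain spends comparable time in every bin, valley between modes included.
print(np.round(log_bias - log_bias.max(), 1))
```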

Quadrature methods for evidence approximation

Posted in Statistics on November 13, 2009 by xi'an

Two papers written by astronomers have been recently posted on arXiv about (new) ways to approximate evidence. Since they both perceive those approximations as some advanced form of quadrature, they are close enough that a comparison makes sense.

The paper by Rutger van Haasteren uses a Voronoi tessellation to represent the evidence as

\int f(x) dx \approx \sum f(x_i) O_i

where the x_i‘s are simulated from the normalised version of f and the O_i‘s are the volumes of the associated Voronoi cells. This approximation converges (even when the x_i‘s are not simulated from the right distribution) but it cannot be used in practice because of the cost of the Voronoi tessellation. Instead, Rutger van Haasteren suggests using an approximate HPD region F_t with volume V_t, along with a harmonic mean restricted to that region:

\int f(x) dx \approx V_t N \big/ \sum_{x_i \in F_t} 1/f(x_i)

where N is the total number of simulations. So in the end this solution is actually the one proposed in our paper with Darren Wraith, as described in this earlier post! It is thus nice to see an application of this idea in a realistic situation, with performance that compares with nested sampling in its MultiNest version by Feroz, Hobson and Bridges. (This is especially valuable when considering that nested sampling is often presented as the only solution for approximating the evidence.)
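For what it is worth, here is a one-dimensional numerical check of the two displays above, with a made-up unnormalised integrand whose true integral is known. In one dimension the Voronoi cells reduce to intervals between midpoints of consecutive draws, and the HPD region F_t is crudely taken as the region where f exceeds its sample median, with V_t estimated by hit-or-miss. This is only a toy sketch, not van Haasteren's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000

# Unnormalised integrand on [0, 1]: f(x) = x^2 (1-x)^4, with known integral
# Z = B(3, 5) = 1/105, and normalised version the Beta(3, 5) density.
f = lambda t: t ** 2 * (1.0 - t) ** 4
Z_true = 1.0 / 105.0

# Simulations from the normalised version of f.
x = np.sort(rng.beta(3, 5, size=N))

# 1. Voronoi quadrature: in one dimension the cells O_i are the intervals
#    between midpoints of consecutive draws, clipped to the support [0, 1].
bounds = np.concatenate(([0.0], (x[:-1] + x[1:]) / 2.0, [1.0]))
Z_voronoi = np.sum(f(x) * np.diff(bounds))

# 2. Truncated harmonic mean: F_t is the region where f exceeds its median
#    value over the sample (a crude HPD stand-in), V_t its length.
threshold = np.median(f(x))
inside = f(x) >= threshold
V_t = np.mean(f(rng.uniform(size=100_000)) >= threshold)  # hit-or-miss volume of F_t
Z_hpd = V_t * N / np.sum(1.0 / f(x[inside]))

print(Z_true, Z_voronoi, Z_hpd)   # the three values should roughly agree
```

On this toy example both estimators should land close to 1/105, the harmonic-mean one without any tessellation cost.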

The second paper by Martin Weinberg also adopts a quadrature perspective, while integrating in the Lebesgue sense rather than in the Riemann sense. This perspective applies to nested sampling, even though John Skilling does not justify nested sampling that way, but Martin Weinberg also shows that the (infamous) harmonic mean estimator is itself a Lebesgue-quadrature approximation. The solution proposed in the paper is a different kind of truncation on the functional values, which relates more to nested sampling and on which I hope to report more thoroughly later.