Archive for Voronoi tessellation

ABC for COVID spread reconstruction

Posted in Books, pictures, Statistics, Travel on December 27, 2021 by xi'an

A recent Nature paper by Jessica Davis et al. (with an assessment by Simon Cauchemez and X from INSERM) reassessed the appearance of COVID in European countries and US states, accounting for the massive under-reporting in the early days, when there was no testing. The approach is based on a complex dynamic model whose parameters are estimated by an ABC algorithm (the reference being the PLoS article that initiated the ABC Wikipedia page). The results are quite interesting in that the distribution of the entry dates reaches as early as December 2019 in most cases, along with a proportion of missed cases as high as 99%.

“As evidence, E, we considered the cumulative number of SARS-CoV-2 cases internationally imported from China up to January 21, 2020”

The model behind remains a classical SLIR model, but with discrete and stochastic dynamics and a geographical compartmentalization based on a Voronoi tessellation centred at airports, on commuting intensity, and on population density. Interventions by local and state authorities are also accounted for. The ABC version is a standard rejection algorithm with a distance based on the evidence quoted above. Which is a form of cdf distance (as in our Wasserstein ABC paper). For the posterior distribution of the IFR, a second ABC algorithm uses the relative distance between observed and generated deaths (per country). The paper further investigates different introduction sources (countries) before local transmission was established. For instance, China is shown to be the dominant source for the first EU countries impacted by the pandemic, such as Italy, the UK, Germany, France, and Spain. Using a “counterfactual scenario where the surveillance systems of the US states and European countries are imagined to operate at levels able to identify 50% of all imported and locally generated infections”, the authors conclude that

“broadening testing specifications could have considerably slowed the pandemic progression, buying considerable time to prepare mitigation responses.”
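Coming back to the ABC scheme itself, here is a minimal sketch in Python of what such a rejection sampler looks like. The simulator `simulate_imported_cases` is a hypothetical toy stand-in for the paper's full stochastic metapopulation model, and the priors, tolerance, and observed count are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_imported_cases(theta):
    """Toy stand-in for the stochastic epidemic simulator: returns a
    simulated cumulative count of internationally imported cases up to
    the cut-off date, under parameter theta."""
    growth, intro_day = theta           # growth rate, introduction day
    # placeholder dynamics: exponential growth from the introduction day
    expected = np.exp(growth * max(0.0, 52 - intro_day)) / 50
    return rng.poisson(expected)

def abc_rejection(observed, n_draws=20_000, tol=2):
    """Standard rejection ABC: draw from the prior, simulate the
    evidence, and keep draws within tol of the observed evidence."""
    kept = []
    for _ in range(n_draws):
        theta = (rng.uniform(0.05, 0.25),   # prior on the growth rate
                 rng.uniform(0, 50))        # prior on the introduction day
        if abs(simulate_imported_cases(theta) - observed) <= tol:
            kept.append(theta)
    return np.array(kept)

posterior = abc_rejection(observed=8)   # observed count is illustrative
# posterior[:, 1] then approximates the posterior of the introduction day
```

The accepted draws of the introduction day play the same role as the entry-date distributions reported in the paper, albeit on a caricature of the actual simulator.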

Statistics at Bristol [& U]

Posted in pictures, Statistics, Travel, University life on September 14, 2021 by xi'an

For the celebration of the recently renovated Fry Building, which I visited in Feb 2019 (my last time in Britain!), the University of Bristol is holding the Fry Conference series, with one dedicated to statistics on 16-17 September 2021. With Peter Green, Arnaud Doucet, and Judith Rousseau among the speakers. It is sadly on-line, so it does not give one the opportunity to admire the renovated building. And the Voronoi sculpture! (And You figures in the title of the conference.)

The Fry Building [Bristol maths]

Posted in Kids, pictures, Statistics, Travel, University life on March 7, 2020 by xi'an

While I had heard of Bristol maths moving to the Fry Building for most of the years I visited the department, starting circa 1999, this last trip to Bristol was the opportunity for a first glimpse of the renovated building, which has been done beautifully, making it the most amazing maths department I have ever visited. It is incredibly spacious and luminous (even on one of those rare rainy days when I visited), while certainly contributing to the cohesion and interactions of the whole department. And the choice of the Voronoi structure should not have come as a complete surprise (to me), given Peter Green's famous contribution to their construction!

lords of the rings

Posted in Books, pictures, Statistics, University life on February 9, 2017 by xi'an

In the 19 Jan 2017 issue of Nature [which I received two weeks later], a paper by Tarnita et al. discusses regular vegetation patterns like fairy circles. While this would seem like an ideal setting for point process modelling, the article does not move in that direction, debating instead between ecological models, combining vegetal self-organisation with subterranean insect competition. Since the paper seems to derive validation of a model by simulation means without producing a single equation, I went and checked the supplementary material attached to it. What I gathered from this material is that the system of differential equations used to build the model seems to be extrapolated by seeking parameter values “consistent with what is known” rather than estimated as in a statistical model. Given the extreme complexity of the resulting five-page model, I am surprised at the low level of validation of the construct, with no visible proof of stationarity of the (stochastic) model thus constructed and no model assessment in a statistical sense. Of course, a major disclaimer applies: (a) this area does not even border my domains of (relative) expertise and (b) I have not spent much time poring over the published paper and the attached supplementary material. (Note: this issue of Nature also contains a fascinating review paper by Nielsen et al. on a detailed scenario of human evolutionary history, based on the sequencing of genomes of extinct hominids.)

[more] parallel MCMC

Posted in Books, Mountains on April 3, 2014 by xi'an

Scott Schmidler and his Ph.D. student Douglas VanDerwerken have arXived a paper on parallel MCMC the very day I left for Chamonix, prior to MCMSki IV, so it is no wonder I missed it at the time. This work is somewhat in the spirit of the parallel papers: Scott et al.'s consensus Bayes, Neiswanger et al.'s embarrassingly parallel MCMC, Wang and Dunson's Weierstrassed MCMC (and even White et al.'s parallel ABC), namely that the computation of the likelihood can be broken into batches and MCMC run over those batches independently. In their short survey of previous works on parallelization, VanDerwerken and Schmidler overlooked our neat (!) JCGS Rao-Blackwellisation with Pierre Jacob and Murray Smith, maybe because it sounds more like post-processing than genuine parallelization (in that it does not speed up the convergence of the chain but rather improves the Monte Carlo usages one can make of this chain), maybe because they did not know of it.

“This approach has two shortcomings: first, it requires a number of independent simulations, and thus processors, equal to the size of the partition; this may grow exponentially in dim(Θ). Second, the rejection often needed for the restriction doesn’t permit easy evaluation of transition kernel densities, required below. In addition, estimating the relative weights wi with which they should be combined requires care.” (p.3)

The idea of the authors is to replace the exploration of the whole space by a single Markov chain (or by parallel chains acting independently, which all have to “converge”) with parallel and independent explorations of parts of the space by separate Markov chains. “Small is beautiful”: it takes less time to explore each set of the partition, hence to converge, and, more importantly, each chain can work in parallel with the others. More specifically, given a partition of the space into sets Ai with posterior weights wi, parallel chains are associated with targets equal to the original target restricted to those Ai's. This is therefore an MCMC version of partitioned sampling. With regard to the shortcomings listed in the quote above, the authors consider that there does not need to be a bijection between the partition sets and the chains, in that a chain can move across partitions and thus contribute to several integral evaluations simultaneously. I am a bit worried about this argument, since it amounts to getting a random number of simulations within each partition set Ai. In my (maybe biased) perception of partitioned sampling, this sounds somewhat counter-productive, as it increases the variance of the overall estimator. (Of course, not restricting a chain to a given partition set Ai has the incentive of avoiding a possibly massive amount of rejection steps. It is however unclear (a) whether or not it impacts ergodicity (it all depends on the way the chain is constructed, i.e., against which target(s)…), as it could lead to an over-representation of some boundaries, and (b) whether or not it improves the overall convergence properties of the chain(s).)
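To fix ideas, here is a minimal Python sketch of the restricted-chain construction on a toy bimodal target, with a strict bijection between chains and partition sets; the relaxation discussed above, where chains cross partition boundaries, is not reproduced, and the partition, weights, and target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):
    # toy bimodal target: equal mixture of N(-3, 1) and N(3, 1)
    return np.logaddexp(-0.5 * (x + 3) ** 2, -0.5 * (x - 3) ** 2)

def restricted_mh(in_set, x0, n_iter=10_000, scale=1.0):
    """Random-walk Metropolis restricted to one partition set: proposals
    falling outside the set are rejected outright, so the chain targets
    the original density restricted to that set."""
    x, chain = x0, []
    for _ in range(n_iter):
        y = x + scale * rng.normal()
        if in_set(y) and np.log(rng.uniform()) < log_target(y) - log_target(x):
            x = y
        chain.append(x)
    return np.array(chain)

# partition of the real line into A1 = (-oo, 0) and A2 = [0, +oo),
# each explored by its own, embarrassingly parallel, chain
chain1 = restricted_mh(lambda x: x < 0, x0=-3.0)
chain2 = restricted_mh(lambda x: x >= 0, x0=3.0)

# within-set estimates are then recombined with the posterior weights wi
w = np.array([0.5, 0.5])  # known by symmetry here; to be estimated in general
post_mean = w @ np.array([chain1.mean(), chain2.mean()])
```

Estimating the weights wi is precisely the delicate step flagged in the quote above, since each restricted chain only sees its own normalising constant.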

“The approach presented here represents a solution to this problem which can completely remove the waiting times for crossing between modes, leaving only the relatively short within-mode equilibration times.” (p.4)

A more delicate issue with the partitioned MCMC approach (in my opinion!) stands with the partitioning itself. Indeed, in a complex and high-dimensional model, the construction of an appropriate partition is a challenge in itself, as we often have no prior idea where the modal areas are. Waiting for a correct exploration of the modes is indeed faster than waiting for crossings between modes, provided all modes are represented and the chain for each partition set Ai has enough energy to explore this set. It actually sounds (slightly?) unlikely that a target with huge gaps between modes will see a considerable improvement from the partitioned version when the partition sets Ai are selected on the go, because some of the boundaries between the partition sets may be hard to reach with an off-the-shelf proposal. (Obviously, the second part of the method, on the adaptive construction of partitions, is still in the writing and I am looking forward to its arXival!)

Furthermore, as noted by Pierre Jacob (of Statisfaction fame!), the adaptive construction of the partition has a lot in common with Wang-Landau schemes, whose goal is to produce a flat histogram proposal from the current exploration of the state space. Connections with Atchadé's and Liu's (2010, Statistica Sinica) extension of the original Wang-Landau algorithm could have been spelled out. Esp. as the Voronoï tessellation construct seems quite innovative in this respect.
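As a point of comparison, here is a toy sketch of the Wang-Landau mechanism on the same kind of bimodal target, assuming a fixed two-set partition; the decreasing schedule of the learning rate, which is the crux of the convergence analysis in Atchadé and Liu, is omitted, and all settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_target(x):
    # toy bimodal target: equal mixture of N(-3, 1) and N(3, 1)
    return np.logaddexp(-0.5 * (x + 3) ** 2, -0.5 * (x - 3) ** 2)

def wang_landau(edges, n_iter=50_000, scale=1.0, log_gamma=0.1):
    """Toy Wang-Landau: the chain targets pi(x) / theta[i(x)], where
    theta[i] is an adaptive penalty on partition set i, increased each
    time the set is visited so that occupancies flatten out."""
    k = len(edges) + 1                  # number of partition sets
    log_theta = np.zeros(k)             # log penalties, one per set
    x = 0.0

    def idx(z):                         # index of the set containing z
        return int(np.searchsorted(edges, z))

    for _ in range(n_iter):
        y = x + scale * rng.normal()
        log_alpha = (log_target(y) - log_theta[idx(y)]) \
                  - (log_target(x) - log_theta[idx(x)])
        if np.log(rng.uniform()) < log_alpha:
            x = y
        log_theta[idx(x)] += log_gamma  # penalise the currently visited set
        log_theta -= log_theta.mean()   # keep the penalties normalised
    return log_theta

# partition of the line at 0: the penalties adapt until both modes are
# visited about equally often, i.e. until the occupancy histogram is flat
print(wang_landau(edges=[0.0]))
```

The adaptive partition construction of VanDerwerken and Schmidler would replace the fixed edges above with sets learnt on the fly, which is where the Voronoï tessellation enters.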
