Archive for branching process

SMC on the 2019-2020 nCoV outbreak

Posted in Books, R, Statistics, Travel on February 19, 2020 by xi'an

Two weeks ago, Kucharski et al., from the CMMID nCoV working group at the London School of Hygiene and Tropical Medicine, published on medRxiv a statistical analysis via a stochastic SEIR model of the evolution of the 2019-2020 nCoV epidemic, with a prediction of a peak outbreak by late February in Wuhan and a past outbreak abroad. Here are some further details on the modelling:

Transmission was modelled as a geometric random walk process, and we used sequential Monte Carlo to infer the transmission rate over time, as well as the resulting number of cases and the time-varying reproduction number, R, defined as the average number of secondary cases generated by a typical infectious individual on each day.
To calculate the likelihood, we used a Poisson observation model fitted jointly to expected values based on three model outputs. To calculate the daily expectation for each Poisson observation process, we converted these outputs into new case onset and new reported cases inside Wuhan and travelling internationally. We assumed a different relative reporting probability for Wuhan cases compared to international cases, as assumed only a proportion of confirmed Wuhan cases had known onset dates (fixed at 0.15 based on available line list data). As destination country was known for confirmed exported cases, we used 20 time series for cases exported (or not) to most at-risk countries each day and calculated the probability of obtaining each of these datasets given the model outputs. International onset data was not disaggregated by country and so we used the total daily exported cases in our Poisson probability calculation.
I did not look much further into the medRxiv document, but the model may be too simplistic, as it does not seem to account for the potential under-reporting within China or for the impact of the severe quarantine imposed by Chinese authorities, which may mean a new outbreak as soon as the confinement is lifted.
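
For illustration only, the combination quoted above (a geometric random walk on the transmission rate, a Poisson observation model, and sequential Monte Carlo) can be sketched in R as a bootstrap particle filter on a toy model. This is not the SEIR model of the paper: the latent state is reduced to a single Poisson rate, and the function name bootstrap_pf, the volatility sigma, the starting rate rate0 and the synthetic counts are all assumptions made for the example.

```r
# Toy bootstrap particle filter (an assumption-laden sketch, not the paper's model):
# latent rate_t = rate_{t-1} * exp(sigma * eps_t), observed y_t ~ Poisson(rate_t)
bootstrap_pf = function(y, npart = 1000, sigma = 0.2, rate0 = 5) {
  nobs = length(y)
  rate = rep(rate0, npart)          # all particles start at the same rate
  loglik = 0
  filtered = numeric(nobs)
  for (t in 1:nobs) {
    rate = rate * exp(rnorm(npart, sd = sigma))    # geometric random walk move
    logw = dpois(y[t], lambda = rate, log = TRUE)  # Poisson observation weights
    m = max(logw)
    w = exp(logw - m)                              # stabilised weights
    loglik = loglik + m + log(mean(w))             # running log-likelihood estimate
    filtered[t] = sum(w * rate) / sum(w)           # filtering mean of the rate
    rate = rate[sample.int(npart, npart, replace = TRUE, prob = w)]  # resampling
  }
  list(loglik = loglik, filtered = filtered)
}

set.seed(1)
y = rpois(20, lambda = 10)   # synthetic daily counts, for illustration only
out = bootstrap_pf(y)
out$loglik
```

The log-likelihood estimate returned by such a filter is the quantity one would plug into an outer inference step on the fixed parameters, while the filtered rates play the role of the time-varying transmission rate.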

Poisson process model for Monte Carlo methods

Posted in Books on February 25, 2016 by xi'an

“Taken together this view of Monte Carlo simulation as a maximization problem is a promising direction, because it connects Monte Carlo research with the literature on optimization.”

Chris Maddison arXived today a paper on the use of Poisson processes in Monte Carlo simulation, based on the so-called Gumbel-max trick, which amounts to adding iid Gumbel variables to the log-probabilities log p(i) of the discrete target and taking the argmax as the result of the simulation. A neat trick as it does not require the probability distribution to be normalised. And, as indicated in the above quote, a way to relate simulation and optimisation. The generalisation considered here replaces the iid Gumbel variates by a Gumbel process, which is constructed as an “exponential race”, i.e., a Poisson process with an exponential auxiliary variable. The underlying variates can be generated from a substitute density, à la accept-reject, which means this alternative bounds the true target. As illustrated in the plot above.
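
As a minimal sketch of the discrete Gumbel-max trick described above (and only of that basic version, not of the Gumbel process of the paper), the following R lines add iid standard Gumbel noise to unnormalised log-weights and return the argmax. The function name gumbel_max and the weights w are made up for the example.

```r
# Gumbel-max trick: draw n indices i with probability proportional to exp(logp[i])
gumbel_max = function(logp, n = 1) {
  k = length(logp)
  g = matrix(-log(-log(runif(n * k))), nrow = n)   # iid standard Gumbel variates
  # add logp[j] to column j, then take the row-wise argmax
  max.col(sweep(g, 2, logp, "+"), ties.method = "first")
}

set.seed(42)
w = c(1, 2, 3, 4)                          # unnormalised weights (illustrative)
draws = gumbel_max(log(w), n = 1e4)
round(tabulate(draws, nbins = 4) / 1e4, 2) # to be compared with w / sum(w)
```

Shifting all log-weights by the same constant leaves the argmax unchanged, which is why the normalising constant never needs to be computed.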

The paper discusses two implementations of the principle found in an earlier NIPS 2014 paper [paper that contains most of the novelty about this method], one that refines the partition and the associated choice of proposals, and another one that exploits a branch-and-bound tree structure to optimise the Gumbel process. With apparently higher performance. Overall, I wonder at the applicability of the approach because of the accept-reject structure: it seems unlikely to apply to high-dimensional problems.

While this is quite exciting, I find it surprising that this paper completely omits references to Brian Ripley’s considerable input on simulation and point processes. As well as the relevant Geyer and Møller (1994). (I am obviously extremely pleased to see that our 2004 paper with George Casella and Marty Wells is quoted there. We had written this paper in Cornell, a few years earlier, right after the 1999 JSM in Baltimore, but it has hardly been mentioned since then!)

Candy branching process

Posted in R, Statistics on May 6, 2010 by xi'an

The mathematical puzzle in the latest weekend edition of Le Monde is as follows:

Two kids are given three boxes of chocolates with a total of 32 pieces. Rather than sharing evenly, they play the following game: Each in turn, they pick one of the three boxes, empty its contents in a jar and pick some chocolates from one of the remaining boxes so that no box stays empty. The game ends with the current player’s loss when this is no longer possible. What is the optimal strategy?

This led me to consider a simple branching process starting from a multinomial

(u_1,v_1,w_1)\sim \mathcal{M}_3(29;1/3,1/3,1/3)

to define (x_1=1+u_1,y_1=1+v_1,z_1=1+w_1), and then to follow the above splitting process, namely the selection of the dead and of the split components, x_t and y_t>1 say, and the generation of

(u_{t+1},v_{t+1})\sim \mathcal{M}_2(y_t-2;1/2,1/2)

with the updated value being

(x_{t+1},y_{t+1},z_{t+1}) = (1+u_{t+1},1+v_{t+1},z_t).

This process is obviously not optimal but, on the contrary, completely random. Running a short R program like

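N=32
# initial boxes: one piece each, plus a multinomial allocation of the remaining N-3
prc=story=rep(1,3)+as.vector(rmultinom(1,(N-3),prob=rep(1,3)))
while (sum(prc)>3){
  if (sum(prc>1)==1)
    i=(1:3)[prc>1]            #split: the only box with more than one piece
  else
    i=sample((1:3)[prc>1],1)  #split: picked at random among boxes with more than one piece
  j=sample((1:3)[-i],1)       #unchanged
  prc=c(prc[j],1+as.vector(rmultinom(1,prc[i]-2,prob=rep(1,2))))
  story=rbind(story,prc)
}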

leads to a histogram of the game duration which is as follows. (Note that the R command sample((1:3)[prc>1]) does not produce what it should when only one term of prc is different from 1, since sample applied to a single integer k draws from 1:k, hence the condition.) Obviously, this is not a very interesting branching process in that the sequence always ends within a few steps…
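
Since the R program above simulates a single game, a histogram of durations requires repeating it. A possible way of doing so, as a sketch, is to wrap the same random splitting in a function returning the number of moves; the function name game_length and the number of replications (1000) are choices made for this example only.

```r
# Number of moves in one completely random game started from N pieces
game_length = function(N = 32) {
  prc = rep(1, 3) + as.vector(rmultinom(1, N - 3, prob = rep(1, 3)))
  steps = 0
  while (sum(prc) > 3) {
    big = which(prc > 1)                              # boxes that can be split
    i = if (length(big) == 1) big else sample(big, 1) # split box
    j = sample((1:3)[-i], 1)                          # unchanged box
    prc = c(prc[j], 1 + as.vector(rmultinom(1, prc[i] - 2, prob = rep(1, 2))))
    steps = steps + 1
  }
  steps
}

set.seed(2010)
durations = replicate(1000, game_length())
table(durations)   # empirical distribution of the game duration
```

Calling hist(durations) then produces the histogram. Each move removes at least one piece from play, so a game started from 32 pieces cannot last more than 29 moves.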

Of course, this does not tell much about the initial puzzle. However, discussing the problem with Antoine Dreyer and Robin Ryder led to Antoine obtaining all winning and losing configurations up to N=32 by a recursive R algorithm and to Robin establishing a complete resolution (I do not want to unveil it before he does!) that involves the funny facts [a] any starting configuration with only odd numbers is losing and [b] any N that is a power of 2, like 32, always produces winning configurations.
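
A recursive classification of this kind can be sketched as follows. This is not Antoine's code, only one possible memoised recursion under the reading of the rules used above: a move keeps one box, splits another box into two non-empty parts, and empties the third, and the player facing (1,1,1) loses. The names memo, wins and configs are introduced for the example.

```r
memo = new.env()   # cache of configurations already classified

# TRUE if the player about to move from boxes (x, y, z) can force a win
wins = function(x, y, z) {
  s = sort(c(x, y, z))
  key = paste(s, collapse = ",")
  if (exists(key, envir = memo, inherits = FALSE)) return(get(key, envir = memo))
  res = FALSE
  for (keep in 1:3) {
    for (sp in setdiff(1:3, keep)) {   # box to split, the third one being emptied
      if (res || s[sp] < 2) next
      for (p in 1:(s[sp] %/% 2)) {     # all splits of s[sp] into p and s[sp]-p
        if (!wins(s[keep], p, s[sp] - p)) { res = TRUE; break }
      }
    }
  }
  assign(key, res, envir = memo)
  res
}

# all unordered configurations of three non-empty boxes with N pieces in total
configs = function(N) {
  out = NULL
  for (a in 1:(N %/% 3)) for (b in a:((N - a) %/% 2)) out = rbind(out, c(a, b, N - a - b))
  out
}

c32 = configs(32)
all(apply(c32, 1, function(v) wins(v[1], v[2], v[3])))   # checks fact [b] for N=32
odd = expand.grid(x = seq(1, 9, 2), y = seq(1, 9, 2), z = seq(1, 9, 2))
any(mapply(wins, odd$x, odd$y, odd$z))                   # checks fact [a] on small cases
```

The total number of pieces strictly decreases at each move, so the recursion always terminates, and the cache keeps the number of evaluated configurations small for N up to 32.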