Archive for Geometric distribution

pure birth process

Posted in Statistics with tags , , , , , , on April 18, 2020 by xi'an

The Riddler has a rather simplistic riddle this week since it essentially asked for the expectation of a pure birth process (also known as the Yule process) at time t. Since the population size at time t has a geometric distribution with expectation

eλt.

It however took me a while to recover this result on my own on Easter afternoon, as I went for the integrals rather than the distribution itself and the associated differential equations. Interestingly (in a local sense!), I first following the wrong path of looking at the average time to the first birth, 1/λ, then to the second, 2/λ, and so on. Wrong since of course expectations do not carry this way… For a unit rate,  λ=1, the average time to reach 10 births is about 3, while the average number of births over t=3 is essentially 20.

survivalists [a Riddler’s riddle]

Posted in Books, Kids, R, Statistics with tags , , , , , , on April 22, 2019 by xi'an

A neat question from The Riddler on a multi-probability survival rate:

Nine processes are running in a loop with fixed survivals rates .99,….,.91. What is the probability that the first process is the last one to die? Same question with probabilities .91,…,.99 and the probability that the last process is the last one to die.

The first question means that the realisation of a Geometric G(.99) has to be strictly larger than the largest of eight Geometric G(.98),…,G(.91). Given that the cdf of a Geometric G(a) is [when counting the number of attempts till failure, included, i.e. the Geometric with support the positive integers]

F(x)=\Bbb P(X\le x)=1-a^{x}

the probability that this happens has the nice (?!) representation

\sum_{x=2}^\infty a_1^{x-1}(1-a_1)\prod_{j\ge 2}(1-a_j^{x-1})=(1-a_1)G(a_1,\ldots,a_9)

which leads to an easy resolution by recursion since

G(a_1,\ldots,a_9)=G(a_1,\ldots,a_8)-G(a_1a_9,\ldots,a_8)

and G(a)=a/(1-a)

and a value of 0.5207 returned by R (Monte Carlo evaluation of 0.5207 based on 10⁷ replications). The second question is quite similar, with solution

\sum_{x=2}^\infty a_1^{x-1}(1-a_1)\prod_{j\ge 1}(1-a_j^{x})=a^{-1}(1-a_1)G(a_1,\ldots,a_9)

and value 0.52596 (Monte Carlo evaluation of 0.52581 based on 10⁷ replications).

standard distributions

Posted in Books, Kids, Statistics with tags , , , on February 5, 2016 by xi'an

Joram Soch managed to get a short note arXived about the Normal cdf Φ by exhibiting an analytical version, nothing less!!! By which he means a power series representation of that cdf. This is an analytical [if known] function in the complex calculus sense but I wonder at the point of the (re)derivation. (I do realise that something’s wrong on the Internet is not breaking news!)

Somewhat tangentially, this reminds me of a paper I read recently where the Geometric Geo(p) distribution was represented as the sum of two independent variates, namely a Binomial B(p/(1+p)) variate and a Geometric 2G(p²) variate. A formula that can be iterated for arbitrarily long, meaning that a Geometric variate is an infinite sum of [powers of two] weighted Bernoulli variates. I like this representation very much (although it may well have been know for quite a while). However I fail to see how to take advantage of it for simulation purposes. Unless the number of terms in the sum can be determined first. And even then it would be less efficient than simulating a single Geometric…