Archive for scale
size matters
Posted in Statistics with tags astronomy, black holes, M87, scale, solar system, xkcd on May 27, 2019 by xi'anscale acceleration
Posted in pictures, R, Statistics, Travel, University life with tags bayesf, Brussels, Matlab, R, rgamma, rgamma.c, scale, scale parameter, system.time on April 24, 2015 by xi'anKate Lee pointed me to a rather surprising inefficiency in matlab, exploited in Sylvia Früwirth-Schnatter’s bayesf package: running a gamma simulation by rgamma(n,a,b) takes longer and sometimes much longer than rgamma(n,a,1)/b, the latter taking advantage of the scale nature of b. I wanted to check on my own whether or not R faced the same difficulty, so I ran an experiment [while stuck in a Thalys train at Brussels, between Amsterdam and Paris…] Using different values for a [click on the graph] and a range of values of b. To no visible difference between both implementations, at least when using system.time for checking.
a=seq(.1,4,le=25) for (t in 1:25) a[t]=system.time( rgamma(10^7,.3,a[t]))[3] a=a/system.time(rgamma(10^7,.3,1))[3]
Once arrived home, I wondered about the relevance of the above comparison, since rgamma(10^7,.3,1) forces R to use 1 as a scale, which may differ from using rgamma(10^7,.3), where 1 is known to be the scale [does this sentence make sense?!]. So I rerun an even bigger experiment as
a=seq(.1,4,le=25) for (t in 1:25) a[t]=system.time( rgamma(10^8,.3,a[t]))[3] a=a/system.time(rgamma(10^8,.3))[3]
and got the graph below. Which is much more interesting because it shows that some values of a are leading to a loss of efficiency of 50%. Indeed. (The most extreme cases correspond to a=0.3, 1.1., 5.8. No clear pattern emerging.)Update
As pointed out by Martyn Plummer in his comment, the C function behind the R rgamma function and Gamma generator does take into account the scale nature of the second parameter, so the above time differences are not due to this function but rather to whatever my computer was running at the same time…! Apologies to anyone I scared with this void warning!
new typos in Monte Carlo Statistical Methods
Posted in Books, Statistics, University life with tags exponential distribution, gamma distribution, importance sampling, Metropolis-Hastings algorithms, Monte Carlo Statistical Methods, scale, simulation, truncated normal, typos on December 7, 2011 by xi'anThanks to Jay Bartroff for pointing out those typos after teaching from Monte Carlo Statistical Methods:
- On page 52, the gamma Ga(α, β) distribution uses β as a rate parameter while in other places it is a scale parameter, see, e.g. eqn (2.2) [correct, I must say the parameterisation of the gamma distribution is a pain and, while we tried to homogenise the notation with the second parameter being the rate, there are places like this where either the rate convention (as in the exponential distribution) or the scale convention (as in the generation) is the natural one…]
- Still on page 52, in Example 2.20, truncated normals are said to be discussed after Example 1.5, but they’re not. [There is a mention made of constrained parameters right after but this is rather cryptic!]
- On page 53, the ratio f/gα following the second displayed eqn is missing some terms [or, rather, the equality sign should be a proportional sign]
- Still on page 53, in eqn (2.11), the whole expression, rather than the square root, should be divided by 2 [yes, surprising typo given that it was derived correctly in the original paper!]
- On page 92, the exact constraint is that supp(g) actually needs only contain the intersection of supp(f) and supp(h), such as when approximating tail probabilities [correct if the importance sampling method is only used for a single function h, else the condition stands as is]
- On page 94, fY does not need that integral in the denominator [correct, we already corrected for the truncation by subtracting 4.5 in the exponential]
- On page 114, Problem 3.22, ωi is missing a factor of 1/n [correct]
- On page 218, in Example 6.24, P00=0 [indeed, our remark that Pxx>0 should start with x=1. Note that this does not change the aperiodicity, though]
- On page 282, the log α after the 2nd displayed equation should be eα [correct, this was pointed out in a previous list of typos, but clearly not corrected in the latest printing!]
- On page 282, in the 5th displayed equation there are missing factors π(α’|b)/π(α0|b) in rejection probability [actually, no, because, those terms being both proposals and priors, they cancel in the ratio. We could add a sentence to this effect to explain why, though.]
- On page 634, the reference page for exponential distribution is mistakenly given as 99 [wow, very thorough reading! There is an exponential distribution involved on page 99 but I agree this is not the relevant page…]
The confusing gamma parameter
Posted in Books, R, Statistics with tags dgamma, Introducing Monte Carlo Methods with R, rate, scale, shape on May 13, 2011 by xi'anBoris from Ottawa sent me this email about Introducing Monte Carlo Methods with R:
As I went through the exercises and examples, I believe I found a typo in exercise 6.4 on page 176 that is not in the list of typos posted on your website. For simulation of Gamma(a,1) random variables with candidate distribution Gamma([a],b), the optimal choice of b seems to be a/[a] rather than [a]/a as suggested in the book. Since the ratio dgamma(x,a,1)/dgamma(x,a,[a]/a) is unbounded, simulations with candidate distribution Gamma([a],[a]/a) yields poor approximation to the target distribution.
The problem with this exercise and the gamma distribution
in general is that it can be parameterised in terms of the scale or in terms of the rate, as recognised by the R [d/p/q/r]gamma functions:
GammaDist package:stats R Documentation The Gamma Distribution Description: Density, distribution function, quantile function and random generation for the Gamma distribution with parameters ‘shape’ and ‘scale’. Usage: dgamma(x, shape, rate = 1, scale = 1/rate, log = FALSE) pgamma(q, shape, rate = 1, scale = 1/rate, lower.tail = TRUE, log.p = FALSE) qgamma(p, shape, rate = 1, scale = 1/rate, lower.tail = TRUE, log.p = FALSE) rgamma(n, shape, rate = 1, scale = 1/rate) Arguments: x, q: vector of quantiles. p: vector of probabilities. n: number of observations. If ‘length(n) > 1’, the length is taken to be the number required. rate: an alternative way to specify the scale. shape, scale: shape and scale parameters. Must be positive, ‘scale’ strictly.
Thus, Boris understood b to be the scale parameter, while we meant b to be the rate parameter, meaning we are in fine in agreement about the solution! The deeper question is, why use a duplicated and hence confusing parameterisation?! The reason for doing so is that, while the scale is the natural parameter, the rate has the nicer (Bayesian) property of enjoying a gamma conjugate prior (rather than an inverse gamma conjugate prior). This is why the gamma distribution is implicitly calibrated by the rate, instead of the scale, in most of the Bayesian literature.