When playing with Peter Rossi’s bayesm R package during a visit of Jean-Michel Marin to Paris, last week, we came up with the above Gibbs outcome. The setting is a Gaussian mixture model with three components in dimension 5 and the prior distributions are standard conjugate. In this case, with 500 observations and 5000 Gibbs iterations, the Markov chain (for one component of one mean of the mixture) has two highly distinct regimes: one that revolves around the true value of the parameter, 2.5, and one that explores a much broader area (which is associated with a much smaller value of the component weight). What we found amazing is the Gibbs ability to entertain both regimes, simultaneously.
Archive for the R Category
On Cross Validated, I had a rather extended discussion with a user about a probability density
as I thought it could be decomposed in two manageable conditionals and simulated by Gibbs sampling. The first component led to a Gumbel like density
wirh y being restricted to either (0,1) or (1,∞) depending on β. The density is bounded and can be easily simulated by an accept-reject step. The second component leads to
which offers the slight difficulty that it is not integrable when the first component is less than 1! So the above density does not exist (as a probability density).
What I found interesting in this question was that, for once, the Gibbs sampler was the solution rather than the problem, i.e., that it pointed out the lack of integrability of the joint. (What I found less interesting was that the user did not acknowledge a lengthy discussion that we had previously about the Gibbs implementation and that he erased, that he lost interest in the question by not following up on my answer, a seemingly common feature of his‘, and that he did not provide neither source nor motivation for this zombie density.)
Another Cross Validated forum question that led me to an interesting (?) reconsideration of certitudes! When simulating from a normal distribution, is Box-Muller algorithm better or worse than using the inverse cdf transform? My first reaction was to state that Box-Muller was exact while the inverse cdf relied on the coding of the inverse cdf, like qnorm() in R. Upon reflection and commenting by other members of the forum, like William Huber, I came to moderate this perspective since Box-Muller also relies on transcendental functions like sin and log, hence writing
also involves approximating in the coding of those functions. While it is feasible to avoid the call to trigonometric functions (see, e.g., Algorithm A.8 in our book), the call to the logarithm seems inescapable. So it ends up with the issue of which of the two functions is better coded, both in terms of speed and precision. Surprisingly, when coding in R, the inverse cdf may be the winner: here is the comparison I ran at the time I wrote my comments
> system.time(qnorm(runif(10^8))) sutilisateur système écoulé 10.137 0.120 10.251 > system.time(rnorm(10^8)) utilisateur système écoulé 13.417 0.060 13.472`
However re-rerunning it today, I get opposite results (pardon my French, I failed to turn the messages to English):
> system.time(qnorm(runif(10^8))) utilisateur système écoulé 10.137 0.144 10.274 > system.time(rnorm(10^8)) utilisateur système écoulé 7.894 0.060 7.948
(There is coherence in the system time, which shows rnorm as twice as fast as the call to qnorm.) In terms, of precision, I could not spot a divergence from normality, either through a ks.test over 10⁸ simulations or in checking the tails:
“Only the inversion method is inadmissible because it is slower and less space efficient than all of the other methods, the table methods excepted”. Luc Devroye, Non-uniform random variate generation, 1985
Update: As pointed out by Radford Neal in his comment, the above comparison is meaningless because the function rnorm() is by default based on the inversion of qnorm()! As indicated by Alexander Blocker in another comment, to use an other generator requires calling RNG as in
RNGkind(normal.kind = “Box-Muller”)
(And thanks to Jean-Louis Foulley for salvaging this quote from Luc Devroye, which does not appear to apply to the current coding of the Gaussian inverse cdf.)
Here are the most popular entries for 2014:
What I appreciate from that list is that (a) book reviews [of stats books] get a large chunk (50%!) of the attention and (b) my favourite topics of Bayesian testing, parallel MCMC and MCMC on zero measure sets made it to the top list. Eventhat was only posted two weeks ago!
As in previous years, at about this time, I want to (re)warn unaware ‘Og readers that all links to Amazon.com and more rarely to Amazon.fr found on this blog are actually susceptible to earn me an advertising percentage if a purchase is made by the reader in the 24 hours following the entry on Amazon through this link, thanks to the “Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to amazon.com/fr“. Unlike last year, I did not benefit as much from the new edition of Andrew’s book, and the link he copied from my blog entry… Here are some of the most Og-unrelated purchases:
- Mr. Beer Deluxe Beer Bottling System
- Kyjen 2518 Dog Life Jacket
- Fisher-Price Learn-to-Flush Potty
- Way Huge Green Rhino
- WWII Helmets and Headgear
Once again, books I reviewed, positively or negatively, were among the top purchases… Like a dozen Monte Carlo simulation and resampling methods for social science , a few copies of Naked Statistics. And again a few of The Cartoon Introduction to Statistics. (Despite a most critical review.) Thanks to all of you using those links and feeding further my book addiction, with the drawback of inducing even more fantasy book reviews.