## Archive for the R Category

## amazing Gibbs sampler

**W**hen playing with Peter Rossi's bayesm R package during Jean-Michel Marin's visit to Paris last week, we came up with the above Gibbs outcome. The setting is a Gaussian mixture model with three components in dimension 5, with standard conjugate prior distributions. In this case, with 500 observations and 5,000 Gibbs iterations, the Markov chain (for one component of one mean of the mixture) exhibits two highly distinct regimes: one that revolves around the true value of the parameter, 2.5, and one that explores a much broader region (associated with a much smaller value of the component weight). What we found amazing is the Gibbs sampler's ability to entertain both regimes simultaneously.
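The two-regime behaviour can be reproduced in miniature with a hand-rolled conjugate Gibbs sampler. The sketch below is a stripped-down univariate, two-component analogue (known unit variances, N(0, 10) priors on the means, Dirichlet(1, 1) prior on the weights), not Rossi's bayesm implementation:

```r
set.seed(42)
n <- 500
z_true <- rbinom(n, 1, 0.5)                       # true allocations
y <- rnorm(n, mean = ifelse(z_true == 1, 2.5, 0)) # mixture of N(0,1) and N(2.5,1)

R  <- 2000                      # Gibbs iterations
mu <- c(-1, 1)                  # initial component means
w  <- c(0.5, 0.5)               # initial component weights
keep <- matrix(NA_real_, R, 4)  # store (mu1, mu2, w1, w2) at each iteration

for (r in 1:R) {
  # 1. data augmentation: allocate each observation to a component
  p1 <- w[1] * dnorm(y, mu[1]); p2 <- w[2] * dnorm(y, mu[2])
  z  <- rbinom(n, 1, p2 / (p1 + p2))   # z = 1 means component 2
  n2 <- sum(z); n1 <- n - n2
  # 2. conjugate update of the means: N(0,10) prior, unit data variance
  v1 <- 1 / (n1 + 1 / 10); v2 <- 1 / (n2 + 1 / 10)
  mu[1] <- rnorm(1, v1 * sum(y[z == 0]), sqrt(v1))
  mu[2] <- rnorm(1, v2 * sum(y[z == 1]), sqrt(v2))
  # 3. conjugate update of the weights: Dirichlet(1,1) prior, Beta posterior
  w[2] <- rbeta(1, n2 + 1, n1 + 1); w[1] <- 1 - w[2]
  keep[r, ] <- c(mu, w)
}
# trace of one mean, as in the post: plot(keep[, 2], type = "l")
```

With three components in dimension 5, as in the bayesm experiment, the small-weight component is the one free to wander over a much broader region, producing the two regimes of the plot.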

Posted in Books, pictures, R, Statistics, University life with tags bayesm, convergence assessment, Gibbs sampler, Jean-Michel Marin, Markov chain Monte Carlo, mixtures, R on February 19, 2015 by xi'an

## MissData 2015 in Rennes [June 18-19]

Posted in R, Statistics, Travel, University life with tags Brittany, conference, France, missing data, Rennes, Roderick Little, TGV on February 9, 2015 by xi'an

**T**his (early) summer, a conference on missing data will be organised in Rennes, Brittany, with the support of the French Statistical Society [SFDS]. (Check the website if interested; Rennes is a mere two hours from Paris by fast train.)

## the density that did not exist…

Posted in Kids, R, Statistics, University life with tags cross validated, Gibbs sampling, Gumbel distribution, improper posteriors, zombie density on January 27, 2015 by xi'an

**O**n Cross Validated, I had a rather extended discussion with a user about a probability density

as I thought it could be decomposed into two manageable conditionals and simulated by Gibbs sampling. The first component led to a Gumbel-like density

with y being restricted to either (0,1) or (1,∞) depending on β. The density is bounded and can easily be simulated by an accept-reject step. The second component leads to

which offers the *slight* difficulty that it is not integrable when the first component is less than 1! So the above density does not exist (as a probability density).

What I found interesting in this question was that, for once, the Gibbs sampler was the solution rather than the problem, i.e., it pointed out the lack of integrability of the joint. (What I found less interesting was that the user did not acknowledge a lengthy discussion we had previously had about the Gibbs implementation and then erased it, that he lost interest in the question by not following up on my answer, a seemingly common feature of his, and that he provided neither source nor motivation for this zombie density.)

## Sequential Monte Carlo 2015 workshop

Posted in pictures, R, Statistics, Travel, University life, Wines with tags ENSAE, Monte Carlo Statistical Methods, Paris, sequential Monte Carlo, SMC 2015, workshop on January 22, 2015 by xi'an

## simulation by inverse cdf

Posted in Books, Kids, R, Statistics, University life with tags Box-Muller algorithm, cross validated, inverse cdf, logarithm, normal distribution, qnorm() on January 14, 2015 by xi'an

**A**nother Cross Validated forum question that led me to an interesting (?) reconsideration of certitudes! When simulating from a normal distribution, is the Box-Muller algorithm better or worse than the inverse cdf transform? My first reaction was to state that Box-Muller was exact, while the inverse cdf relied on the coding of the inverse cdf, like *qnorm()* in R. Upon reflection and comments from other members of the forum, like William Huber, I came to moderate this perspective, since Box-Muller also relies on transcendental functions like *sin* and *log*, hence writing

also involves approximations in the coding of those functions. While it is feasible to avoid the call to trigonometric functions (see, e.g., Algorithm A.8 in our book), the call to the logarithm seems inescapable. So it comes down to the issue of which of the two functions is better coded, both in terms of speed and precision. Surprisingly, when coding in R, the inverse cdf may be the winner: here is the comparison I ran at the time I wrote my comments
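For concreteness, here is a minimal sketch of the two transforms applied to the same uniform draws (a base R illustration, not the internal coding of *rnorm*):

```r
set.seed(1)
u1 <- runif(1e5); u2 <- runif(1e5)
# Box-Muller: a pair of uniforms yields a pair of independent N(0,1) draws,
# at the cost of one log, one sqrt, and two trigonometric calls
x_bm <- sqrt(-2 * log(u1)) * cos(2 * pi * u2)
y_bm <- sqrt(-2 * log(u1)) * sin(2 * pi * u2)
# inverse cdf: one N(0,1) draw per uniform, via the coded quantile function
x_inv <- qnorm(u1)
```

Both routes lean on approximated transcendental functions, the log and trigonometric calls on one side, the rational approximations inside *qnorm()* on the other, which is the whole point of the comparison.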

    > system.time(qnorm(runif(10^8)))
    utilisateur     système      écoulé
         10.137       0.120      10.251
    > system.time(rnorm(10^8))
    utilisateur     système      écoulé
         13.417       0.060      13.472

However, re-running it today, I get the opposite results (pardon my French, I failed to turn the messages into English):

    > system.time(qnorm(runif(10^8)))
    utilisateur     système      écoulé
         10.137       0.144      10.274
    > system.time(rnorm(10^8))
    utilisateur     système      écoulé
          7.894       0.060       7.948

(There is coherence in the system time, which now shows *rnorm* as clearly faster than the call to *qnorm*.) In terms of precision, I could not spot a divergence from normality, either through a ks.test over 10⁸ simulations or in checking the tails:
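A scaled-down version of that check (10⁵ draws rather than the 10⁸ of the post) could look like:

```r
set.seed(2)
x  <- qnorm(runif(1e5))           # inverse cdf draws
ks <- ks.test(x, "pnorm")         # KS test against the N(0,1) cdf
# the KS statistic stays at the O(1/sqrt(n)) level expected of genuine
# N(0,1) draws, and the upper tail frequency matches the nominal level
tail_freq <- mean(x > qnorm(0.999))
```

(Numbers of draws and the 0.999 tail cutoff are illustrative choices, not those of the original check.)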

“Only the inversion method is inadmissible because it is slower and less space efficient than all of the other methods, the table methods excepted.” Luc Devroye, *Non-Uniform Random Variate Generation*, 1986

**Update:** As pointed out by Radford Neal in his comment, the above comparison is meaningless because the function *rnorm()* is by default based on the inversion of *qnorm()*! As indicated by Alexander Blocker in another comment, using another generator requires changing the RNG settings, as in

RNGkind(normal.kind = "Box-Muller")

(And thanks to Jean-Louis Foulley for salvaging this quote from Luc Devroye, which does not appear to apply to the current coding of the Gaussian inverse cdf.)
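To check the effect of the switch, one can sketch (in base R):

```r
# switch the normal generator away from the default inversion method
RNGkind(normal.kind = "Box-Muller")
set.seed(3)
x_bm <- rnorm(1e5)   # now produced by Box-Muller rather than qnorm inversion
# restore the default for any subsequent code
RNGkind(normal.kind = "Inversion")
```

The second argument of RNGkind() reports (and resets) the normal generator, so the earlier system.time() comparison only becomes meaningful after this switch.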

## top posts for 2014

Posted in Books, R, Statistics, University life with tags book reviews, Le Monde, simulated annealing, Ubuntu 14.04 on December 30, 2014 by xi'an

Here are the most popular entries for 2014:

What I appreciate in that list is that (a) book reviews [of stats books] get a large chunk (50%!) of the attention and (b) my favourite topics of Bayesian testing, parallel MCMC, and MCMC on zero-measure sets made it to the top list. Even the demise of the Bayes factor, which was only posted two weeks ago!

## amazonish thanks (& repeated warning)

Posted in Books, Kids, R, Statistics with tags Amazon, amazon associates, book reviews, dog life jacket, Monte Carlo Statistical Methods, Og on December 9, 2014 by xi'an

**A**s in previous years at about this time, I want to (re)warn unaware ‘Og readers that all links to Amazon.com, and more rarely to Amazon.fr, found on this blog are liable to earn me an advertising percentage if a purchase is made by the reader *in the 24 hours following the entry on Amazon through this link*, thanks to the “*Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to amazon.com/fr*”. Unlike last year, I did not benefit as much from the new edition of Andrew’s book and the link he copied from my blog entry… Here are some of the most Og-unrelated purchases:

- Mr. Beer Deluxe Beer Bottling System
- Kyjen 2518 Dog Life Jacket
- Fisher-Price Learn-to-Flush Potty
- Way Huge Green Rhino
- WWII Helmets and Headgear

Once again, books I reviewed, positively or negatively, were among the top purchases… like a dozen copies of *Monte Carlo Simulation and Resampling Methods for Social Science*, a few copies of *Naked Statistics*, and again a few of *The Cartoon Introduction to Statistics* (despite a most critical review). Thanks to all of you using those links and further feeding my book addiction, with the drawback of inducing even more fantasy book reviews.