## Archive for cross validated

## aperiodic Gibbs sampler

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags aperiodicity, convergence, cross validated, Gibbs sampler, Markov chain, MCMC algorithms, Monte Carlo Statistical Methods, skeleton chain on February 11, 2015 by xi'an

**A** question on Cross Validated led me to realise I had never truly considered the issue of periodic Gibbs samplers! In MCMC, non-aperiodic chains are a minor nuisance in that the skeleton trick of randomly subsampling the Markov chain leads to an aperiodic Markov chain. (The picture relates to the skeleton!) Intuitively, while the systematic Gibbs sampler has a tendency towards non-reversibility, it seems difficult to imagine a sequence of full conditionals that would force the chain away from the current value! In the discrete case, given that the current state of the Markov chain has positive probability for the target distribution, the conditional probabilities are all positive as well, and hence the Markov chain can stay at its current value after one Gibbs cycle, with positive probability, which means strong aperiodicity. In the continuous case, a similar argument applies by considering a neighbourhood of the current value. (Incidentally, the same person asked a question about the absolute continuity of the Gibbs kernel, being confused by our chapter on the topic!)

## minimaxity of a Bayes estimator

Posted in Books, Kids, Statistics, University life with tags Bayes estimators, cross validated, generalised Bayes estimators, mathematical statistics, minimaxity, serial upvoting on February 2, 2015 by xi'an

**T**oday, while in Warwick, I spotted on Cross Validated a question involving “minimax” in the title and hence could not help but look at it! The way I first understood the question (and immediately replied to it) was to check whether or not the standard Normal average—reduced to the single Normal observation by sufficiency considerations—is a minimax estimator of the normal mean under an interval zero-one loss defined by
$$\text{L}(\mu,\delta)=\mathbb{I}\{|\delta-\mu|>L\}$$
where L is a positive tolerance bound. I had not seen this problem before, even though it sounds quite standard. In this setting, the identity estimator, i.e., the normal observation x, is indeed minimax as (a) it is a generalised Bayes estimator—Bayes estimators under this loss are given by the centre of an equal posterior interval—for this loss function under the constant prior and (b) it can be shown to be a limit of proper Bayes estimators and its Bayes risk is also the limit of the corresponding Bayes risks. (This is a most traditional way of establishing minimaxity for a generalised Bayes estimator.) However, this was not the question asked on the forum, as the book by Zacks it referred to stated that the standard Normal average maximised the minimal coverage, which amounts to the maximal risk under the above loss. With the strange inversion of parameter and estimator in the minimax risk:

which makes the first bound equal to 0 by equating estimator and mean μ. Note however that I cannot access the whole book and hence may miss some restriction or other subtlety that would explain this unusual definition. (As an aside, note that Cross Validated has a protection against serial upvoting, so voting up or down a large chunk of my answers on that site at once does not impact my “reputation”!)
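Under the first reading of the question, the risk of the identity estimator under this loss is constant in μ, which is easy to check by simulation; a minimal sketch, with a hypothetical tolerance bound L=1:

```r
# Monte Carlo risk of delta(x) = x under the interval zero-one loss
# I(|delta - mu| > L): constant in mu and equal to 2 * pnorm(-L)
set.seed(101)
L <- 1
risk <- function(mu, nsim = 10^5)
  mean(abs(rnorm(nsim, mean = mu) - mu) > L)
c(risk(0), risk(5), exact = 2 * pnorm(-L))
```

The constancy of the risk is what makes the limiting-Bayes argument for minimaxity go through.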

## the density that did not exist…

Posted in Kids, R, Statistics, University life with tags cross validated, Gibbs sampling, Gumbel distribution, improper posteriors, zombie density on January 27, 2015 by xi'an

**O**n Cross Validated, I had a rather extended discussion with a user about a probability density

as I thought it could be decomposed in two manageable conditionals and simulated by Gibbs sampling. The first component led to a Gumbel like density

with y restricted to either (0,1) or (1,∞), depending on β. The density is bounded and can be easily simulated by an accept-reject step. The second component leads to

which offers the *slight* difficulty that it is not integrable when the first component is less than 1! So the above density does not exist (as a probability density).
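Since the actual conditional is not reproduced in this post, the accept-reject step for the first component can only be sketched generically, here with a hypothetical bounded unnormalised target h on (0,1) and upper bound M:

```r
# Generic accept-reject for a bounded target on (0,1): propose
# uniformly, accept with probability h(y) / M where M bounds h
h <- function(y) exp(-y) * sqrt(y)   # hypothetical bounded target
M <- 1                               # valid upper bound on h over (0,1)
accept_reject <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    y <- runif(n)
    u <- runif(n)
    out <- c(out, y[u <= h(y) / M])
  }
  out[1:n]
}
x <- accept_reject(10^4)
```

The vectorised proposal batch keeps the loop count low; the acceptance rate is the integral of h over (0,1) divided by M.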

What I found interesting in this question was that, for once, the Gibbs sampler was the solution rather than the problem, i.e., that it pointed out the lack of integrability of the joint. (What I found less interesting was that the user did not acknowledge a lengthy discussion we had previously had about the Gibbs implementation and then erased, that he lost interest in the question by not following up on my answer, a seemingly common feature of his, and that he provided neither source nor motivation for this zombie density.)
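The phenomenon is easy to reproduce on a classical example (not the density above): the conditionals X|y ∼ Exp(y) and Y|x ∼ Exp(x) are perfectly well-defined, yet they correspond to no proper joint distribution, and the Gibbs output never stabilises:

```r
# Gibbs sampler on conditionals with no proper joint (the classic
# Casella & George example): X | y ~ Exp(y) and Y | x ~ Exp(x);
# the chain has no stationary distribution to converge to
set.seed(42)
niter <- 10^4
x <- y <- numeric(niter)
x[1] <- y[1] <- 1
for (t in 2:niter) {
  x[t] <- rexp(1, rate = y[t - 1])
  y[t] <- rexp(1, rate = x[t])
}
```

Plotting log(x) against the iteration index shows the wild excursions of a chain with no target to settle into.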

## simulation by inverse cdf

Posted in Books, Kids, R, Statistics, University life with tags Box-Muller algorithm, cross validated, inverse cdf, logarithm, normal distribution, qnorm() on January 14, 2015 by xi'an

**A**nother Cross Validated forum question that led me to an interesting (?) reconsideration of certitudes! When simulating from a normal distribution, is the Box-Muller algorithm better or worse than using the inverse cdf transform? My first reaction was to state that Box-Muller was exact, while the inverse cdf relied on the coding of the inverse cdf, like *qnorm()* in R. Upon reflection and after comments by other members of the forum, like William Huber, I came to moderate this perspective, since Box-Muller also relies on transcendental functions like *sin* and *log*, hence writing
$$x_1=\sqrt{-2\log u_1}\,\cos(2\pi u_2),\qquad x_2=\sqrt{-2\log u_1}\,\sin(2\pi u_2)$$
also involves approximations in the coding of those functions. While it is feasible to avoid the call to trigonometric functions (see, e.g., Algorithm A.8 in our book), the call to the logarithm seems inescapable. So it comes down to which of the two functions is better coded, in terms of both speed and precision. Surprisingly, when coding in R, the inverse cdf may be the winner: here is the comparison I ran at the time I wrote my comments:

```r
> system.time(qnorm(runif(10^8)))
utilisateur     système      écoulé
     10.137       0.120      10.251
> system.time(rnorm(10^8))
utilisateur     système      écoulé
     13.417       0.060      13.472
```

However, re-running it today, I get the opposite result (pardon my French, I failed to turn the messages to English):

```r
> system.time(qnorm(runif(10^8)))
utilisateur     système      écoulé
     10.137       0.144      10.274
> system.time(rnorm(10^8))
utilisateur     système      écoulé
      7.894       0.060       7.948
```

(There is coherence in the system time, which shows *rnorm* as twice as fast as the call to *qnorm*.) In terms of precision, I could not spot a divergence from normality, either through a ks.test over 10⁸ simulations or in checking the tails:
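A scaled-down version of such a check (at a smaller simulation size, and assuming the comparison is against the standard normal cdf) might look like:

```r
# Kolmogorov-Smirnov check of the inverse-cdf generator
set.seed(123)
x <- qnorm(runif(10^5))
ks <- ks.test(x, "pnorm")   # compare with the standard normal cdf
ks$p.value                  # no significant departure expected
```

At 10⁸ draws, ties in the empirical cdf and memory become the practical obstacles, not the test itself.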

> “Only the inversion method is inadmissible because it is slower and less space efficient than all of the other methods, the table methods excepted.” (Luc Devroye, *Non-Uniform Random Variate Generation*, 1986)

**Update:** As pointed out by Radford Neal in his comment, the above comparison is meaningless because the function *rnorm()* is by default based on the inversion of *qnorm()*! As indicated by Alexander Blocker in another comment, using another generator requires resetting the RNG, as in

```r
RNGkind(normal.kind = "Box-Muller")
```

(And thanks to Jean-Louis Foulley for salvaging this quote from Luc Devroye, which does not appear to apply to the current coding of the Gaussian inverse cdf.)
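Given the update, a meaningful comparison has to switch the generator behind *rnorm()* explicitly; a minimal sketch (timings are obviously machine-dependent):

```r
# Time rnorm() under its default inversion method versus Box-Muller
RNGkind(normal.kind = "Inversion")
t_inv <- system.time(rnorm(10^6))["elapsed"]
RNGkind(normal.kind = "Box-Muller")
t_bm <- system.time(rnorm(10^6))["elapsed"]
RNGkind(normal.kind = "Inversion")   # restore the default
c(inversion = t_inv, box_muller = t_bm)
```

Restoring the default afterwards matters, as `RNGkind` changes the generator globally for the whole session.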

## which parameters are U-estimable?

Posted in Books, Kids, Statistics, University life with tags cross validated, epiphany, Erich Lehmann, George Casella, mathematical statistics, Theory of Point Estimation, U-estimability, unbiased estimation on January 13, 2015 by xi'an

**T**oday *(01/06)* was a double epiphany, in that I realised that one of my long-time beliefs about unbiased estimators did not hold. Indeed, when checking on Cross Validated, I found this question: For which distributions is there a closed-form unbiased estimator for the standard deviation? And the presentation includes the normal case, for which there indeed exists an unbiased estimator of σ, namely
$$\hat\sigma_n=\frac{\Gamma(\{n-1\}/2)}{\sqrt{2}\,\Gamma(n/2)}\left\{\sum_{i=1}^n(x_i-\bar{x})^2\right\}^{1/2}$$
which derives directly from the chi-square distribution of the sum of squares divided by σ². When thinking further about it, if *a posteriori*!, it is now fairly obvious, given that σ is a *scale* parameter. Better, any power of σ can be similarly estimated in an unbiased manner, since
$$\mathbb{E}\left[\left\{\sum_{i=1}^n(x_i-\bar{x})^2\right\}^{\alpha/2}\right]=2^{\alpha/2}\,\frac{\Gamma(\{n-1+\alpha\}/2)}{\Gamma(\{n-1\}/2)}\,\sigma^\alpha$$
And this property extends to all location-scale models.
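The normal case is easy to check by simulation; a quick sketch, using the Γ-ratio constant implied by the χ² argument above:

```r
# Monte Carlo check that c_n * sqrt(sum((x - xbar)^2)) is unbiased
# for sigma, with c_n = Gamma((n-1)/2) / (sqrt(2) * Gamma(n/2))
set.seed(42)
n <- 10
sigma <- 2
c_n <- gamma((n - 1) / 2) / (sqrt(2) * gamma(n / 2))
est <- replicate(10^5, {
  x <- rnorm(n, sd = sigma)
  c_n * sqrt(sum((x - mean(x))^2))
})
mean(est)   # close to sigma = 2
```

The same template, with the constant from the display above, checks unbiasedness for any power σ^α with α > −(n−1).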

So how on Earth was I so convinced that there was no unbiased estimator of σ?! I think it stems from reading too quickly a result in, I think, Lehmann and Casella, due to Peter Bickel and Erich Lehmann, which states that, for a convex family of distributions *F*, there exists an unbiased estimator of a functional *q(F)* (for a sample size *n* large enough) if and only if *q(αF+(1−α)G)* is a polynomial in *0≤α≤1*. Because of this, I had the [wrong!] impression that only polynomials of the natural parameters of exponential families can be estimated by unbiased estimators… Note that Bickel’s and Lehmann’s theorem does not apply to the problem here, because the collection of Gaussian distributions is not convex (a mixture of Gaussians is not a Gaussian).

This leaves open the question as to which transforms of the parameter(s) are unbiasedly estimable (or U-estimable) for a given parametric family, like the normal N(μ,σ²). I checked in Lehmann’s first edition earlier today and could not find an answer, besides the definition of U-estimability. Not only is the question interesting *per se*, but the answer could correct my long-standing impression that unbiasedness is a rare event, i.e., that the collection of transforms of the model parameter that are U-estimable is a very small subset of the whole collection of transforms.

## Bhattacharyya distance versus Kullback-Leibler divergence

Posted in Books, Kids, Statistics with tags Bhattacharyya distance, cross validated, Hellinger distance, Jensen inequality, Kullback-Leibler divergence on January 10, 2015 by xi'an

**A**nother question I picked on Cross Validated during the Yule break is about the connection between the Bhattacharyya distance and the Kullback-Leibler divergence, i.e.,
$$d_B(f,g)=-\log\int\sqrt{f(x)\,g(x)}\,\text{d}x$$
and
$$d_{KL}(f\|g)=\int\log\{f(x)/g(x)\}\,f(x)\,\text{d}x$$
Although this Bhattacharyya distance sounds close to the Hellinger distance,
$$d_H^2(f,g)=1-\int\sqrt{f(x)\,g(x)}\,\text{d}x$$
the ordering I got by a simple Jensen inequality is
$$d_B(f,g)\le\frac{1}{2}\,d_{KL}(f\|g)$$
and I wonder how useful this ordering could be…
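For two Gaussian distributions, both quantities are available in closed form, which allows a quick numerical check of the ordering:

```r
# Closed-form Bhattacharyya distance and KL divergence between
# N(m1, s1^2) and N(m2, s2^2), used to check d_B <= d_KL / 2
b_dist <- function(m1, s1, m2, s2)
  (m1 - m2)^2 / (4 * (s1^2 + s2^2)) +
    0.5 * log((s1^2 + s2^2) / (2 * s1 * s2))
kl_div <- function(m1, s1, m2, s2)
  log(s2 / s1) + (s1^2 + (m1 - m2)^2) / (2 * s2^2) - 0.5
b_dist(0, 1, 2, 3) <= kl_div(0, 1, 2, 3) / 2   # TRUE
```

The inequality itself is the Jensen step: −log of an expectation of √(g/f) under f is bounded by the expectation of −log of the same quantity.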

## the Grumble distribution and an ODE

Posted in Books, Kids, R, Statistics, University life with tags cross validated, differential equation, forum, Gumble distribution, probability distribution, Runge-Kutta, StackExchange on December 3, 2014 by xi'an

As ‘Og’s readers may have noticed, I paid some recent visits to Cross Validated (although I find this too addictive to be sustainable on a long-term basis!, and, as already reported a few years ago, frustrating at several levels, from questions asked without any preliminary personal effort, to a lack of background material to understand hints towards the answer, to not even considering answers [once the homework due date was past?], &tc.). Anyway, some questions are nonetheless great puzzles, to wit this one about the possible transformation of a random variable R with density

into a Gumbel distribution. While the better answer is that it translates into a power law,


I thought using the S=R² transform could work, but obtained a wrong sign in the pseudo-Gumbel density

and then went into seeking another transform into a Gumbel rv T, which amounted to solving the differential equation
$$t^\prime(z)=\exp\left\{-z+e^{-z}+t(z)+e^{-t(z)}\right\}$$
As I could not solve the ODE analytically, I programmed a simple numerical resolution (in effect a first-order Euler scheme) as follows:

```r
solvR = function(prec=10^3, maxz=1){
  z = seq(1, maxz, le=prec)
  t = rep(1, prec)  # t(1)=1
  for (i in 2:prec)
    t[i] = t[i-1] + (z[i]-z[i-1]) * exp(-z[i-1] + exp(-z[i-1]) + t[i-1] + exp(-t[i-1]))
  zold = z
  z = seq(.1/maxz, 1, le=prec)
  t = c(t[-prec], t)
  for (i in (prec-1):1)
    t[i] = t[i+1] + (z[i]-z[i+1]) * exp(-z[i+1] + exp(-z[i+1]) + t[i+1] + exp(-t[i+1]))
  return(cbind(c(z[-prec], zold), t))
}
```

This shows that [the increasing] t(w) quickly gets too large for the function to be depicted. But this is a fairly useless result, in that a transform of the original variable and of its parameter into an arbitrary distribution is always possible, given that W above has a fixed distribution… Hence the pun on Grumble in the title.