## Example 7.3: what a mess!

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , on November 13, 2016 by xi'an

A rather obscure question on Metropolis-Hastings algorithms on X Validated ended up being about our first illustration in Introducing Monte Carlo methods with R. And exposing some inconsistencies in the following example… Example 7.2 is based on a [toy] joint Beta x Binomial target, which leads to a basic Gibbs sampler. We thought this was straightforward, but it may confuse readers who think of using Gibbs sampling for posterior simulation as, in this case, there is neither observation nor posterior, but simply a (joint) target in (x,θ).

And then it indeed came out that we had incorrectly written Example 7.3 on the [toy] Normal posterior, using at times a Normal mean prior with a [prior] variance scaled by the sampling variance and at times a Normal mean prior with a [prior] variance unscaled by the sampling variance. I am rather amazed that this did not show up earlier. Although there were already typos listed about that example.

## variance of an exponential order statistics

Posted in Books, Kids, pictures, R, Statistics, University life with tags , , , , , , , , , , on November 10, 2016 by xi'an

This afternoon, one of my Monte Carlo students at ENSAE came to me with an exercise from Monte Carlo Statistical Methods that I did not remember having written. And I thus “charged” George Casella with authorship for that exercise!

Exercise 3.3 starts with the usual question (a) about the (Binomial) precision of a tail probability estimator, which is easy to answer by iterating simulation batches. Expressed via the empirical cdf, it is concerned with the vertical variability of this empirical cdf. The second part (b) is more unusual in that the first part is again an evaluation of a tail probability, but then it switches to find the .995 quantile by simulation and produce a precise enough [to three digits] estimate. Which amounts to assess the horizontal variability of this empirical cdf.

As we discussed about this question, my first suggestion was to aim at a value of N, number of Monte Carlo simulations, such that the .995 x N-th spacing had a length of less than one thousandth of the .995 x N-th order statistic. In the case of the Exponential distribution suggested in the exercise, generating order statistics is straightforward, since, as suggested by Devroye, see Section V.3.3, the i-th spacing is an Exponential variate with rate (N-i+1). This is so fast that Devroye suggests simulating Uniform order statistics by inverting Exponential order statistics (p.220)!

However, while still discussing the problem with my student, I came to a better expression of the question, which was to figure out the variance of the .995 x N-th order statistic in the Exponential case. Working with the density of this order statistic however led nowhere useful. A bit later, after Google-ing the problem, I came upon this Stack Exchange solution that made use of the spacing result mentioned above, namely that the expectation and variance of the k-th order statistic are

$\mathbb{E}[X_{(k)}]=\sum\limits_{i=N-k+1}^N\frac1i,\qquad \mbox{Var}(X_{(k)})=\sum\limits_{i=N-k+1}^N\frac1{i^2}$

which leads to the proper condition on N when imposing the variability constraint.

## Poisson process model for Monte Carlo methods

Posted in Books with tags , , , , , , , on February 25, 2016 by xi'an

“Taken together this view of Monte Carlo simulation as a maximization problem is a promising direction, because it connects Monte Carlo research with the literature on optimization.”

Chris Maddison arXived today a paper on the use of Poisson processes in Monte Carlo simulation. based on the so-called Gumbel-max trick, which amounts to add to the log-probabilities log p(i) of the discrete target, iid Gumbel variables, and to take the argmax as the result of the simulation. A neat trick as it does not require the probability distribution to be normalised. And as indicated in the above quote to relate simulation and optimisation. The generalisation considered here replaces the iid Gumbel variates by a Gumbel process, which is constructed as an “exponential race”, i.e., a Poisson process with an exponential auxiliary variable. The underlying variates can be generated from a substitute density, à la accept-reject, which means this alternative bounds the true target.  As illustrated in the plot above.

The paper discusses two implementations of the principle found in an earlier NIPS 2014 paper [paper that contains most of the novelty about this method], one that refines the partition and the associated choice of proposals, and another one that exploits a branch-and-bound tree structure to optimise the Gumbel process. With apparently higher performances. Overall, I wonder at the applicability of the approach because of the accept-reject structure: it seems unlikely to apply to high dimensional problems.

While this is quite exciting, I find it surprising that this paper completely omits references to Brian Ripley’s considerable input on simulation and point processes. As well as the relevant Geyer and Møller (1994). (I am obviously extremely pleased to see that our 2004 paper with George Casella and Marty Wells is quoted there. We had written this paper in Cornell, a few years earlier, right after the 1999 JSM in Baltimore, but it has hardly been mentioned since then!)

## running out of explanations

Posted in Books, Kids, Statistics with tags , , , , , on September 23, 2015 by xi'an

A few days ago, I answered a self-study question on Cross Validated about the convergence in probability of 1/X given the convergence in probability of X to a. Until I ran out of explanations… I did not see how to detail any further the connection between both properties! The reader (OP) started from a resolution of the corresponding exercise in Casella and Berger’s Statistical Inference and could not follow the steps, some of which were incorrect. But my attempts at making him uncover the necessary steps failed, presumably because he was sticking to this earlier resolution rather than starting from the definition of convergence in probability. And he could not get over the equality

$\mathbb{P}(|a/X_{i} - 1| < \epsilon)=\mathbb{P}\left(a-{{a\epsilon}\over{1 + \epsilon}} < X_{i} < a + {{a\epsilon}\over{1 - \epsilon}}\right)$

which is the central reason why one convergence transfers to the other… I know I know nothing, and even less about pedagogy, but it is (just so mildly!) frustrating to hit a wall beyond which no further explanation can help! Feel free to propose an alternative resolution.

Update: A few days later, readers of Cross Validated pointed out that the question had been answered by whuber in a magisterial way. But I wonder if my original reader appreciated this resolution, since he did not pursue the issue.

## George’s dream

Posted in Kids, Travel with tags , , , , , on April 11, 2015 by xi'an

While I have shared this idea with many of my friends [in both senses that I mentioned it and that they shared the same feeling that it would be a great improvement], the first time I heard of the notion was in George Casella‘s kitchen in Ithaca, New York, in the early 1990’s… We were emptying the dishwasher together and George was reflecting that it would be so convenient to have a double dishwasher and remove the need to empty it altogether! Although, at the moral level, I think that we should do without dishwashers, I found this was a terrific idea and must have told the joke to most of my friends. I was nonetheless quite surprised and very pleased to receive the news from Nicole today that Fisher & Paykel (from Auckland, New Zealand) had gone all the way to produce a double dishwasher, or more exactly a double dishdrawer, perfectly suited to George’s wishes! (Pleased that she remembered the notion after all those years, not pleased with the prospect of buying a double dish washer for more than double the cost of [and a smaller volume than] a regular dishwasher!)

## an email exchange about integral representations

Posted in Books, R, Statistics, University life with tags , , , , on April 8, 2015 by xi'an

I had an interesting email exchange [or rather exchange of emails] with a (German) reader of Introducing Monte Carlo Methods with R in the past days, as he had difficulties with the validation of the accept-reject algorithm via the integral

$\mathbb{P}(Y\in \mathcal{A},U\le f(Y)/Mg(Y)) = \int_\mathcal{A} \int_0^{f(y)/Mg(y)}\,\text{d}u\,g(y)\,\text{d}y\,,$

in that it took me several iterations [as shown in the above] to realise the issue was with the notation

$\int_0^a \,\text{d}u\,,$

which seemed to be missing a density term or, in other words, be different from

$\int_0^1 \,\mathbb{I}_{(0,a)}(u)\,\text{d}u\,,$

What is surprising for me is that the integral

$\int_0^a \,\text{d}u$

has a clear meaning as a Riemann integral, hence should be more intuitive….

## which parameters are U-estimable?

Posted in Books, Kids, Statistics, University life with tags , , , , , , , on January 13, 2015 by xi'an

Today (01/06) was a double epiphany in that I realised that one of my long-time beliefs about unbiased estimators did not hold. Indeed, when checking on Cross Validated, I found this question: For which distributions is there a closed-form unbiased estimator for the standard deviation? And the presentation includes the normal case for which indeed there exists an unbiased estimator of σ, namely

$\frac{\Gamma(\{n-1\}/{2})}{\Gamma({n}/{2})}2^{-1/2}\sqrt{\sum_{k=1}^n(x_i-\bar{x})^2}$

which derives directly from the chi-square distribution of the sum of squares divided by σ². When thinking further about it, if a posteriori!, it is now fairly obvious given that σ is a scale parameter. Better, any power of σ can be similarly estimated in a unbiased manner, since

$\left\{\sum_{k=1}^n(x_i-\bar{x})^2\right\}^\alpha \propto\sigma^\alpha\,.$

And this property extends to all location-scale models.

So how on Earth was I so convinced that there was no unbiased estimator of σ?! I think it stems from reading too quickly a result in, I think, Lehmann and Casella, result due to Peter Bickel and Erich Lehmann that states that, for a convex family of distributions F, there exists an unbiased estimator of a functional q(F) (for a sample size n large enough) if and only if q(αF+(1-α)G) is a polynomial in 0α1. Because of this, I had this [wrong!] impression that only polynomials of the natural parameters of exponential families can be estimated by unbiased estimators… Note that Bickel’s and Lehmann’s theorem does not apply to the problem here because the collection of Gaussian distributions is not convex (a mixture of Gaussians is not a Gaussian).

This leaves open the question as to which transforms of the parameter(s) are unbiasedly estimable (or U-estimable) for a given parametric family, like the normal N(μ,σ²). I checked in Lehmann’s first edition earlier today and could not find an answer, besides the definition of U-estimability. Not only the question is interesting per se but the answer could come to correct my long-going impression that unbiasedness is a rare event, i.e., that the collection of transforms of the model parameter that are U-estimable is a very small subset of the whole collection of transforms.