Archive for George Casella

which parameters are U-estimable?

Posted in Books, Kids, Statistics, University life with tags , , , , , , , on January 13, 2015 by xi'an

Today (01/06) was a double epiphany in that I realised that one of my long-time beliefs about unbiased estimators did not hold. Indeed, when checking on Cross Validated, I found this question: For which distributions is there a closed-form unbiased estimator for the standard deviation? And the presentation includes the normal case for which indeed there exists an unbiased estimator of σ, namely


which derives directly from the chi-square distribution of the sum of squares divided by σ². When thinking further about it, if a posteriori!, it is now fairly obvious given that σ is a scale parameter. Better, any power of σ can be similarly estimated in a unbiased manner, since

\left\{\sum_{k=1}^n(x_i-\bar{x})^2\right\}^\alpha \propto\sigma^\alpha\,.

And this property extends to all location-scale models.

So how on Earth was I so convinced that there was no unbiased estimator of σ?! I think it stems from reading too quickly a result in, I think, Lehmann and Casella, result due to Peter Bickel and Erich Lehmann that states that, for a convex family of distributions F, there exists an unbiased estimator of a functional q(F) (for a sample size n large enough) if and only if q(αF+(1α)G) is a polynomial in 0α1. Because of this, I had this [wrong!] impression that only polynomials of the natural parameters of exponential families can be estimated by unbiased estimators… Note that Bickel’s and Lehmann’s theorem does not apply to the problem here because the collection of Gaussian distributions is not convex (a mixture of Gaussians is not a Gaussian).

This leaves open the question as to which transforms of the parameter(s) are unbiasedly estimable (or U-estimable) for a given parametric family, like the normal N(μ,σ²). I checked in Lehmann’s first edition earlier today and could not find an answer, besides the definition of U-estimability. Not only the question is interesting per se but the answer could come to correct my long-going impression that unbiasedness is a rare event, i.e., that the collection of transforms of the model parameter that are U-estimable is a very small subset of the whole collection of transforms.

O-Bayes15 [registration & call for papers]

Posted in Kids, pictures, Statistics, Travel, University life with tags , , , , , , , on January 5, 2015 by xi'an

Valencia, Feb. 20, 2010Both registration and call for papers have now been posted on the webpage of the 11th International Workshop on Objective Bayes Methodology, aka O-Bayes 15, that will take place in Valencia next June 1-5.  The spectrum of the conference is quite wide, as reflected by the range of speakers. In addition, this conference is dedicated to our friend Susie Bayarri, to celebrate her life and contributions to Bayesian Statistics. And in continuation of the morning jog in the memory of George Casella organised by Laura Ventura in Padova, there will be a morning jog for Susie. So register for the meeting and bring your running shoes!

back in Gainesville (FL)

Posted in pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , on November 12, 2014 by xi'an


Today, I am flying to Gainesville, Florida, for the rest of the week, to give a couple of lectures. More precisely, I have actually been nominated the 2014 Challis lecturer by the Department of Statistics there, following an impressive series of top statisticians (most of them close friends, is there a correlation there?!). I am quite excited to meet again with old friends and to be back at George’s University, if only for a little less than three days. (There is a certain trend in those Fall trips as I have been going for a few days and two talks to the USA or Canada for the past three Falls: to Ames and Chicago in 2012, to Pittsburgh (CMU) and Toronto in 2013…)

no more car talk

Posted in Books, Kids, Travel with tags , , , , , on November 9, 2014 by xi'an

When I first came went to the US in 1987, I switched from listening to the French public radio to listening to NPR, the National Public Radio network. However, it was not until I met both George Casella and Bernhard Flury that I started listening to “Car Talk”, the Sunday morning talk-show by the Magliozzi brothers where listeners would call and expose their car problem and get jokes and sometime advice in reply. Both George and Bernhard were big fans of the show, much more for the unbelievable high spirits it provided than for any deep interest in mechanics. And indeed there was something of the spirit of Zen and the art of motorcycle maintenance in that show, namely that through mechanical issues, people would come to expose deeper worries that the Magliozzi brothers would help bring out, playing the role of garage-shack psychiatrists…Which made me listen to them, despite my complete lack of interest in car, mechanics and repair in general.

One of George’s moments of fame was when he wrote to the Magliozzi brothers about Monty Hall’s problem, because they had botched their explanation as to why one should always change door. And they read it on the air, with the line “Who is this Casella guy from Cornell University? A professor? A janitor?” since George had just signed George Casella, Cornell University. Besides, Bernhard was such a fan of the show that he taped every single morning show, that he would later replay on long car trips (I do not know how his familly enjoyed the exposure to the show, though!). And so happened to have this line about George on tape, that he sent him a few weeks later… I am reminiscing all this because I saw in the NYT today that the older brother, Tom Magliozzi, had just died. Some engines can alas not be fixed… But I am sure there will be a queue of former car addicts in some heavenly place eager to ask him their question about their favourite car. Thanks for the ride, Tom!

hasta luego, Susie!

Posted in Statistics, University life with tags , , , , , , , , on August 20, 2014 by xi'an

I just heard that our dear, dear friend Susie Bayarri passed away early this morning, on August 19, in Valencià, Spain… I had known Susie for many, many years, our first meeting being in Purdue in 1987, and we shared many, many great times during simultaneous visits to Purdue University and Cornell University in the 1990’s. During a workshop in Cornell organised by George Casella (to become the unforgettable Camp Casella!), we shared a flat together and our common breakfasts led her to make fun of my abnormal consumption of cereals  forever after, a recurrent joke each time we met! Another time, we were coming from the movie theatre in Lafayette in Susie’ s car when we got stopped for going through a red light. Although she tried very hard, her humour and Spanish verve were for once insufficient to convince her interlocutor.

Susie was a great Bayesian, contributing to the foundations of Bayesian testing in her numerous papers and through the direction of deep PhD theses in Valencia. As well as to queuing systems and computer models. She was also incredibly active in ISBA, from the very start of the Bayesian society, and was one of the first ISBA presidents. She also definitely contributed to the Objective Bayes section of ISBA, especially in the construction of the O’Bayes meetings. She gave a great tutorial on Bayes factors at the last O’Bayes conference in Duke last December, full of jokes and passion, despite being already weak from her cancer…

So, hasta luego, Susie!, from all your friends. I know we shared the same attitude about our Catholic education and our first names heavily laden with religious meaning, but I’d still like to believe that your rich and contagious laugh now resonates throughout the cosmos. So, hasta luego, Susie, and un abrazo to all of us missing her.

recycling accept-reject rejections

Posted in Statistics, University life with tags , , , , , , , , , on July 1, 2014 by xi'an

Vinayak Rao, Lizhen Lin and David Dunson just arXived a paper which proposes anew technique to handle intractable normalising constants. And which exact title is Data augmentation for models based on rejection sampling. (Paper that I read in the morning plane to B’ham, since this is one of my weeks in Warwick.) The central idea therein is that, if the sample density (aka likelihood) satisfies

p(x|\theta) \propto f(x|\theta) \le q(x|\theta) M\,,

where all terms but p are known in closed form, then completion by the rejected values of an hypothetical accept-reject algorithm−hypothetical in the sense that the data does not have to be produced by an accept-reject scheme but simply the above domination condition to hold−allows for a data augmentation scheme. Without requiring the missing normalising constant. Since the completed likelihood is

\prod_{i=1}^n \dfrac{f(x_i|\theta)}{M} \prod_{j=1}^{m_i} \left\{q(y_{ij}|\theta) -\dfrac{f(y_{ij}|\theta)}{M}\right\}

A closed-form, if not necessarily congenial, function.

Now this is quite a different use of the “rejected values” from the accept reject algorithm when compared with our 1996 Biometrika paper on the Rao-Blackwellisation of accept-reject schemes (which, still, could have been mentioned there… Or Section 4.2 of Monte Carlo Statistical Methods. Rather than re-deriving the joint density of the augmented sample, “accepted+rejected”.)

It is a neat idea in that it completely bypasses the approximation of the normalising constant. And avoids the somewhat delicate tuning of the auxiliary solution of Moller et al. (2006)  The difficulty with this algorithm is however in finding an upper bound M on the unnormalised density f that is

  1. in closed form;
  2. with a manageable and tight enough “constant” M;
  3. compatible with running a posterior simulation conditional on the added rejections.

The paper seems to assume further that the bound M is independent from the current parameter value θ, at least as suggested by the notation (and Theorem 2), but this is not in the least necessary for the validation of the formal algorithm. Such a constraint would pull M higher, hence reducing the efficiency of the method. Actually the matrix Langevin distribution considered in the first example involves a bound that depends on the parameter κ.

The paper includes a result (Theorem 2) on the uniform ergodicity that relies on heavy assumptions on the proposal distribution. And a rather surprising one, namely that the probability of rejection is bounded from below, i.e. calling for a less efficient proposal. Now it seems to me that a uniform ergodicity result holds as well when the probability of acceptance is bounded from below since, then, the event when no rejection occurs constitutes an atom from the augmented Markov chain viewpoint. There therefore occurs a renewal each time the rejected variable set ϒ is empty, and ergodicity ensues (Robert, 1995, Statistical Science).

Note also that, despite the opposition raised by the authors, the method per se does constitute a pseudo-marginal technique à la Andrieu-Roberts (2009) since the independent completion by the (pseudo) rejected variables produces an unbiased estimator of the likelihood. It would thus be of interest to see how the recent evaluation tools of Andrieu and Vihola can assess the loss in efficiency induced by this estimation of the likelihood.

Maybe some further experimental evidence tomorrow…

reading classics (#9,10)

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , on January 28, 2014 by xi'an

La Défense from Paris-Dauphine, Nov. 15, 2012Today was the very last session of our Reading Classics Seminar for the academic year 2013-2014. We listened two presentations, one on the Casella and Strawderman (1984) paper on the estimation of the normal bounded mean. And one on the Hartigan and Wong’s 1979 K-Means Clustering Algorithm paper in JRSS C. The first presentation did not go well as my student had difficulties with the maths behind the paper. (As he did not come to ask me or others for help, it may well be that he put this talk together at the last minute, at a time busy with finals and project deliveries. He also failed to exploit those earlier presentations of the paper.) The innovative part in the talk was the presentation of several R simulations comparing the risk of the minimax Bayes estimator with the one for the MLE. Although the choice of simulating different samples of standard normals for different values of the parameters and even for both estimators made the curves (unnecessarily) all wiggly.

By contrast, the second presentation was very well-designed, with great Beamer slides, interactive features and a software oriented focus. My student Mouna Berrada started from the existing R function kmeans to explain the principles of the algorithm, recycling the interactive presentation of last year as well (with my permission), and creating a dynamic flowchart that was most helpful. So she made the best of this very short paper! Just (predictably) missing the question of the statistical model behind the procedure. During the discussion, I mused why k-medians clustering was not more popular as it offered higher robustness guarantees, albeit further away from a genuine statistical model. And why k-means clustering was not more systematically compared with mixture (EM) estimation.

Here are the slides for the second talk


Get every new post delivered to your Inbox.

Join 773 other followers