Archive for George Casella

recycling accept-reject rejections

Posted in Statistics, University life with tags , , , , , , , , , on July 1, 2014 by xi'an

Vinayak Rao, Lizhen Lin and David Dunson just arXived a paper which proposes anew technique to handle intractable normalising constants. And which exact title is Data augmentation for models based on rejection sampling. (Paper that I read in the morning plane to B’ham, since this is one of my weeks in Warwick.) The central idea therein is that, if the sample density (aka likelihood) satisfies

p(x|\theta) \propto f(x|\theta) \le q(x|\theta) M\,,

where all terms but p are known in closed form, then completion by the rejected values of an hypothetical accept-reject algorithm−hypothetical in the sense that the data does not have to be produced by an accept-reject scheme but simply the above domination condition to hold−allows for a data augmentation scheme. Without requiring the missing normalising constant. Since the completed likelihood is

\prod_{i=1}^n \dfrac{f(x_i|\theta)}{M} \prod_{j=1}^{m_i} \left\{q(y_{ij}|\theta) -\dfrac{f(y_{ij}|\theta)}{M}\right\}

A closed-form, if not necessarily congenial, function.

Now this is quite a different use of the “rejected values” from the accept reject algorithm when compared with our 1996 Biometrika paper on the Rao-Blackwellisation of accept-reject schemes (which, still, could have been mentioned there… Or Section 4.2 of Monte Carlo Statistical Methods. Rather than re-deriving the joint density of the augmented sample, “accepted+rejected”.)

It is a neat idea in that it completely bypasses the approximation of the normalising constant. And avoids the somewhat delicate tuning of the auxiliary solution of Moller et al. (2006)  The difficulty with this algorithm is however in finding an upper bound M on the unnormalised density f that is

  1. in closed form;
  2. with a manageable and tight enough “constant” M;
  3. compatible with running a posterior simulation conditional on the added rejections.

The paper seems to assume further that the bound M is independent from the current parameter value θ, at least as suggested by the notation (and Theorem 2), but this is not in the least necessary for the validation of the formal algorithm. Such a constraint would pull M higher, hence reducing the efficiency of the method. Actually the matrix Langevin distribution considered in the first example involves a bound that depends on the parameter κ.

The paper includes a result (Theorem 2) on the uniform ergodicity that relies on heavy assumptions on the proposal distribution. And a rather surprising one, namely that the probability of rejection is bounded from below, i.e. calling for a less efficient proposal. Now it seems to me that a uniform ergodicity result holds as well when the probability of acceptance is bounded from below since, then, the event when no rejection occurs constitutes an atom from the augmented Markov chain viewpoint. There therefore occurs a renewal each time the rejected variable set ϒ is empty, and ergodicity ensues (Robert, 1995, Statistical Science).

Note also that, despite the opposition raised by the authors, the method per se does constitute a pseudo-marginal technique à la Andrieu-Roberts (2009) since the independent completion by the (pseudo) rejected variables produces an unbiased estimator of the likelihood. It would thus be of interest to see how the recent evaluation tools of Andrieu and Vihola can assess the loss in efficiency induced by this estimation of the likelihood.

Maybe some further experimental evidence tomorrow…

reading classics (#9,10)

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , on January 28, 2014 by xi'an

La Défense from Paris-Dauphine, Nov. 15, 2012Today was the very last session of our Reading Classics Seminar for the academic year 2013-2014. We listened two presentations, one on the Casella and Strawderman (1984) paper on the estimation of the normal bounded mean. And one on the Hartigan and Wong’s 1979 K-Means Clustering Algorithm paper in JRSS C. The first presentation did not go well as my student had difficulties with the maths behind the paper. (As he did not come to ask me or others for help, it may well be that he put this talk together at the last minute, at a time busy with finals and project deliveries. He also failed to exploit those earlier presentations of the paper.) The innovative part in the talk was the presentation of several R simulations comparing the risk of the minimax Bayes estimator with the one for the MLE. Although the choice of simulating different samples of standard normals for different values of the parameters and even for both estimators made the curves (unnecessarily) all wiggly.

By contrast, the second presentation was very well-designed, with great Beamer slides, interactive features and a software oriented focus. My student Mouna Berrada started from the existing R function kmeans to explain the principles of the algorithm, recycling the interactive presentation of last year as well (with my permission), and creating a dynamic flowchart that was most helpful. So she made the best of this very short paper! Just (predictably) missing the question of the statistical model behind the procedure. During the discussion, I mused why k-medians clustering was not more popular as it offered higher robustness guarantees, albeit further away from a genuine statistical model. And why k-means clustering was not more systematically compared with mixture (EM) estimation.

Here are the slides for the second talk

from statistical evidence to evidence of causality

Posted in Books, Statistics with tags , , , , , , , , , on December 24, 2013 by xi'an

I took the opportunity of having to wait at a local administration a long while today (!) to read an arXived paper by Dawid, Musio and Fienberg on the−both philosophical and practical−difficulty to establish the probabilities of the causes of effects. The first interesting thing about the paper is that it relates to the Médiator drug scandal that took place in France in the past year and still is under trial: thanks to the investigations of a local doctor, Irène Frachon, the drug was exposed as an aggravating factor for heart disease. Or maybe the cause. The case-control study of Frachon summarises into a 2×2 table with a corrected odds ratio of 17.1. From there, the authors expose the difficulties of drawing inference about causes of effects, i.e. causality, an aspect of inference that has always puzzled me. (And the paper led me to search for the distinction between odds ratio and risk ratio.)

“And the conceptual and implementational difficulties that we discuss below, that beset even the simplest case of inference about causes of effects, will be hugely magnified when we wish to take additional account of such policy considerations.”

A third interesting notion in the paper is the inclusion of counterfactuals. My introduction to counterfactuals dates back to a run in the back-country roads around Ithaca, New York, when George told me about a discussion paper from Phil he was editing for JASA on that notion with his philosopher neighbour Steven Schwartz as a discussant. (It was a great run, presumably in the late Spring. And the best introduction I could dream of!) Now, the paper starts from the counterfactual perspective to conclude that inference is close to impossible in this setting. Within my limited understanding, I would see that as a drawback of using counterfactuals, rather than of drawing inference about causes. If the corresponding statistical model is nonindentifiable, because one of the two responses is always missing, the model seems inappropriate. I am also surprised at the notion of “sufficiency” used in the paper, since it sounds like the background information cancels the need to account for the treatment (e.g., aspirin) decision.  The fourth point is the derivation of bounds on the probabilities of causation, despite everything! Quite an interesting read thus!

ciao, George!

Posted in pictures, Running, Travel, University life with tags , , , , on September 7, 2013 by xi'an



Posted in Books, R, Statistics with tags , , on May 14, 2013 by xi'an

Here is the cover of the Japanese translation of our Introducing Monte Carlo methods with R book.  A few year after the French translation. It actually appeared last year in August but I was not informed of this till a few weeks ago. The publisher is Maruzen, with an associated webpage if you want to order… Unless I am confused the translators are Hiro Ishida and Kazue Ishida; they deserve a major ありがとう ! And too bad George is no longer with us: this must have been the first translation of one of his books in Japanese..


Get every new post delivered to your Inbox.

Join 604 other followers