It seems that every Sunday I run in Central Park, I am doomed to hit a race! This time it was not the NYC half-marathon (and I did not see Paula Radcliffe as she was in Berlin) but an 18 miles race in preparation for the NYC marathon. I had completed my fartlek training of 6x4mn and was recovering from a anaerobic last round when I saw some runners coming, so went with them as a recuperation jog for a mile or so. They had done the first 4 miles in 27’28”, which corresponds to a 4’16” pace per kilometer, so I must have missed the top runners. Actually, I think the first runners were at least 4 minutes faster, as they were coming when I left for the last 4mn. (But it was good for recovery!) Checking on the webpage of the race, the winner finished in 1:37’45”, which gives a marathon time of 2:21’40” unless I am confused.
Archive for September, 2011
As I was having my last training session at 6:45 this morning around the local park, I crossed a runner and said my customary “Bonjour !” (which is rarely answered, by the way), getting a surprising “Salut Christian !” answer as the other runner was Aurélien Garivier, also running his lap in the dark but with a better eyesight than mine! So we had a nice chat while doing a half-lap together (a wee faster than I planned!). Next run is Argentan on Saturday!
The current question for the ISBA Bulletin is “What is the biggest and most surprising change in the field of Statistics that you have witnessed, and what do you think will be the next one?” The answer to the second part is easy: I do not know and even if I knew I would be writing papers about it rather than spilling the beans… The answer to the first part is anything but easy. At the most literal level, taking “witnessed” at face value, I have witnessed the “birth” of Markov chain Monte Carlo methods at the conference organised in Sherbrooke by Jean-Francois Angers in June 1989… (This was already reported in our Short history of MCMC with George Casella.) I clearly remember Adrian showing the audience a slide with about ten lines of Fortran code that corresponded to the Gibbs sampler for a Bayesian analysis of a mixed effect linear model (later to be analysed in JASA). This was so shockingly simple… It certainly was the talk that had the most impact on my whole career, even though (a) I would have certainly learned about MCMC quickly enough had I missed the Sherbrooke conference and (b) there were other talks in my academic life that also induced that “wow” moment, for sure. At a less literal level, the biggest chance if not the most surprising is that the field has become huge, multifaceted, and ubiquitous. When I started studying statistics, it was certainly far from being the sexiest possible field! (At least in the general public) And the job offers were not as numerous and diverse as they are today. (The same is true for Bayesian statistics, of course. Even though it has sounded sexy from the start!)
Pierre Pudlo and I worked this morning on a distribution related to
philogenic philogenetic trees and got stuck on the following Bessel integral
where In is the modified Bessel function of the first kind. We could not find better than formula 6.611(4) in Gradshteyn and Ryzhik. which is for a=0… Anyone in for a closed form formula, even involving special functions?
“‘Frequentist methods achieve an objective connection to hypotheses about the data-generating process by being constrained and calibrated by the method’s error probabilities in relation to these models .”—D. Cox and D. Mayo, p.277, Error and Inference, 2010
The second part of the seventh chapter of Error and Inference, is David Cox’s and Deborah Mayo’s “Objectivity and conditionality in frequentist inference“. (Part of the section is available on Google books.) The purpose is clear and the chapter quite readable from a statistician’s perspective. I however find it difficult to quantify objectivity by first conditioning on “a statistical model postulated to have generated data”, as again this assumes the existence of a “true” probability model where “probabilities (…) are equal or close to the actual relative frequencies”. As earlier stressed by Andrew:
“I don’t think it’s helpful to speak of “objective priors.” As a scientist, I try to be objective as much as possible, but I think the objectivity comes in the principle, not the prior itself. A prior distribution–any statistical model–reflects information, and the appropriate objective procedure will depend on what information you have.”
The paper opposes the likelihood, Bayesian, and frequentist methods, reproducing what Gigerenzer called the “superego, the ego, and the id” in his paper on statistical significance. Cox and Mayo stress from the start that the frequentist approach is (more) objective because it is based on the sampling distribution of the test. My primary problem with this thesis is that the “hypothetical long run” (p.282) does not hold in realistic settings. Even in the event of a reproduction of similar or identical tests, a sequential procedure exploiting everything that has been observed so far is more efficient than the mere replication of the same procedure solely based on the current observation.
“Virtually all (…) models are to some extent provisional, which is precisely what is expected in the building up of knowledge.”—D. Cox and D. Mayo, p.283, Error and Inference, 2010
The above quote is something I completely agree with, being another phrasing of George Box’s “all models are wrong”, but this transience of working models is a good reason in my opinion to account for the possibility of alternative working models from the start of the statistical analysis. Hence for an inclusion of those models in the statistical analysis equally from the start. Which leads almost inevitably to a Bayesian formulation of the testing problem.
“‘Perhaps the confusion [over the role of sufficient statistics] stems in part because the various inference schools accept the broad, but not the detailed, implications of sufficiency.”—D. Cox and D. Mayo, p.286, Error and Inference, 2010
The discussion over the sufficiency principle is interesting, as always. The authors propose to solve the confusion between the sufficiency principle and the frequentist approach by assuming that inference “is relative to the particular experiment, the type of inference, and the overall statistical approach” (p.287). This creates a barrier between sampling distributions that avoids the binomial versus negative binomial paradox always stressed in the Bayesian literature. But the solution is somehow tautological: by conditioning on the sampling distribution, it avoids the difficulties linked with several sampling distributions all producing the same likelihood. After my recent work on ABC model choice, I am however less excited about the sufficiency principle as the existence of [non-trivial] sufficient statistics is quite the rare event. Especially across models. The section (pp. 288-289) is also revealing about the above “objectivity” of the frequentist approach in that the derivation of a test taking large value away from the null with a well-known distribution under the null is not an automated process, esp. when nuisance parameters cannot be escaped from (pp. 291-294). Achieving separation from nuisance parameters, i.e. finding statistics that can be conditioned upon to eliminate those nuisance parameters, does not seem feasible outside well-formalised models related with exponential families. Even in such formalised models, a (clear?) element of arbitrariness is involved in the construction of the separations, which implies that the objectivity is under clear threat. The chapter recognises this limitation in Section 9.2 (pp.293-294), however it argues that separation is much more common in the asymptotic sense and opposes the approach to the Bayesian averaging over the nuisance parameters, which “may be vitiated by faulty priors” (p.294). I am not convinced by the argument, given that the (approximate) condition approach amount to replace the unknown nuisance parameter by an estimator, without accounting for the variability of this estimator. Averaging brings the right (in a consistency sense) penalty.
A compelling section is the one about the weak conditionality principle (pp. 294-298), as it objects to the usual statement that a frequency approach breaks this principle. In a mixture experiment about the same parameter θ, inferences made conditional on the experiment “are appropriately drawn in terms of the sampling behavior in the experiment known to have been performed” (p. 296). This seems hardly objectionable, as stated. And I must confess the sin of stating the opposite as The Bayesian Choice has this remark (Example 1.3.7, p.18) that the classical confidence interval averages over the experiments… Mea culpa! The term experiment validates the above conditioning in that several experiments could be used to measure θ, each with a different p-value. I will not argue with this. I could however argue about “conditioning is warranted to achieve objective frequentist goals” (p. 298) in that the choice of the conditioning, among other things, weakens the objectivity of the analysis. In a sense the above pirouette out of the conditioning principle paradox suffers from the same weakness, namely that when two distributions characterise the same data (the mixture and the conditional distributions), there is a choice to be made between “good” and “bad”. Nonetheless, an approach based on the mixture remains frequentist if non-optimal… (The chapter later attacks the derivation of the likelihood principle, I will come back to it in a later post.)
“‘Many seem to regard reference Bayesian theory to be a resting point until satisfactory subjective or informative priors are available. It is hard to see how this gives strong support to the reference prior research program.”—D. Cox and D. Mayo, p.302, Error and Inference, 2010
A section also worth commenting is (unsurprisingly!) the one addressing the limitations of the Bayesian alternatives (pp. 298–302). It however dismisses right away the personalistic approach to priors by (predictably if hastily) considering it fails the objectivity canons. This seems a wee quick to me, as the choice of a prior is (a) the choice of a reference probability measure against which to assess the information brought by the data, not clearly less objective than picking one frequentist estimator or another, and (b) a personal construction of the prior can also be defended on objective grounds, based on the past experience of the modeler. That it varies from one modeler to the next is not an indication of subjectivity per se, simply of different past experiences. Cox and Mayo then focus on reference priors, à la Bernardo-Berger, once again pointing out the lack of uniqueness of those priors as a major flaw. While the sub-chapter agrees on the understanding of those priors as convention or reference priors, aiming at maximising the input from the data, it gets stuck on the impropriety of such priors: “if priors are not probabilities, what then is the interpretation of a posterior?” (p.299). This seems like a strange comment to me: the interpretation of a posterior is that it is a probability distribution and this is the only mathematical constraint one has to impose on a prior. (Which may be a problem in the derivation of reference priors.) As detailed in The Bayesian Choice among other books, there are many compelling reasons to invite improper priors into the game. (And one not to, namely the difficulty with point null hypotheses.) While I agree that the fact that some reference priors (like matching priors, whose discussion p. 302 escapes me) have good frequentist properties is not compelling within a Bayesian framework, it seems a good enough answer to the more general criticism about the lack of objectivity: in that sense, frequency-validated reference priors are part of the huge package of frequentist procedures and cannot be dismissed on the basis of being Bayesian. That reference priors are possibly at odd with the likelihood principle does not matter very much: the shape of the sampling distribution is part of the prior information, not of the likelihood per se. The final argument (Section 12) that Bayesian model choice requires the preliminary derivation of “the possible departures that might arise” (p.302) has been made at several points in Error and Inference. Besides being in my opinion a valid working principle, i.e. selecting the most appropriate albeit false model, this definition of well-defined alternatives is mimicked by the assumption of “statistics whose distribution does not depend on the model assumption” (p. 302) found in the same last paragraph.
In conclusion this (sub-)chapter by David Cox and Deborah Mayo is (as could be expected!) a deep and thorough treatment of the frequentist approach to the sufficiency and (weak) conditionality principle. It however fails to convince me that there exists a “unique and unambiguous” frequentist approach to all but the most simple problems. At least, from reading this chapter, I cannot find a working principle that would lead me to this single unambiguous frequentist procedure.
My daughter’s school was blocked this morning, as the result of a day of actions against the persistent and long-term policy of position cuts and hour reductions in the Education budget by the current government of France. The impact on higher education of those cuts is already perceptible in the math background of our first-year students… Not that a few garbage bins piled in front of the high school entrance could make a dent in our governmental policy. Nor on the absurd statistics of the Education minister, Luc Chatel, who goes back to 1980 as a reference year to show “increases and improvements”!
Although this was only a half-day of talks, the third day of the workshop was equally thought-challenging and diverse. (I managed to miss the ten first minutes by taking a Line 3 train to 125th street, having overlooked the earlier split from Line 1… Crossing south Harlem on a Sunday morning is a fairly mild experience though.) Jean-Marc Azaïs gave a personal recollection on the work of Mario Wschebor, who passed away a week ago and should have attended the workshop. Nan Chen talked about the Monte Carlo approximation of quantities of the form
which is a problem when f is non linear. This reminded me (and others) of the Bernoulli factory and of the similar trick we use in the vanilla Rao-Blackwellisation paper with Randal Douc. However, the approach was different in that the authors relied on a nested simulation algorithm that did not adapt against f. And did not account for the shape of f. Peter Glynn, while also involved in the above, delivered a talk on the initial transient that showed possibilities for MCMC convergence assessment (even though this is a much less active area than earlier). And, as a fitting conclusion, the conference organiser, Jingchen Liu gave a talk on non-compatible conditionals he and Andrew are using to handle massively-missing datasets. It reminded me of Hobert and Casella (1996, JASA) of course and also of discussions we had in Paris with Andrew and Nicolas. Looking forward to the paper (as I have missed some points about the difference between true and operational models)! Overall, this was thus a terrific workshop (I just wish I could have been able to sleep one hour more each night to be more alert during all talks!) and a fantastic if intense schedule fitting the start of the semester and of teaching (even though Robin had to teach my first R class in English on Friday). I also discovered that several of the participants were attending the Winter Simulation Conference later this year, hence another opportunity to discuss simulation strategies together.