Archive for frequentist inference

the philosophical importance of Stein’s paradox

Posted in Books, pictures, Statistics, University life on November 30, 2015 by xi'an

I recently came across this paper written by three philosophers of science, attempting to set the Stein paradox in a philosophical light. Given my past involvement, I was obviously interested in which new perspective could be proposed, close to sixty years after Stein (1956). A paper we should actually celebrate next year! However, when reading the document, I did not find a significantly innovative approach to the phenomenon…

The paper does not start in the best possible light since it seems to justify the use of a sample mean through maximum likelihood estimation, which is only the case for a limited number of probability distributions (including the Normal distribution, which may be an implicit assumption). For instance, when the data is Student's t, the MLE is not the sample mean, no matter how shocking that might sound! (And while this is a minor issue, results about the Stein effect taking place in non-normal settings appear much earlier than 1998. And earlier than in my dissertation. See, e.g., Berger and Bock (1975), or Brandwein and Strawderman (1978).)
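As a quick numerical check of that point, here is a minimal sketch (in Python, assuming NumPy and SciPy are available; the simulated data, the fixed degrees of freedom, and the injected outlier are illustrative choices, not taken from the paper) showing that the location MLE under a Student's t model differs from the sample mean, most visibly in the presence of an outlier.

# A minimal numerical check that the location MLE under a Student's t model
# (known df, unit scale) is not the sample mean.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
x = stats.t.rvs(df=3, loc=0.0, size=50, random_state=rng)
x[0] += 30.0                       # one gross outlier to make the point visible

def neg_loglik(mu):
    # t log-likelihood with df=3 and unit scale; only the location varies
    return -np.sum(stats.t.logpdf(x, df=3, loc=mu))

mle = optimize.minimize_scalar(neg_loglik, bounds=(-10, 10), method="bounded").x
print("sample mean :", x.mean())   # dragged towards the outlier
print("t location MLE:", mle)      # essentially ignores it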

While the linear regression explanation for the Stein effect is already presented in Steve Stigler's Neyman Lecture, I still have difficulties with the argument in that, for instance, we do not know the value of the parameter, which makes the regression and the inverse regression of parameter means on Gaussian observations mere concepts and nothing practical. (Except for the interesting result that two observations make both regressions coincide.) And it does not seem at all intuitive (to me) that imposing a constraint should improve the efficiency of a maximisation program… (A simulation sketch of the effect itself appears below.) Continue reading
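For readers unfamiliar with the phenomenon itself, here is a minimal simulation sketch (in Python, assuming NumPy; the dimension and the true mean vector are arbitrary illustrative choices) of the classical Gaussian case, where the James-Stein shrinkage estimator has lower total squared-error risk than the MLE once the dimension is at least three.

# A minimal simulation of the classical Stein effect: for X ~ N_p(theta, I)
# with p >= 3, the James-Stein estimator beats the MLE (X itself) in risk.
import numpy as np

rng = np.random.default_rng(1)
p, n_rep = 10, 100_000
theta = rng.normal(size=p)                 # arbitrary true mean vector

x = theta + rng.normal(size=(n_rep, p))    # replicated observations X ~ N_p(theta, I)
norm2 = np.sum(x**2, axis=1, keepdims=True)
js = (1 - (p - 2) / norm2) * x             # James-Stein estimator

risk_mle = np.mean(np.sum((x - theta) ** 2, axis=1))   # close to p
risk_js = np.mean(np.sum((js - theta) ** 2, axis=1))   # strictly smaller
print(risk_mle, risk_js)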

how individualistic should statistics be?

Posted in Books, pictures, Statistics on November 5, 2015 by xi'an

Keli Liu and Xiao-Li Meng completed a paper on the very nature of inference, to appear in The Annual Review of Statistics and Its Application. This paper or chapter addresses a fundamental (and foundational) question about drawing inference on a new observation based on a sample, that is, about making predictions: to what extent should the characteristics of the sample used for that prediction resemble those of the future observation? In his 1921 book, A Treatise on Probability, Keynes thought this similarity (or individualisation) should be pushed to its extreme, which led him to more or less conclude on the impossibility of statistics and never to return to the field again. Certainly missing the incoming possibility of comparing models and selecting variables. And not building so much on the “all models are wrong” tenet. On the contrary, classical statistics uses the entire available data and the associated model to run the prediction, including Bayesian statistics, although it is less clear how to distinguish between data and control there. Liu & Meng debate the possibility of creating controls from the data alone. Or “alone” in a loose sense, as the underlying model always plays a capital role.

“Bayes and Frequentism are two ends of the same spectrum—a spectrum defined in terms of relevance and robustness. The nominal contrast between them (…) is a red herring.”

The paper makes for an exhilarating if definitely challenging read, with a highly witty writing style. If only because the perspective is unusual, to say the least!, and requires constant mental contortions to frame the assertions in more traditional terms. For instance, I first thought that Bayesian procedures were in agreement with the ultimate conditioning approach, since they condition on the observables and nothing else (except for the model!). Upon reflection, I am not so convinced that there is such a difference with the frequentist approach in the (specific) sense that they both take advantage of the entire dataset, either through the predictive or through the plug-in distribution. It all boils down to how one defines “control”.

“Probability and randomness, so tightly yoked in our minds, are in fact distinct concepts (…) at the end of the day, probability is essentially a tool for bookkeeping, just like the abacus.”

Some sentences from the paper made me think of ABC, even though I am not trying to bring everything back to ABC!, as drawing controls is the nature of the ABC game. ABC draws samples, or controls, from the prior predictive and only keeps those for which the relevant aspects (or the summary statistics) agree with those of the observed data. Which opens similar questions about the validity and precision of the resulting inference, as well as the loss of information due to the projection onto the summary statistics. While ABC is not mentioned in the paper, it can be used as a benchmark when walking through it.
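To make that description concrete, here is a toy ABC rejection sampler (a sketch in Python, assuming NumPy; the normal model, the N(0,10²) prior, and the tolerance are invented illustrative choices): parameter values are drawn from the prior, summaries are simulated from the prior predictive, and only the draws whose summary falls close to the observed one are kept as "controls".

# Toy ABC rejection sampler: keep prior draws whose simulated summary
# statistic agrees with the observed summary up to a tolerance eps.
import numpy as np

rng = np.random.default_rng(2)
x_obs = rng.normal(loc=2.0, scale=1.0, size=30)
s_obs = x_obs.mean()                        # summary statistic: the sample mean

n_sim, eps = 200_000, 0.05
mu = rng.normal(0.0, 10.0, size=n_sim)      # draws from a N(0, 10^2) prior on mu
s_sim = rng.normal(mu, 1.0 / np.sqrt(30))   # prior predictive summary (mean of 30 obs)
accepted = mu[np.abs(s_sim - s_obs) < eps]  # keep draws matching the observed summary

print(len(accepted), accepted.mean())       # approximate posterior sample for mu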

“In the words of Jack Kiefer, we need to distinguish those problems with `lucky data’ from those with `unlucky data’.”

I very much liked recalling discussions we had with George Casella and Costas Goutis at Cornell about frequentist conditional inference, with the memory of Jack Kiefer still lingering around. However, I am not so excited about the processing of models here since, from what I understand in the paper (!), the probabilistic model behind the statistical analysis must be used to some extent in producing the control case and thus cannot be truly assessed with a critical eye. For instance, of what use is the mean squared error when the model behind it is unable to produce the observed data? In particular, the variability of this mean squared error is directly driven by this model. Similarly, the notion of ancillarity is completely model-dependent. And in the classification diagrams opposing robustness to relevance, all the methods included are parametric, while non-parametric types of inference could provide a reference or a calibration ruler, at the very least.

Also, by continuously and maybe a wee bit heavily referring to the doctor-and-patient analogy, the paper is somewhat confusing as to which parts are analogy and which are methodology, and as to which type of statistical problem is covered by the discussion (sometimes it feels like all problems and sometimes like medical trials).

“The need to deliver individualized assessments of uncertainty are more pressing than ever.”

A final question leads us to an infinite regress: if the statistician needs to turn to individualized inference, at which level of individuality should the statistician be assessed? And who is going to provide the controls then? In any case, this challenging paper is definitely worth reading by (only mature?) statisticians to ponder the nature of the game!

beyond subjective and objective in Statistics

Posted in Books, Statistics, University life on August 28, 2015 by xi'an

“At the level of discourse, we would like to move beyond a subjective vs. objective shouting match.” (p.30)

This paper by Andrew Gelman and Christian Hennig calls for the abandonment of the terms objective and subjective in (not solely Bayesian) statistics. And argues that there is more than mere prior information and data to the construction of a statistical analysis. The paper is articulated as the authors’ proposal, followed by four application examples, then a survey of philosophy of science perspectives on objectivity and subjectivity in statistics and other sciences, then a study of the subjective and objective aspects of the mainstream statistical schools, concluding with a discussion on the implementation of the proposed move. Continue reading

Bureau international des poids et mesures [bayésiennes?]

Posted in pictures, Statistics, Travel on June 19, 2015 by xi'an

The workshop at the BIPM on measurement uncertainty was certainly most exciting, first by its location in the Parc de Saint Cloud, in classical buildings overlooking the Seine river in a most bucolic manner… and second by its mostly Bayesian flavour. The recommendations that the workshop addressed are about revisions of the current GUM, which stands for the Guide to the Expression of Uncertainty in Measurement. The discussion centred on using a more Bayesian approach than in the earlier version, with the organisers of the workshop and leaders of the revision apparently most in favour of that move. “Knowledge-based pdfs” came into the discussion as an attractive notion since it rings a Bayesian bell, especially when associated with probability as a degree of belief and incorporating the notion of an a priori probability distribution. And propagation of errors. Or even more when mentioning the removal of frequentist validations. What I gathered from the talks is that the perspective is drifting away from central limit approximations towards more realistic representations, calling for Monte Carlo computations. There is also a lot I did not get about conventions, codes and standards. Including a short debate about the different meanings of Monte Carlo, from simulation technique to calculation method (as for confidence intervals). And another discussion about replacing the old formula for estimating a standard deviation, moving from the Normal to the Student’s t case, a change that remains highly debatable since the Student’s t assumption is as shaky as the Normal one. What became clear [to me] during the meeting is that a rather heated debate is currently taking place about the need for a revision, with some members of the six (?) organisations involved arguing against Bayesian or linearisation tools.
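For readers wondering what the Monte Carlo alternative amounts to, here is a minimal sketch (in Python, assuming NumPy; the measurement model P = V²/R and the input pdfs are invented for illustration, in the spirit of GUM Supplement 1 rather than taken from the workshop) that propagates samples from the input "knowledge-based" pdfs through the measurement model instead of linearising it.

# Monte-Carlo propagation of measurement uncertainty: sample the inputs,
# push the samples through the measurement model, summarise the output.
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
V = rng.normal(12.0, 0.05, size=n)          # voltage, Gaussian state of knowledge
R = rng.uniform(3.95, 4.05, size=n)         # resistance, rectangular (uniform) pdf

P = V**2 / R                                # measurement model: power P = V^2 / R
print(P.mean(), P.std(ddof=1))              # standard uncertainty of the output
print(np.quantile(P, [0.025, 0.975]))       # 95% coverage interval, no CLT needed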

This became even clearer during our frequentist versus Bayesian session, with a first talk so outrageously anti-Bayesian it was hilarious! Among other things, the notion that “fixing” the data was against the principles of physics (the speaker was a physicist), that the only randomness in a Bayesian coin tossing was coming from the prior, that the likelihood function was a subjective construct, that the definition of the posterior density was a generalisation of Bayes’ theorem [generalisation found in… Bayes’ 1763 paper then!], that objective Bayes methods were inconsistent [because Jeffreys’ prior produces an inadmissible estimator of μ²!], that the move to Bayesian principles in GUM would cost the New Zealand economy 5 billion dollars [hopefully a frequentist estimate!], &tc., &tc. The second pro-frequentist speaker was by comparison much, much more reasonable, although he insisted on showing that Bayesian credible intervals do not achieve a nominal frequentist coverage, using a sort of fiducial argument distinguishing x=X+ε from X=x+ε that I missed… A lack of achievement that is fine by my standards. Indeed, a frequentist confidence interval provides a coverage guarantee either for a fixed parameter (in which case the Bayesian approach achieves better coverage by constant updating) or for a varying parameter (in which case the frequency of proper inclusion is of no real interest!). The first Bayesian speaker was Tony O’Hagan, who summarily tore the first talk to shreds. And who also criticised GUM2 for using reference priors and maxent priors. I am afraid my talk was a bit too exploratory for the audience (since I got absolutely no questions!). In retrospect, I should have given an intro to reference priors.

An interesting specificity of a workshop on metrology and measurement is that participants are hard sticklers for the schedule, starting and finishing right on time. When a talk finished early, we waited until the scheduled time for the next talk. Not even allowing for extra discussion. When the only overtime (and Belgian) speaker ran close to 10 minutes late, I was afraid he would (deservedly) get lynched! He escaped unscathed, but may (and should) not get invited again…!

another view on Jeffreys-Lindley paradox

Posted in Books, Statistics, University life on January 15, 2015 by xi'an

I found another paper on the Jeffreys-Lindley paradox, entitled “A Misleading Intuition and the Bayesian Blind Spot: Revisiting the Jeffreys-Lindley’s Paradox” and written by Guillaume Rochefort-Maranda, from Université Laval, Québec.

This paper starts by assuming an unbiased estimator of the parameter of interest θ and a test of the null θ=θ0. (Which makes me wonder at the reason for imposing unbiasedness.) Another highly innovative (or puzzling) aspect is that the Lindley-Jeffreys paradox presented therein is described without any Bayesian input. The paper stands “within a frequentist (classical) framework”: it actually starts with a confidence-interval-on-θ-versus-test argument to claim that, with a fixed-coverage interval that excludes the null value θ0, the estimate of θ may converge to θ0 without the null θ=θ0 ever being accepted. That is, without the confidence interval ever containing θ0. (Although this is an event whose probability converges to zero.) Bayesian aspects come later in the paper, even though the application to a point null versus point null test is of little interest since a Bayes factor is then a likelihood ratio.

As I explained several times, including in my Philosophy of Science paper, I see the Lindley-Jeffreys paradox as being primarily a Bayesiano-Bayesian issue. So just the opposite of the perspective taken by the paper. That frequentist solutions differ does not strike me as paradoxical. Now, the construction of a sequence of samples such that all partial samples in the sequence exclude the null θ=θ0 is not a likely event, so I do not see this as a paradox even, or especially, when putting on my frequentist glasses: if the null θ=θ0 is true, this cannot happen in a consistent manner, even though a single occurrence of a p-value less than .05 is highly likely within such a sequence.
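As a reminder of the standard (Bayesian) form of the paradox, as opposed to the paper's frequentist variant, here is a minimal numerical sketch (in Python, assuming NumPy and SciPy; the N(0,1) prior on θ under the alternative is a conventional illustrative choice): keeping the p-value fixed at 0.05 while the sample size grows makes the Bayes factor swing ever more strongly towards the null.

# Jeffreys-Lindley effect: fixed p-value, growing n, Bayes factor favours H0.
import numpy as np
from scipy import stats

z = stats.norm.ppf(0.975)                   # sample mean sits exactly at p = 0.05
for n in [10, 100, 10_000, 1_000_000]:
    xbar = z / np.sqrt(n)                   # xbar ~ N(theta, 1/n)
    m0 = stats.norm.pdf(xbar, 0, np.sqrt(1 / n))      # marginal under H0: theta = 0
    m1 = stats.norm.pdf(xbar, 0, np.sqrt(1 + 1 / n))  # marginal under H1: theta ~ N(0, 1)
    print(n, m0 / m1)                       # Bayes factor B01 grows with n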

Unsurprisingly, the paper relates to the three most recent papers published in Philosophy of Science, discussing first and foremost Spanos‘ view. When the current author introduces Mayo and Spanos’ severity, i.e., the probability of exceeding the observed test statistic under the alternative, he does not define this test statistic d(X), which makes the whole notion incomprehensible to a reader not already familiar with it. (And even for one familiar with it…)

“Hence, the solution I propose (…) avoids one of [Freeman’s] major disadvantages. I suggest that we should decrease the size of tests to the extent where it makes practically no difference to the power of the test in order to improve the likelihood ratio of a significant result.” (p.11)

One interesting if again unsurprising point in the paper is that one reason for the paradox lies in keeping the significance level constant as the sample size increases, while it is possible to decrease the significance level and increase the power simultaneously. However, the solution proposed above does not sound rigorous, hence I fail to understand how low the significance level has to be for the method to work. I cannot fathom a corresponding algorithmic derivation of the author’s proposal.

“I argue against the intuitive idea that a significant result given by a very powerful test is less convincing than a significant result given by a less powerful test.”

The criticism of the “blind spot” of the Bayesian approach is supported by an example where the data is generated from a distribution other than either of the two tested distributions. It seems reasonable that the Bayesian approach fails to provide a proper answer in this case. Even though it illustrates the difficulty with the long-term impact of the prior(s) in the Bayes factor and (in my opinion) the need to move away from this solution within the Bayesian paradigm.

Bayes’ Theorem in the 21st Century, really?!

Posted in Books, Statistics on June 20, 2013 by xi'an

“In place of past experience, frequentism considers future behavior: an optimal estimator is one that performs best in hypothetical repetitions of the current experiment. The resulting gain in scientific objectivity has carried the day…”

Julien Cornebise sent me this Science column by Brad Efron about Bayes’ theorem. I am a tad surprised that it got published in the journal, given that it does not really contain any new item of information. However, being unfamiliar with Science, I may simply be missing that it also publishes major scientists’ opinions or warnings, a label that fits this column. (It is quite a proper coincidence that the post appears during Bayes 250.)

Efron’s piece centres upon the use of objective Bayes approaches in Bayesian statistics, for which Laplace was “the prime violator”. He argues through examples that noninformative “Bayesian calculations cannot be uncritically accepted, and should be checked by other methods”, which usually means “frequentistically”. First, having to write “frequentistically” even once is already more than I can stand! Second, using the Bayesian framework to build frequentist procedures is like buying top technical outdoor gear to climb the stairs at the Sacré-Coeur on Butte Montmartre! The naïve reader is then left clueless as to why one should use a Bayesian approach in the first place. And perfectly confused about the meaning of objectivity. Especially given the above quote! I find it rather surprising that this old saw of a claim of frequentism to objectivity resurfaces there. There is an infinite range of frequentist procedures and, while some are better than others, none is “the” optimal one (except for the most worn-out examples like, say, the estimation of the mean of a normal observation).

“A Bayesian FDA (there isn’t one) would be more forgiving. The Bayesian posterior probability of drug A’s superiority depends only on its final evaluation, not whether there might have been earlier decisions.”

The second criticism of Bayesianism therein is the counter-intuitive irrelevance of stopping rules. Once again, the presentation is fairly biased, because a Bayesian approach opposes scenarios rather than evaluating the likelihood of a tail event under the null, and only under the null. And also because, as shown by Jim Berger and co-authors, the Bayesian approach is generally much more favourable to the null than the p-value.
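As a rough quantification of that last point, here is a short sketch (in Python, assuming NumPy) of the Sellke-Bayarri-Berger calibration B01 ≥ -e·p·log(p), valid for p < 1/e, which lower-bounds the Bayes factor, and hence the posterior probability, of the null corresponding to a given p-value; the equal-prior-odds conversion is my illustrative choice.

# Sellke-Bayarri-Berger bound: how favourable to H0 a given p-value can be.
import numpy as np

for p in [0.05, 0.01, 0.001]:
    bound = -np.e * p * np.log(p)           # lower bound on the Bayes factor B01
    post = bound / (1 + bound)              # implied lower bound on P(H0 | data), equal prior odds
    print(p, round(bound, 3), round(post, 3))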

“Bayes’ Theorem is an algorithm for combining prior experience with current evidence. Followers of Nate Silver’s FiveThirtyEight column got to see it in spectacular form during the presidential campaign: the algorithm updated prior poll results with new data on a daily basis, nailing the actual vote in all 50 states.”

It is only fair that Nate Silver’s book and column are mentioned in Efron’s column, because they provide a highly valuable and definitely convincing illustration of Bayesian principles. What I object to is the criticism “that most cutting-edge science doesn’t enjoy FiveThirtyEight-level background information”. In my understanding, the poll model of FiveThirtyEight built up in a sequential manner a weight system over the different polling companies, hence learning from the data, if in a Bayesian manner, about their reliability (rather than forgetting the past). This is actually what caused Larry Wasserman to consider that Silver’s approach was more frequentist than Bayesian…

“Empirical Bayes is an exciting new statistical idea, well-suited to modern scientific technology, saying that experiments involving large numbers of parallel situations carry within them their own prior distribution.”

My last point of contention is about the (unsurprising) defence of the empirical Bayes approach in the Science column. Once again, the presentation is biased towards frequentism: in the FDR gene example, the empirical Bayes procedure is motivated by being the frequentist solution. The logical contradiction in “estimat[ing] the relevant prior from the data itself” is not discussed, and the conclusion that Brad Efron uses “empirical Bayes methods in the parallel case [in the absence of prior information]”, seemingly without being cautious and “uncritically”, does not strike me as the proper last argument in the matter! Nor does it give a 21st Century vision of what nouveau Bayesianism should be, faced with the challenges of Big Data and the like…
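To spell out what “estimating the prior from the data itself” means in the parallel case, here is a minimal parametric empirical-Bayes sketch (in Python, assuming NumPy; the normal-normal model and the method-of-moments plug-in are conventional textbook choices, not Efron's own FDR construction): the prior variance is estimated from the very observations it is then used to shrink.

# Parametric empirical Bayes in the parallel case: z_i ~ N(theta_i, 1),
# theta_i ~ N(0, tau^2), with tau^2 estimated from the z_i themselves.
import numpy as np

rng = np.random.default_rng(4)
p = 5_000
theta = rng.normal(0.0, 2.0, size=p)        # many parallel "gene" effects
z = theta + rng.normal(size=p)              # one noisy observation per effect

tau2_hat = max(np.var(z, ddof=1) - 1.0, 0)  # Var(z) = tau^2 + 1, solved for tau^2
shrink = tau2_hat / (tau2_hat + 1.0)        # posterior-mean weight on the data
theta_eb = shrink * z                       # empirical-Bayes shrinkage estimates

print(np.mean((z - theta) ** 2), np.mean((theta_eb - theta) ** 2))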

the anti-Bayesian moment and its passing commented

Posted in Books, Statistics, University life on March 12, 2013 by xi'an

Here is a comment from Deborah Mayo on our rejoinder with Andrew Gelman, “the anti-Bayesian moment and its passing”, a comment that could not make it through the blog’s comment system:

You assume that I am interested in long-term average properties of procedures, even though I have so often argued that they are at most necessary (as consequences of good procedures), but scarcely sufficient for a severity assessment. The error statistical account I have developed is a statistical philosophy. It is not one to be found in Neyman and Pearson, jointly or separately, except in occasional glimpses here and there (unfortunately). It is certainly not about well-defined accept-reject rules. If N-P had only been clearer, and Fisher better behaved, we would not have had decades of wrangling. However, I have argued, the error statistical philosophy explicates, and directs the interpretation of, frequentist sampling theory methods in scientific, as opposed to behavioural, contexts. It is not a complete philosophy…but I think Gelmanian Bayesians could find in it a source of “standard setting”.

You say “the prior is both a probabilistic object, standard from this perspective, and a subjective construct, translating qualitative personal assessments into a probability distribution. The extension of this dual nature to the so-called “conventional” priors (a very good semantic finding!) is to set a reference … against which to test the impact of one’s prior choices and the variability of the resulting inference. …they simply set a standard against which to gauge our answers.”

I think there are standards for even an approximate meaning of “standard-setting” in science, and I still do not see how an object whose meaning and rationale may fluctuate wildly, even in a given example, can serve as a standard or reference. For what?

Perhaps the idea is that one can gauge how different priors change the posteriors, because, after all, the likelihood is well-defined. That is why the prior and not the likelihood is the camel. But it isn’t obvious why I should want the camel. (camel/gnat references in the paper and response).

