Archive for frequentist inference

another view on Jeffreys-Lindley paradox

Posted in Books, Statistics, University life with tags , , , , , on January 15, 2015 by xi'an

I found another paper on the Jeffreys-Lindley paradox. Entitled “A Misleading Intuition and the Bayesian Blind Spot: Revisiting the Jeffreys-Lindley’s Paradox”. Written by Guillaume Rochefort-Maranda, from Université Laval, Québec.

This paper starts by assuming an unbiased estimator of the parameter of interest θ and under test for the null θ=θ0. (Which makes we wonder at the reason for imposing unbiasedness.) Another highly innovative (or puzzling)  aspect is that the Lindley-Jeffreys paradox presented therein is described without any Bayesian input. The paper stands “within a frequentist (classical) framework”: it actually starts with a confidence-interval-on-θ-vs.-test argument to argue that, with a fixed coverage interval that excludes the null value θ0, the estimate of θ may converge to θ0 without ever accepting the null θ=θ0. That is, without the confidence interval ever containing θ0. (Although this is an event whose probability converges to zero.) Bayesian aspects come later in the paper, even though the application to a point null versus a point null test is of little interest since a Bayes factor is then a likelihood ratio.

As I explained several times, including in my Philosophy of Science paper, I see the Lindley-Jeffreys paradox as being primarily a Bayesiano-Bayesian issue. So just the opposite of the perspective taken by the paper. That frequentist solutions differ does not strike me as paradoxical. Now, the construction of a sequence of samples such that all partial samples in the sequence exclude the null θ=θ0 is not a likely event, so I do not see this as a paradox even or especially when putting on my frequentist glasses: if the null θ=θ0 is true, this cannot happen in a consistent manner, even though a single occurrence of a p-value less than .05 is highly likely within such a sequence.

Unsurprisingly, the paper relates to the three most recent papers published by Philosophy of Science, discussing first and foremost Spanos‘ view. When the current author introduces Mayo and Spanos’ severity, i.e. the probability to exceed the observed test statistic under the alternative, he does not define this test statistic d(X), which makes the whole notion incomprehensible to a reader not already familiar with it. (And even for one familiar with it…)

“Hence, the solution I propose (…) avoids one of [Freeman’s] major disadvantages. I suggest that we should decrease the size of tests to the extent where it makes practically no difference to the power of the test in order to improve the likelihood ratio of a significant result.” (p.11)

One interesting if again unsurprising point in the paper is that one reason for the paradox stands in keeping the significance level constant as the sample size increases. While it is possible to decrease the significance level and to increase the power simultaneously. However, the solution proposed above does not sound rigorous hence I fail to understand how low the significance has to be for the method to stop/work. I cannot fathom a corresponding algorithmic derivation of the author’s proposal.

“I argue against the intuitive idea that a significant result given by a very powerful test is less convincing than a significant result given by a less powerful test.”

The criticism on the “blind spot” of the Bayesian approach is supported by an example where the data is issued from a distribution other than either of the two tested distributions. It seems reasonable that the Bayesian answer fails to provide a proper answer in this case. Even though it illustrates the difficulty with the long-term impact of the prior(s) in the Bayes factor and (in my opinion) the need to move away from this solution within the Bayesian paradigm.

Bayes’ Theorem in the 21st Century, really?!

Posted in Books, Statistics with tags , , , , , , on June 20, 2013 by xi'an

“In place of past experience, frequentism considers future behavior: an optimal estimator is one that performs best in hypothetical repetitions of the current experiment. The resulting gain in scientific objectivity has carried the day…”

Julien Cornebise sent me this Science column by Brad Efron about Bayes’ theorem. I am a tad surprised that it got published in the journal, given that it does not really contain any new item of information. However, being unfamiliar with Science, it may also be that it also publishes major scientists’ opinions or warnings, a label that can fit this column in Science. (It is quite a proper coincidence that the post appears during Bayes 250.)

Efron’s piece centres upon the use of objective Bayes approaches in Bayesian statistics, for which Laplace was “the prime violator”. He argues through examples that noninformative “Bayesian calculations cannot be uncritically accepted, and should be checked by other methods, which usually means “frequentistically”. First, having to write “frequentistically” once is already more than I can stand! Second, using the Bayesian framework to build frequentist procedures is like buying top technical outdoor gear to climb the stairs at the Sacré-Coeur on Butte Montmartre! The naïve reader is then left clueless as to why one should use a Bayesian approach in the first place. And perfectly confused about the meaning of objectivity. Esp. given the above quote! I find it rather surprising that this old saw of a  claim of frequentism to objectivity resurfaces there. There is an infinite range of frequentist procedures and, while some are more optimal than others, none is “the” optimal one (except for the most baked-out examples like say the estimation of the mean of a normal observation).

“A Bayesian FDA (there isn’t one) would be more forgiving. The Bayesian posterior probability of drug A’s superiority depends only on its final evaluation, not whether there might have been earlier decisions.”

The second criticism of Bayesianism therein is the counter-intuitive irrelevance of stopping rules. Once again, the presentation is fairly biased, because a Bayesian approach opposes scenarii rather than evaluates the likelihood of a tail event under the null and only the null. And also because, as shown by Jim Berger and co-authors, the Bayesian approach is generally much more favorable to the null than the p-value.

“Bayes’ Theorem is an algorithm for combining prior experience with current evidence. Followers of Nate Silver’s FiveThirtyEight column got to see it in spectacular form during the presidential campaign: the algorithm updated prior poll results with new data on a daily basis, nailing the actual vote in all 50 states.”

It is only fair that Nate Silver’s book and column are mentioned in Efron’s column. Because it is a highly valuable and definitely convincing illustration of Bayesian principles. What I object to is the criticism “that most cutting-edge science doesn’t enjoy FiveThirtyEight-level background information”. In my understanding, the poll model of FiveThirtyEight built up in a sequential manner a weight system over the different polling companies, hence learning from the data if in a Bayesian manner about their reliability (rather than forgetting the past). This is actually what caused Larry Wasserman to consider that Silver’s approach was actually more frequentist than Bayesian…

“Empirical Bayes is an exciting new statistical idea, well-suited to modern scientific technology, saying that experiments involving large numbers of parallel situations carry within them their own prior distribution.”

My last point of contention is about the (unsurprising) defence of the empirical Bayes approach in the Science column. Once again, the presentation is biased towards frequentism: in the FDR gene example, the empirical Bayes procedure is motivated by being the frequentist solution. The logical contradiction in “estimat[ing] the relevant prior from the data itself” is not discussed and the conclusion that Brad Efron uses “empirical Bayes methods in the parallel case [in the absence of prior information”, seemingly without being cautious and “uncritically”, does not strike me as the proper last argument in the matter! Nor does it give a 21st Century vision of what nouveau Bayesianism should be, faced with the challenges of Big Data and the like…

the anti-Bayesian moment and its passing commented

Posted in Books, Statistics, University life with tags , , , , on March 12, 2013 by xi'an

Here is a comment on our rejoinder “the anti-Bayesian moment and its passing” with Andrew Gelman from Deborah Mayo, comment that could not make it through as a comment:

You assume that I am interested in long-term average properties of procedures, even though I have so often argued that they are at most necessary (as consequences of good procedures), but scarcely sufficient for a severity assessment. The error statistical account I have developed is a statistical philosophy. It is not one to be found in Neyman and Pearson, jointly or separately, except in occasional glimpses here and there (unfortunately). It is certainly not about well-defined accept-reject rules. If N-P had only been clearer, and Fisher better behaved, we would not have had decades of wrangling. However, I have argued, the error statistical philosophy explicates, and directs the interpretation of, frequentist sampling theory methods in scientific, as opposed to behavioural, contexts. It is not a complete philosophy…but I think Gelmanian Bayesians could find in it a source of “standard setting”.

You say “the prior is both a probabilistic object, standard from this perspective, and a subjective construct, translating qualitative personal assessments into a probability distribution. The extension of this dual nature to the so-called “conventional” priors (a very good semantic finding!) is to set a reference … against which to test the impact of one’s prior choices and the variability of the resulting inference. …they simply set a standard against which to gauge our answers.”

I think there are standards for even an approximate meaning of “standard-setting” in science, and I still do not see how an object whose meaning and rationale may fluctuate wildly, even in a given example, can serve as a standard or reference. For what?

Perhaps the idea is that one can gauge how different priors change the posteriors, because, after all, the likelihood is well-defined. That is why the prior and not the likelihood is the camel. But it isn’t obvious why I should want the camel. (camel/gnat references in the paper and response).

rise of the B word

Posted in Statistics with tags , , , on February 26, 2013 by xi'an

comparison of the uses of the words Bayesian, maximum likelihood, and frequentist, using Google NgramWhile preparing a book chapter, I checked on Google Ngram viewer the comparative uses of the words Bayesian (blue), maximum likelihood (red) and frequentist (yellow), producing the above (screen-copy quality, I am afraid!). It shows an increase of the use of the B word from the early 80’s and not the sudden rise in the 90’s I was expecting. The inclusion of “frequentist” is definitely in the joking mode, as this is not a qualification used by frequentists to describe their methods. In other words (!), “frequentist” does not occur very often in frequentist papers (and not as often as in Bayesian papers!)…

Robins and Wasserman

Posted in Statistics, Travel, University life with tags , , , , on January 17, 2013 by xi'an

entrance to Banaras Hindu University, with the huge ISBA poster, Varanasi, Jan. 10, 2013As I attended Jamie Robins’ session in Varanasi and did not have a clear enough idea of the Robbins and Wasserman paradox to discuss it viva vocce, here are my thoughts after reading Larry’s summary. My first reaction was to question whether or not this was a Bayesian statistical problem (meaning why should I be concered with the problem). Just as the normalising constant problem was not a statistical problem. We are estimating an integral given some censored realisations of a binomial depending on a covariate through an unknown function θ(x). There is not much of a parameter. However, the way Jamie presented it thru clinical trials made the problem sound definitely statistical. So end of the silly objection. My second step is to consider the very point of estimating the entire function (or infinite dimensional parameter) θ(x) when only the integral ψ is of interest. This is presumably the reason why the Bayesian approach fails as it sounds difficult to consistently estimate θ(x) under censored binomial observations, while ψ can be. Of course, if we want to estimate the probability of a success like ψ going through functional estimation this sounds like overshooting. But the Bayesian modelling of the problem appears to require considering all unknowns at once, including the function θ(x) and cannot forget about it. We encountered a somewhat similar problem with Jean-Michel Marin when working on the k-nearest neighbour classification problem. Considering all the points in the testing sample altogether as unknowns would dwarf the training sample and its information content to produce very poor inference. And so we ended up dealing with one point at a time after harsh and intense discussions! Now, back to the Robins and Wasserman paradox, I see no problem in acknowledging a classical Bayesian approach cannot produce a convergent estimate of the integral ψ. Simply because the classical Bayesian approach is an holistic system that cannot remove information to process a subset of the original problem. Call it the curse of marginalisation. Now, on a practical basis, would there be ways of running simulations of the missing Y’s when π(x) is known in order to achieve estimates of ψ? Presumably, but they would end up with a frequentist validation…

Bayes on the radio

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , on November 10, 2012 by xi'an

In relation with the special issue of Science & Vie on Bayes’ formula, the French national radio (France Culture) organised a round table with Pierre Bessière, senior researcher in physiology at Collège de France, Dirk Zerwas, senior researcher in particle physics in Orsay, and Hervé Poirier, editor of Science & Vie. And myself (as I was quoted in the original paper). While I am not particularly fluent in oral debates, I was interested by participating in this radio experiment, if only to bring some moderation to the hyperbolic tone found in the special issue. (As the theme was “Is there a universal mathematical formula? “, I was for a while confused about the debate, thinking that maybe the previous blogs on Stewart’s 17 Equations and Mackenzie’s Universe in Zero Words had prompted this invitation…)

As it happened [podcast link], the debate was quite moderate and reasonable, we discussed about the genesis, the dark ages, and the resurgimento of Bayesian statistics within statistics, the lack of Bayesian perspectives in the Higgs boson analysis (bemoaned by Tony O’Hagan and Dennis Lindley), and the Bayesian nature of learning in psychology. Although I managed to mention Poincaré’s Bayesian defence of Dreyfus (thanks to the Theory that would not die!), Nate Silver‘s Bayesian combination of survey results, and the role of the MRC in the MCMC revolution, I found that the information content of a one-hour show was in the end quite limited, as I would have liked to mention as well the role of Bayesian techniques in population genetic advances, like the Asian beetle invasion mentioned two weeks ago… Overall, an interesting experience, maybe not with a huge impact on the population of listeners, and a confirmation I’d better stick to the written world!

ASC 2012 (#1)

Posted in Statistics, Travel, University life with tags , , , , , , , , , , on July 11, 2012 by xi'an

This morning I attended Alan Gelfand talk on directional data, i.e. on the torus (0,2π), and found his modeling via wrapped normals (i.e. normal reprojected onto the unit sphere) quite interesting and raising lots of probabilistic questions. For instance, usual moments like mean and variance had no meaning in this space. The variance matrix of the underlying normal, as well of its mean, obviously matter. One thing I am wondering about is how restrictive the normal assumption is. Because of the projection, any random change to the scale of the normal vector does not impact this wrapped normal distribution but there are certainly features that are not covered by this family. For instance, I suspect it can only offer at most two modes over the range (0,2π). And that it cannot be explosive at any point.

The keynote lecture this afternoon was delivered by Roderick Little in a highly entertaining way, about calibrated Bayesian inference in official statistics. For instance, he mentioned the inferential “schizophrenia” in this field due to the between design-based and model-based inferences. Although he did not define what he meant by “calibrated Bayesian” in the most explicit manner, he had this nice list of eight good reasons to be Bayesian (that came close to my own list at the end of the Bayesian Choice):

  1. conceptual simplicity (Bayes is prescriptive, frequentism is not), “having a model is an advantage!”
  2. avoiding ancillarity angst (Bayes conditions on everything)
  3. avoiding confidence cons (confidence is not probability)
  4. nails nuisance parameters (frequentists are either wrong or have a really hard time)
  5. escapes from asymptotia
  6. incorporates prior information and if not weak priors work fine
  7. Bayes is useful (25 of the top 30 cited are statisticians out of which … are Bayesians)
  8. Bayesians go to Valencia! [joke! Actually it should have been Bayesian go MCMskiing!]
  9. Calibrated Bayes gets better frequentists answers

He however insisted that frequentists should be Bayesians and also that Bayesians should be frequentists, hence the calibration qualification.

After an interesting session on Bayesian statistics, with (adaptive or not) mixtures and variational Bayes tools, I actually joined the “young statistician dinner” (without any pretense at being a young statistician, obviously) and had interesting exchanges on a whole variety of topics, esp. as Kerrie Mengersen adopted (reinvented) my dinner table switch strategy (w/o my R simulated annealing code). Until jetlag caught up with me.


Get every new post delivered to your Inbox.

Join 777 other followers