This paper by Andrew Gelman and Christian Hennig calls for the abandonment of the terms objective and subjective in (not solely Bayesian) statistics. The authors argue that there is more than mere prior information and data to the construction of a statistical analysis. The paper is articulated as the authors’ proposal, followed by four application examples, then a survey of the philosophy of science perspectives on objectivity and subjectivity in statistics and other sciences, next a study of the subjective and objective aspects of the mainstream statistical streams, concluding with a discussion on the implementation of the proposed move.
Archive for refereeing
“An appealing approach would be a comparative, Bayesian model-choice method for inferring the probability of competing divergence histories while integrating over uncertainty in mutational and ancestral processes via models of nucleotide substitution and lineage coalescence.” (p.2)
Jamie Oaks arXived (a few months ago now) a rather extensive Monte Carlo study on the impact of prior modelling on the performances of ABC model choice. (Of which I only became aware recently.) As in the earlier paper I discussed on the ‘Og, the issue here has much more to do with prior assessment and calibration than with ABC implementation per se. For instance, the above quote recaps the whole point of conducting Bayesian model choice. (As missed by Templeton.)
“This causes divergence models with more divergence-time parameters to integrate over a much greater parameter space with low likelihood yet high prior density, resulting in small marginal likelihoods relative to models with fewer divergence-time parameters.” (p.2)
This second quote is essentially stressing the point with the Occam’s razor argument. Which I deem [to be] a rather positive feature of Bayesian model choice. A reflection on the determination of the prior distribution, getting away from uniform priors, thus sounds most timely! The current paper takes place within a rather extensive exchange between Oaks’ group and Hickerson’s group on what makes Bayesian model choice (and the associated software msBayes) pick or not the correct model. Oaks and coauthors objected to the use of “narrow, empirically informed uniform priors”, arguing that this leads to a bias towards models with fewer parameters, a “statistical issue” in their words, while Hickerson et al. (2014) think this is due to msBayes’ way of selecting models and their parameters at random. However, the current paper refrains from reproducing earlier criticisms of or replies to Hickerson et al.
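The Occam’s razor effect described in the second quote can be illustrated on a toy example that has nothing to do with the phylogeographical models of the paper. The sketch below assumes a single observation x ~ N(θ, 1), a point null θ = 0, and an alternative with a uniform prior θ ~ U(−a, a); the function names and the value x = 1 are illustrative choices of mine. As the half-width a grows, the alternative integrates over more parameter space with low likelihood, and its marginal likelihood shrinks roughly like 1/(2a).

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def marginal_uniform_prior(x, half_width):
    """Marginal likelihood of x ~ N(theta, 1) under theta ~ U(-a, a).

    The integral of the normal density over theta in [-a, a] equals
    Phi(a - x) - Phi(-a - x); dividing by 2a accounts for the prior density.
    """
    mass = normal_cdf(half_width - x) - normal_cdf(-half_width - x)
    return mass / (2.0 * half_width)


def bayes_factor_null_vs_uniform(x, half_width):
    """Bayes factor of the point null theta = 0 against the uniform alternative."""
    return normal_pdf(x) / marginal_uniform_prior(x, half_width)


x_obs = 1.0  # illustrative observation, not taken from the paper
for a in (1.0, 2.0, 10.0, 100.0):
    print(a, marginal_uniform_prior(x_obs, a), bayes_factor_null_vs_uniform(x_obs, a))
```

The same data thus support the simpler model more and more strongly as the uniform prior widens, which is why the calibration of the prior matters at least as much as the ABC implementation.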
The current paper claims to have reached a satisfactory prior modelling with “improved robustness, accuracy, and power” (p.3). If I understand correctly, the changes are in replacing a uniform distribution with a Gamma or a Dirichlet prior. Which means introducing a seriously large and potentially crippling number of hyperparameters into the picture. Having a lot of flexibility in the prior also means a lot of variability in the resulting inference… In other words, with more flexibility comes more responsibility, to paraphrase Voltaire.
“I have introduced a new approximate-Bayesian model choice method.” (p.21)
The ABC part is rather standard, except for the strange feature that the divergence times are used to construct summary statistics (p.10). Strange because these times are not observed for the actual data. So I must be missing something. (And I object to the above quote and to the title of the paper since there is no new ABC technique there, simply a different form of prior.)
“ABC methods in general are known to be biased for model choice.” (p.21)
I do not understand much of the part about (reshuffling) introducing bias as detailed on p.11: every approximate method gives a “biased” answer in the sense that this answer is not the true and proper posterior distribution. Using a different (re-ordered) vector of statistics provides a different ABC outcome, hence a different approximate posterior, for which it seems truly impossible to check whether or not it increases the discrepancy from the true posterior, compared with the other version. I must admit I always find it annoying to see the word bias used with a vague meaning, esp. within a Bayesian setting. All Bayesian methods are biased. End of the story. Quoting our PNAS paper as concluding that ABC model choice is biased is equally misleading: the intended warning of that paper was that Bayes factors and posterior probabilities based on summary statistics could be quite unrelated to those based on the whole dataset. That the proper choice of summary statistics leads to a consistent model choice shows ABC model choice is not necessarily “biased”… Furthermore, I also fail to understand why the posterior probability of model i should be distributed as a uniform (“If the method is unbiased, the points should fall near the identity line”) when the data is from model i: this is not a p-value but a posterior probability, and the posterior probability is not the frequentist coverage…
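To make concrete how the choice of summary statistics changes the ABC answer, here is a minimal rejection-ABC model choice sketch on a toy problem, not on the msBayes models of the paper. Everything in it is an assumption for illustration: the two models (Gaussian versus Laplace, both with unit variance), the N(0, 2²) prior on the location, the unscaled Euclidean distance, and the quantile-based tolerance. With the sample mean alone one would expect the output to say little about the model, since the mean behaves almost identically under both; adding a median absolute deviation gives the summary some discriminating ability.

```python
import random
import statistics


def simulate(model, theta, n, rng):
    """Draw n observations with location theta and unit variance."""
    if model == 0:  # model 0: Gaussian
        return [rng.gauss(theta, 1.0) for _ in range(n)]
    # model 1: Laplace with scale 1/sqrt(2), written as a difference of exponentials
    rate = 2.0 ** 0.5
    return [theta + rng.expovariate(rate) - rng.expovariate(rate) for _ in range(n)]


def summary_mean(x):
    """Summary statistic made of the sample mean only."""
    return (statistics.fmean(x),)


def summary_mean_mad(x):
    """Summary statistic made of the sample mean and the median absolute deviation."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    return (statistics.fmean(x), mad)


def abc_model_probabilities(observed, summary, n_sims=4000, keep=0.05, seed=1):
    """Approximate posterior probabilities of (Gaussian, Laplace) by rejection ABC."""
    rng = random.Random(seed)
    target = summary(observed)
    draws = []
    for _ in range(n_sims):
        model = rng.randrange(2)        # uniform prior on the model index
        theta = rng.gauss(0.0, 2.0)     # assumed N(0, 2^2) prior on the location
        fake = simulate(model, theta, len(observed), rng)
        dist = sum((s - t) ** 2 for s, t in zip(summary(fake), target)) ** 0.5
        draws.append((dist, model))
    draws.sort()                        # smallest distances first
    kept = draws[: max(1, int(keep * n_sims))]
    freq_laplace = sum(model for _, model in kept) / len(kept)
    return (1.0 - freq_laplace, freq_laplace)


data_rng = random.Random(0)
observed = simulate(1, 0.5, 50, data_rng)  # pseudo-data drawn from the Laplace model
print("mean only :", abc_model_probabilities(observed, summary_mean))
print("mean + MAD:", abc_model_probabilities(observed, summary_mean_mad))
```

The two calls return two different approximate posteriors for the same data, and nothing in the ABC output itself says which one is closer to the posterior based on the whole dataset.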
My overall problem is that, all in all, this is a single if elaborate Monte Carlo study and, as such, it does not carry enough weight to validate an approach that remains highly subjective in the selection of its hyperparameters. Without raising any doubt about a hypothetical “fixing” of those hyperparameters, I think this remains a controlled experiment with simulated data where the true parameters are known and the prior is “true”. This obviously helps in getting better performances.
“With improving numerical methods (…), advances in Monte Carlo techniques and increasing efficiency of likelihood calculations, analyzing rich comparative phylo-geographical models in a full-likelihood Bayesian framework is becoming computationally feasible.” (p.21)
This conclusion of the paper sounds over-optimistic and rather premature. I do not know of any significant advance in computing the observed likelihood for the population genetics models ABC is currently handling. (The SMC algorithm of Bouchard-Côté, Sankararaman and Jordan, 2012, does not apply to Kingman’s coalescent, as far as I can tell.) This is certainly a goal worth pursuing and borrowing strength from multiple techniques cannot hurt, but it remains so far a lofty goal, still beyond our reach… I thus think the major message of the paper is to reinforce our own and earlier calls for caution when interpreting the output of an ABC model choice (p.20), or even of a regular Bayesian analysis, agreeing that we should aim at seeing “a large amount of posterior uncertainty” rather than posterior probability values close to 0 and 1.
There is a long article in The Economist of this week (also making the front cover), which discusses how and why many published research papers have unreproducible and most often “wrong” results. Nothing immensely new there, esp. if you read Andrew’s blog on a regular basis, but the (anonymous) writer(s) take(s) pains to explain how this relates to statistics and in particular statistical testing of hypotheses. The above is an illustration from this introduction to statistical tests (and their interpretation).
“First, the statistics, which if perhaps off-putting are quite crucial.”
It is not the first time I spot a statistics-backed article in this journal and so assume it has either journalists with a statistics background or links with (UK?) statisticians. The description of why statistical tests can err is fairly (Type I – Type II) classical. Incidentally, it reports a finding of Ioannidis that, when reporting a positive at level 0.05, the expectation of a false positive rate of one out of 20 is “highly optimistic”. An evaluation in line with, e.g., Berger and Sellke (1987), who reported a too-early rejection in a large number of cases. More interestingly, the paper stresses that this classical approach ignores “the unlikeliness of the hypothesis being tested”, which I interpret as the prior probability of the hypothesis under test.
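The role played by “the unlikeliness of the hypothesis being tested” is a direct application of Bayes’ theorem. As a sketch, with numbers that are purely illustrative assumptions of mine (one tested hypothesis in ten being a real effect, a power of 0.8, a level of 0.05), the share of false positives among all reported positives is much larger than one in 20:

```python
def false_positive_share(prior_true, alpha=0.05, power=0.8):
    """Proportion of significant results that are false positives.

    prior_true: prior probability that a tested effect is real (assumed)
    alpha:      Type I error rate of the test
    power:      probability of detecting a real effect (assumed constant)
    """
    false_pos = alpha * (1.0 - prior_true)   # nulls wrongly rejected
    true_pos = power * prior_true            # real effects detected
    return false_pos / (false_pos + true_pos)


for prior in (0.5, 0.1, 0.01):
    print(prior, round(false_positive_share(prior), 3))
```

With a prior probability of 0.1 the share is 0.045 / (0.045 + 0.08) = 0.36, and it only gets worse as the tested hypotheses become less plausible a priori or the power drops.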
“Statisticians have ways to deal with such problems. But most scientists are not statisticians.”
The paper also reports on the lack of power in most studies, a report that I find a bit bizarre and even meaningless in its claim to compute an overall power, all across studies and researchers and even fields. Even in a single study, the alternative to “no effect” is composite, hence has a power that depends on the unknown value of the parameter. Seeking a single value for the power requires some prior distribution on the alternative.
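To see why a single power value requires a prior, consider as a sketch a one-sided z-test of “no effect” with known unit variance: the power is a function of the unknown effect size δ, and a single number only appears once this function is averaged against a distribution on δ. The normal prior on δ, the sample size, and the function names below are assumptions for illustration only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sided_z(delta, n, alpha=0.05):
    """Power of the level-alpha one-sided z-test of delta = 0 (unit variance, n observations)."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z_alpha - delta * n ** 0.5)


def prior_averaged_power(prior, n, alpha=0.05, grid_size=2001, span=6.0):
    """Average the power function against a NormalDist prior on delta (grid approximation)."""
    lo = prior.mean - span * prior.stdev
    hi = prior.mean + span * prior.stdev
    step = (hi - lo) / (grid_size - 1)
    total = 0.0
    weight_sum = 0.0
    for i in range(grid_size):
        delta = lo + i * step
        weight = prior.pdf(delta)
        total += weight * power_one_sided_z(delta, n, alpha)
        weight_sum += weight
    return total / weight_sum


n_obs = 25  # illustrative sample size
for delta in (0.0, 0.2, 0.5, 0.8):
    print("delta =", delta, "power =", round(power_one_sided_z(delta, n_obs), 3))

# Two assumed priors on the effect size lead to two different "overall" powers.
print(prior_averaged_power(NormalDist(0.2, 0.1), n_obs))
print(prior_averaged_power(NormalDist(0.5, 0.3), n_obs))
```

Changing the prior changes the averaged power, which is the sense in which an overall power across studies and fields has no meaning without stating which distribution of effects is being averaged over.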
“Peer review’s multiple failings would matter less if science’s self-correction mechanism—replication—was in working order.”
The next part of the paper covers the failings of peer review, which I discussed in the ISBA Bulletin, but it seems to me too easy to blame the referee for failing to spot statistical or experimental errors, when lacking access to the data or to the full experimental methodology and when under pressure to return (for free) a report within a short time window. The best that can be expected is that a referee detects the implausibility of a claim or an obvious methodological or statistical mistake. These are not math papers! And, as pointed out repeatedly, not all referees are statistically numerate….
“Budding scientists must be taught technical skills, including statistics.”
The last part discusses possible solutions to achieve reproducibility and hence higher confidence in experimental results. Paying for independent replication is the proposed solution but it can obviously only apply to a small margin of all published results. And having control bodies testing at random labs and teams following a major publication seems rather unrealistic, if only for filling the teams of such bodies with able controllers… An interesting if pessimistic debate, in fine. And fit for the International Year of Statistics.
While I was editing our “famous” In praise of the referee paper—well, famous for being my most rejected paper ever!, with one editor not even acknowledging receipt!!—for the next edition of the ISBA Bulletin—where it truly belongs, being in fine a reply to Larry’s tribune therein a while ago—, Dimitris Politis had written a column for the IMS Bulletin—March 2013 Issue, page 11—on Refereeing and psychoanalysis.
Uh?! What?! Psychoanalysis?! Dimitris’ post is about referees being rude or abusive in their reports, expressing befuddlement at seeing such behaviour in a scientific review. If one sets aside cases of personal and ideological antagonisms (always likely to occur in academic circles!), a “good” reason for referees to get aggressively annoyed to the point of rudeness is sloppiness of one kind or another in the paper under review. One has to remember that refereeing is done for free and with no clear recognition in the overwhelming majority of cases, out of a sense of duty to the community and of fairness for having our own papers refereed. Reading a paper where typos abound, where style is so abstruse as to hide the purpose of the work, where the literature is so poorly referenced as to make one doubt the author(s) ever read another paper, the referee may feel vindicated in venting his/her frustration at wasting his/her time by writing a few vitriolic remarks. Dimitris points out this can be very detrimental to young researchers. True, but what happened to the advisor at this stage?! Wasn’t she/he supposed to advise her/his PhD student not only in conducting innovative research but also in producing intelligible output and in preparing papers suited for the journal they are to be submitted to..?! Being rude and aggressive does not contribute to improving the setting, no more than headbutting an Italian football player helps in winning the World Cup, but it may nonetheless be understood without resorting to psychoanalysis!
Most interestingly, this negative aspect of refereeing (which can be curbed by posterior actions of AEs and editors) would vanish if some of our proposals were implemented, incl. making referees’ reports part of the referee’s publication list, making those reports public as comments on the published paper (if published), and creating repositories or report commons independent from journals…
Sometimes, if not that often, I forget about submitted papers to the point of thinking they are already accepted. This happened with the critical analysis of Murray Aitkin’s book Statistical Inference, already debated on the ‘Og, written with Andrew Gelman and Judith Rousseau, and resubmitted to Statistics and Risk Modeling in November…2011. As I had received a few months ago a response to our analysis from Murray, I was under the impression it was published or about to be published. Earlier this week I started looking for the reference in connection with the paper I was completing on the Jeffreys-Lindley paradox and could not find it. Checking emails on that topic I then discovered the latest one was from November 2011 and the editor, when contacted, confirmed the paper was still under review! As it got accepted only a few hours later, my impression is that it had been misfiled and forgotten at some point, an impression reinforced by an earlier experience with the previous avatar of the journal, Statistics & Decisions. In the 1990’s George Casella and I had had a paper submitted to this journal for a while, which eventually got accepted. Then nothing happened for a year and more, until we contacted the editor who acknowledged the paper had been misfiled and forgotten! (This was before the electronic processing of papers, so it is quite plausible that the file corresponding to our accepted paper went under a drawer or into the wrong pile and that the editor was not keeping track of those accepted papers. After all, until Series B turned submission into an all-electronic experience, I was using a text file to keep track of daily submissions…) If you knew George, you can easily imagine his reaction when reading this reply… Anyway, all is well that ends well in that our review and Murray’s reply will appear in Statistics and Risk Modeling, hopefully within a reasonable delay.
Before I run out of time, here is my answer to the ISBA Bulletin Students’ corner question of the term: “In terms of publications and from your own experience, what are the pros and cons of books vs journal articles?”
While I started on my first book during my postdoctoral years in Purdue and Cornell [a basic probability book made out of class notes written with Arup Bose, which died against the breakers of some referees’ criticisms], my overall opinion on this is that books are never valued by hiring and promotion committees for what they are worth! It is a universal constant I met in the US, the UK and France alike that books are not helping much for promotion or hiring, at least at an early stage of one’s career. Later, books become a more acknowledged part of senior academics’ vitae. So, unless one has a PhD thesis that is ready to be turned into a readable book without having any impact on one’s publication list, and even if one has enough material and a broad enough message at one’s disposal, my advice is to go solely and persistently for journal articles. Besides the above-mentioned attitude of recruiting and promotion committees, I believe this has several positive aspects: it forces the young researcher to maintain his/her focus on specialised topics in which she/he can achieve rapid prominence, rather than having to spend [quality research] time on replacing the background and building reference. It provides an evaluation by peers of the quality of her/his work, while reviews of books are generally on the light side. It is the starting point for building a network of collaborations, since few people are interested in writing books with strangers (when knowing it is already quite a hardship with close friends!). It is also the entry to workshops and international conferences, where a new book very rarely attracts invitations.
Writing a book is of course exciting and somewhat more deeply rewarding, but it is awfully time-consuming and requires a high level of organization young faculty members rarely possess when starting a teaching job at a new university (with possibly family changes as well!). I was quite lucky when writing The Bayesian Choice and Monte Carlo Statistical Methods to mostly be on leave from teaching, as it would have otherwise been impossible! That we are not making sufficient progress on our revision of Bayesian Core, started two years ago, is a good enough proof that even with tight planning, great ideas, enthusiasm, sale prospects, and available material, completing a book may get into trouble for mere organisational issues…
Robin Ryder pointed out to me this new experiment run by PLoS since March 2012, namely the introduction of a new article type, “called “Topic Pages” and written in the style of a Wikipedia article”. Not only does this terrific idea give more credence to Wikipedia biology pages, at least in their early stage, but also “the paper contains direct links to Wikipedia pages for background”. Now note that PLoS keeps a wiki separate from Wikipedia. I wonder about the development of a similar interface for statistics, maybe as a renaissance of the former StatProb wiki initiated by John Kimmel two years ago. And mostly abandoned for the past months…
When looking around the site I came upon a page on ABC written by Mikael Sunnåker et al.! A very nice survey of the existing debates around ABC, including uncertainties on the validity of the ABC approximation to the Bayes factor. And mentioning the original version of Donald Rubin (1984, AoS). As well as that of Peter Diggle and Richard Gratton (1984, JRSS Series B). (I have a lingering feeling I may have seen this paper earlier as a referee and that I sadly missed the connection with this wiki page, hence refereed it as a “classical” submission… However, I just cannot remember whether or not this happened, nor can I find any trace in my past reviews! Which may hint at a weakness of this solution, by the way, namely that referees are less eager to review surveys than novel research articles…) To reinforce the above point, compare this page on ABC with the page on ABC produced by Wikipedia!