Archive for reproducible research

prepaid ABC

Posted in Books, pictures, Statistics, University life on January 16, 2019 by xi'an

Merijn Mestdagh, Stijn Verdonck, Kristof Meers, Tim Loossens, and Francis Tuerlinckx from KU Leuven, some of whom I met during a visit to its Walloon counterpart in Louvain-la-Neuve, proposed and arXived a new likelihood-free approach based on saving simulations on a large scale for future users. Future users interested in the same model. The very same model. This makes the proposal quite puzzling, as I have no idea when situations with exactly the same experimental conditions, up to the sample size, repeat over and over again. Or even just repeat once. (Some particular settings may accommodate different sample sizes with the same prepaid database, but others, as in genetics, clearly do not.) I am sufficiently puzzled to suspect I have missed the message of the paper.

“In various fields, statistical models of interest are analytically intractable. As a result, statistical inference is greatly hampered by computational constraints. However, given a model, different users with different data are likely to perform similar computations. Computations done by one user are potentially useful for other users with different data sets. We propose a pooling of resources across researchers to capitalize on this. More specifically, we preemptively chart out the entire space of possible model outcomes in a prepaid database. Using advanced interpolation techniques, any individual estimation problem can now be solved on the spot. The prepaid method can easily accommodate different priors as well as constraints on the parameters. We created prepaid databases for three challenging models and demonstrate how they can be distributed through an online parameter estimation service. Our method outperforms state-of-the-art estimation techniques in both speed (with a 23,000 to 100,000-fold speed up) and accuracy, and is able to handle previously quasi inestimable models.”

I foresee potential difficulties with this proposal, like compelling all future users to rely on the same summary statistics and on the same prior distributions (the “representative amount of parameter values”), and requiring a massive storage capacity. Plus relying at its early stage on the most rudimentary form of an ABC algorithm (although not acknowledged as such), namely the rejection one. When reading the description in the paper, the proposed method indeed selects the parameters (simulated from a prior or a grid) that produce the pseudo-observations closest to the actual observations (or their summaries s). The subsample thus constructed is used to derive a (local) non-parametric or machine-learning predictor s=f(θ). From which a point estimator is deduced by minimising in θ a deviance d(s⁰,f(θ)).
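To make my reading of these three steps concrete, here is a minimal sketch on a toy model. Everything in it is an assumption of mine and not the authors' implementation: the Gaussian simulator, the uniform prior, the two summaries (mean and log standard deviation), the local linear fit standing in for their machine-learning predictor, the squared Euclidean deviance, and all function names.

```python
import numpy as np

def simulate_summaries(theta, n, rng):
    """Toy simulator (assumption): n Gaussian draws summarised by mean and log-sd."""
    mu, log_sigma = theta
    y = rng.normal(mu, np.exp(log_sigma), size=n)
    return np.array([y.mean(), np.log(y.std(ddof=1))])

def build_prepaid_table(n_sims, n, rng):
    """Simulate once, ahead of time: prior draws of theta and their summaries."""
    thetas = np.column_stack([
        rng.uniform(-5.0, 5.0, size=n_sims),  # mu
        rng.uniform(-1.0, 1.0, size=n_sims),  # log sigma
    ])
    summaries = np.array([simulate_summaries(t, n, rng) for t in thetas])
    return thetas, summaries

def prepaid_estimate(s_obs, thetas, summaries, k=100):
    """Point estimate from the stored table, with no new simulation."""
    # Step 1 (rejection-ABC-like): keep the k parameters whose stored
    # summaries are closest to the observed ones, on a standardised scale.
    scale = summaries.std(axis=0)
    dist = np.linalg.norm((summaries - s_obs) / scale, axis=1)
    keep = np.argsort(dist)[:k]
    th, s = thetas[keep], summaries[keep]
    # Step 2: local linear predictor s = f(theta) = a + theta @ B,
    # fitted by least squares on the retained subsample.
    design = np.column_stack([np.ones(len(th)), th])
    coef, *_ = np.linalg.lstsq(design, s, rcond=None)
    a, B = coef[0], coef[1:]
    # Step 3: minimise ||s_obs - f(theta)||^2 in theta; since f is linear,
    # the minimiser is the least-squares solution of B.T @ theta = s_obs - a.
    theta_hat, *_ = np.linalg.lstsq(B.T, s_obs - a, rcond=None)
    return theta_hat

rng = np.random.default_rng(0)
n = 50  # sample size, shared by the table and the "new" dataset
thetas, summaries = build_prepaid_table(5000, n, rng)
s_obs = simulate_summaries(np.array([1.0, 0.3]), n, rng)
theta_hat = prepaid_estimate(s_obs, thetas, summaries)
print("observed summaries:", s_obs)
print("prepaid estimate  :", theta_hat)
```

Note that the table is only reusable because the new dataset shares the simulator, the summaries and the sample size n with the stored simulations, which is exactly the restriction I find puzzling.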

The paper does not expand much on the theoretical justifications of the approach (including the appendix that covers a formal situation where the prepaid grid conveniently covers the observed statistics). And thus does not explain on which basis confidence intervals should offer nominal coverage for the prepaid method. Instead, the paper runs comparisons with Simon Wood’s (2010) synthetic likelihood maximisation (Ricker model with three parameters), the rejection ABC algorithm (species dispersion trait model with four parameters), while the Leaky Competing Accumulator (with four parameters as well) seemingly enjoys no alternative. Which is strange since the first step of the prepaid algorithm is an ABC step, but I am unfamiliar with this model. Unsurprisingly, in all these cases, given that the simulation has been done prior to the computing time for the prepaid method and not for either synthetic likelihood or ABC, the former enjoys a massive advantage from the start.

“The prepaid method can be used for a very large number of observations, contrary to the synthetic likelihood or ABC methods. The use of very large simulated data sets allows investigation of large-sample properties of the estimator”

To return to the general proposal and my major reservation or misunderstanding, for different experiments, the (true or pseudo-true) value of the parameter will not be the same, I presume, and hence the region of interest [or grid] will differ. While, again, the computational gain is de facto obvious [since the costly production of the reference table is not repeated], and, to repeat myself, makes the comparison with methods that do require a massive number of simulations from scratch massively in favour of the prepaid option, I do not see a convenient way of recycling these prepaid simulations for another setting, that is, when some experimental factors, sample size or collection, or even just the priors, do differ. Again, I may be missing the point, especially in a specific context like repeated psychological experiments.

While this may have some applications in reproducibility (but maybe not, if the goal is in fact to detect cherry-picking), I see very little use in repeating the same statistical model on different datasets. Even repeating observations will require additional nuisance parameters and possibly perturb the likelihood and/or posterior to large extents.

5 ways to fix statistics?!

Posted in Books, Kids, pictures, Statistics, University life on December 4, 2017 by xi'an

In the last issue of Nature (Nov 30), the comment section contains a series of opinions on the reproducibility crisis, by five [groups of] statisticians. Including Blakeley McShane and Andrew Gelman, with whom [and others] I wrote a response to the seventy-author manifesto. The collection of comments is introduced with the curious sentence

“The problem is not our maths, but ourselves.”

Which I find problematic as (a) the problem is never with the maths, but possibly with the stats!, and (b) the problem stands in inadequate assumptions on the validity of “the” statistical model and in ignoring the resulting epistemic uncertainty. Jeff Leek’s suggestion to improve the interface with users seems to fall short on that level, while David Colquhoun’s Bayesian balance between p-values and false positives only addresses well-specified models. Michèle Nuijten strikes closer to my perspective by arguing that rigorous rules are unlikely to help, due to the plethora of possible post-data modellings. And Steven Goodman’s putting the blame on the lack of statistical training of scientists (who “only want enough knowledge to run the statistical software that allows them to get their paper out quickly”) is wishful thinking: not every scientific study involving data [i.e., the overwhelming majority] can involve a statistical expert, and not every paper involving data analysis can be reviewed by a statistical expert. I thus cannot but repeat the conclusion of Blakeley and Andrew:

“A crucial step is to move beyond the alchemy of binary statements about ‘an effect’ or ‘no effect’ with only a P value dividing them. Instead, researchers must accept uncertainty and embrace variation under different circumstances.”

new reproducibility initiative in TOMACS

Posted in Books, Statistics, University life on April 12, 2016 by xi'an

[A quite significant announcement last October from TOMACS that I had missed:]

To improve the reproducibility of modeling and simulation research, TOMACS is pursuing two strategies.

Number one: authors are encouraged to include sufficient information about the core steps of the scientific process leading to the presented research results and to make as many of these steps as transparent as possible, e.g., data, model, experiment settings, incl. methods and configurations, and/or software. Associate editors and reviewers will be asked to assess the paper also with respect to this information. Thus, although not required, submitted manuscripts which provide clear information on how to generate reproducible results, whenever possible, will be considered favorably in the decision process by reviewers and the editors.

Number two: we will form a new replicating computational results activity in modeling and simulation as part of the peer reviewing process (adopting the procedure RCR of ACM TOMS). Authors who are interested in taking part in the RCR activity should announce this in the cover letter. The associate editor and editor in chief will assign a RCR reviewer for this submission. This reviewer will contact the authors and will work together with the authors to replicate the research results presented. Accepted papers that successfully undergo this procedure will be advertised at the TOMACS web page and will be marked with an ACM reproducibility brand. The RCR activity will take place in parallel to the usual reviewing process. The reviewer will write a short report which will be published alongside the original publication. TOMACS also plans to publish short reports about lessons learned from non-successful RCR activities.

[And now the first paper reviewed according to this protocol has been accepted:]

The paper Automatic Moment-Closure Approximation of Spatially Distributed Collective Adaptive Systems is the first paper that took part in the new replicating computational results (RCR) activity of TOMACS. The paper completed successfully the additional reviewing as documented in its RCR report. This reviewing is aimed at ensuring that computational results presented in the paper are replicable. Digital artifacts like software, mechanized proofs, data sets, test suites, or models, are evaluated referring to ease of use, consistency, completeness, and being well documented.


Posted in Books, Statistics on December 1, 2015 by xi'an

While in Warwick this week, I borrowed a recent issue (Oct. 08, 2015) of Nature from Tom Nichols and read it over dinners in a maths house. Its featured topic was reproducibility, with a long initial (or introductory) article about “Fooling ourselves”, starting with an illustration from Andrew himself, who had gotten a sign wrong in one of those election studies that are the basis of Red State, Blue State. While this article is not bringing radically new perspectives on the topic, there is nothing shocking about it, and it even goes on mentioning Peter Green and his Royal Statistical Society President’s tribune about the Sally Clark case, and Eric-Jan Wagenmakers with a collaboration with competing teams that sounded like “putting one’s head on a guillotine”. Which relates to a following “comment” on crowdsourcing research or data analysis.

I however got most interested in another comment, by MacCoun and Perlmutter, where they advocate a systematic blinding of data to avoid conscious or unconscious biases. While I deem the idea quite interesting and connected with anonymisation techniques in data privacy, I find the presentation rather naïve in its goals (from a statistical perspective). Indeed, if we consider data produced by a scientific experiment towards the validation or invalidation of a scientific hypothesis, it usually stands on its own, with no other experiment of a similar kind to refer to. Add too much noise and only noise remains. Add too little and the original data remains visible. This means it is quite difficult to calibrate the blinding mechanisms in order for the blinded data to remain realistic enough to be analysed. Or to be different enough from the original data for different conclusions to be drawn. The authors suggest the blinding be done by software, by adding noise, bias, label switching, etc. But I do not think this blinding can be done blindly, i.e., without a clear idea of what the possible models are, so that the perturbed datasets created out of the original data favour one of the models under comparison more than the others. And are realistic for at least one of those models. Thus, some preliminary analysis of the original or of some pseudo-data from each of the proposed models is somewhat unavoidable to calibrate the blinding machinery towards realistic values. If designing a new model is part of the inferential goals, this may prove impossible… Again, I think having several analyses run in parallel with several perturbed datasets is quite a good idea to detect the impact of some prior assumptions. But this requires statistically savvy programmers. And possibly informative prior distributions.
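The calibration issue can be seen on a toy example. The sketch below is mine and not the software MacCoun and Perlmutter have in mind: the three blinding devices (added noise, a hidden offset, scrambled group labels) are written as hypothetical helper functions, and the two-group dataset is simulated for illustration only.

```python
import numpy as np

def add_noise(values, noise_sd, rng):
    """Blind by adding independent Gaussian noise to every observation."""
    return values + rng.normal(0.0, noise_sd, size=values.shape)

def add_hidden_offset(values, max_shift, rng):
    """Blind by a common shift (bias) kept hidden from the analyst."""
    shift = rng.uniform(-max_shift, max_shift)
    return values + shift, shift

def scramble_labels(labels, rng):
    """Blind by a random one-to-one renaming of the group labels."""
    names = np.unique(labels)
    mapping = dict(zip(names, rng.permutation(names)))
    return np.array([mapping[name] for name in labels]), mapping

rng = np.random.default_rng(1)
labels = np.repeat(["control", "treated"], 100)
# Simulated data (assumption): unit-variance noise, treatment effect of 1.
values = np.where(labels == "treated", 1.0, 0.0) + rng.normal(size=200)

# Correlation between original and blinded values for three noise levels:
# close to one means the data remains visible, close to zero means that
# only noise remains.
correlations = {}
for noise_sd in (0.01, 1.0, 100.0):
    blinded = add_noise(values, noise_sd, rng)
    correlations[noise_sd] = np.corrcoef(values, blinded)[0, 1]
    print(f"noise sd {noise_sd:>6}: correlation {correlations[noise_sd]:.3f}")

shifted, shift = add_hidden_offset(values, 5.0, rng)
blinded_labels, mapping = scramble_labels(labels, rng)
```

Picking a noise level between the two extremes already requires knowing the scale of the effect under study, that is, having some model in mind.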

beyond subjective and objective in Statistics

Posted in Books, Statistics, University life on August 28, 2015 by xi'an

“At the level of discourse, we would like to move beyond a subjective vs. objective shouting match.” (p.30)

This paper by Andrew Gelman and Christian Hennig calls for the abandonment of the terms objective and subjective in (not solely Bayesian) statistics. It argues that there is more than mere prior information and data to the construction of a statistical analysis. The paper is articulated as the authors’ proposal, followed by four application examples, then a survey of the philosophy of science perspectives on objectivity and subjectivity in statistics and other sciences, next to a study of the subjective and objective aspects of the mainstream statistical streams, concluding with a discussion on the implementation of the proposed move.