While in Warwick this week, I borrowed a recent issue (Oct. 08, 2015) of Nature from Tom Nichols and read it over dinners in a maths house. Its featured topic was reproducibility, with a long introductory article on “Fooling ourselves”, starting with an illustration from Andrew himself, who had gotten a sign wrong in one of the election studies that are the basis of Red State, Blue State. While the article brings no radically new perspective on the topic, there is nothing shocking in it either, and it even mentions Peter Green and his Royal Statistical Society President’s column about the Sally Clark case, as well as Eric-Jan Wagenmakers and a collaboration with competing teams that sounded to him like “putting one’s head on a guillotine”. This relates to a subsequent “comment” on crowdsourcing research or data analysis.
I was however most interested in another comment, by MacCoun and Perlmutter, where they advocate a systematic blinding of data to avoid conscious or unconscious biases. While I deem the idea quite interesting, and connected with anonymisation techniques in data privacy, I find its presentation rather naïve in its goals (from a statistical perspective). Indeed, if we consider data produced by a scientific experiment towards the validation or invalidation of a scientific hypothesis, that experiment usually stands on its own, with no other experiment of a similar kind to refer to. Add too much noise and only noise remains; add too little and the original data remains visible. It is thus quite difficult to calibrate the blinding mechanism so that the blinded data remains realistic enough to be analysed, yet different enough from the original data for different conclusions to be drawn.

The authors suggest the blinding be done by software, by adding noise, bias, label switching, &tc. But I do not think this blinding can be done blindly, i.e., without a clear idea of what the possible models are, so that the perturbed datasets created from the original data favour one of the models under comparison over the others, and are realistic for at least one of those models. Thus, some preliminary analysis of the original data, or of pseudo-data from each of the proposed models, is somewhat unavoidable to calibrate the blinding machinery towards realistic values. If designing a new model is part of the inferential goals, this may prove impossible… Again, I think running several analyses in parallel on several perturbed datasets is quite a good idea for detecting the impact of some prior assumptions. But this requires statistically savvy programmers. And possibly informative prior distributions.
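To make the calibration issue concrete, here is a minimal sketch (in Python, with made-up function names and arbitrary perturbation scales, none of which come from MacCoun and Perlmutter’s comment) of the three blinding operations mentioned above, applied to a toy two-group experiment and followed by the same analysis run on several blinded replicates:

```python
import numpy as np

rng = np.random.default_rng(42)

def blind(y, labels, noise_sd=0.5, bias=0.3, swap_frac=0.1):
    """Return a blinded copy of (y, labels): Gaussian noise added,
    a constant offset applied, and a random fraction of the binary
    group labels switched. All scales are arbitrary illustrations."""
    y_blind = y + rng.normal(0.0, noise_sd, size=y.shape) + bias
    labels_blind = labels.copy()
    idx = rng.choice(len(labels), size=int(swap_frac * len(labels)),
                     replace=False)
    labels_blind[idx] = 1 - labels_blind[idx]  # flip selected labels
    return y_blind, labels_blind

# toy "experiment": two groups with a true mean difference of 1
labels = rng.integers(0, 2, size=200)
y = rng.normal(labels * 1.0, 1.0)

# run the same analysis (difference of group means) on several
# independently blinded replicates
for rep in range(5):
    yb, lb = blind(y, labels)
    diff = yb[lb == 1].mean() - yb[lb == 0].mean()
    print(f"replicate {rep}: estimated group difference = {diff:.3f}")
```

The sticking point is precisely the choice of noise_sd, bias and swap_frac: set them too high and only noise remains, too low and the original signal shows through, and any sensible middle ground already presupposes a model for the data.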