These are the best pieces I came across in the last month on replication, reproducibility & data sharing. With the Economist and the LA Times reporting on reproducibility, everyone is talking about the topic. Collection #4 (October 2013).
Best paper: 10 Rules for Reproducible Research
In Ten Simple Rules for Reproducible Computational Research, the authors note that “In a pragmatic setting, with publication pressure and deadlines, one may face the need to make a trade-off between the ideals of reproducibility and the need to get the research out while it is still relevant.” However, you should at least be able to reproduce your own work. The ten rules are:
- Rule 1: For Every Result, Keep Track of How It Was Produced
- Rule 2: Avoid Manual Data Manipulation Steps
- Rule 3: Archive the Exact Versions of All External Programs Used
- Rule 4: Version Control All Custom Scripts
- Rule 5: Record All Intermediate Results, When Possible in Standardized Formats
- Rule 6: For Analyses That Include Randomness, Note Underlying Random Seeds
- Rule 7: Always Store Raw Data behind Plots
- Rule 8: Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
- Rule 9: Connect Textual Statements to Underlying Results
- Rule 10: Provide Public Access to Scripts, Runs, and Results
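A few of these rules are easy to try out in code. Here is a minimal sketch of Rules 1, 6 and 7, using only the Python standard library. The toy analysis, the function names `run_analysis` and `make_record`, and the fields of the record are my own illustrative assumptions and do not come from the paper.

```python
import hashlib
import json
import platform
import random


def run_analysis(seed, n_draws=1000):
    """Toy analysis (hypothetical): the mean of n_draws uniform random numbers."""
    rng = random.Random(seed)  # local generator, so the seed fully determines the draws
    draws = [rng.random() for _ in range(n_draws)]
    return sum(draws) / n_draws, draws


def make_record(seed, n_draws=1000):
    """Bundle a result with the information needed to regenerate it."""
    mean, draws = run_analysis(seed, n_draws)
    raw = json.dumps(draws).encode("utf-8")
    return {
        "result_mean": mean,                                 # the reported result
        "seed": seed,                                        # Rule 6: note the random seed
        "n_draws": n_draws,                                  # Rule 1: how the result was produced
        "python_version": platform.python_version(),         # Rule 1: which interpreter ran it
        "raw_data_sha256": hashlib.sha256(raw).hexdigest(),  # Rule 7: fingerprint of the raw data
    }


if __name__ == "__main__":
    # The seed value is arbitrary; what matters is that it is written down.
    print(json.dumps(make_record(seed=20131031), indent=2))
```

Running it twice with the same seed gives the same record, which is the whole point: you can later confirm that a number in your paper really came from this script with these settings.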
No time to document your research steps
Directly connected to the above, this article explains why so few scholars do it: it takes time! The Big Data Brain Drain: Why Science is in Trouble: “In the ‘publish-or-perish’ model which dominates most research universities, any time spent building and documenting software tools is time spent not writing research papers.”
Recipe for replication
The Replication Recipe: What makes for a convincing replication?: A must-read for anyone engaged in replication. Here are the main points that make for a good ‘close’ replication (not just in Psychology):
- Carefully defining the effects and methods that the researcher intends to replicate;
- Following as exactly as possible the methods of the original study (including participant recruitment, instructions, stimuli, measures, procedures, and analyses);
- Having high statistical power;
- Making complete details about the replication available, so that interested experts can fully evaluate the replication attempt (or attempt another replication themselves);
- Evaluating replication results, and comparing them critically to the results of the original study.
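To make the “high statistical power” point concrete, here is a rough sketch of a sample-size calculation for comparing two group means. It uses the standard normal approximation, which is my own choice for illustration and not a method taken from the Replication Recipe. The function name `n_per_group` and the example effect size are assumptions too.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate participants needed per group for a two-sided, two-sample comparison.

    effect_size is the standardized mean difference (Cohen's d).
    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2, rounded up.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # A hypothetical medium effect (d = 0.5) at the conventional 5% level and 80% power.
    print(n_per_group(0.5))
```

An exact calculation based on the t distribution gives slightly larger numbers, so treat this as a lower bound. Smaller effects need far more participants, which is why a replication of a small original effect often has to be much larger than the original study.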
A conversation on reproducibility gives two great ideas to improve reproducibility. (1) Pre-publication peer review for reproducibility: “group X submits a paper, and group Y reviews that paper and attempts to reproduce the results in it before it is sent out to other reviewers. Then, when it is sent out to other reviewers, it is sent along with a letter stating that group Y verified the reproducibility of the paper’s results”. (2) Reproducibility “hangouts” where scholars discuss their replication attempts, e.g. on Google or Skype. “The idea would be to pick a paper for a given month, and then have several groups spend part of that month going through the paper and trying to reproduce it.”
Journal of Reproducible Research
The “Journal of Reproducible Research”? states that such a journal “would protect against one-off results and science fraud.” Let’s play around with this idea and join the discussion on their blog. I also thought about something like this here.
Peer reviewed and reproduced
“It’s not only peer-reviewed, it’s reproducible!” discusses how journal articles could be labelled as not only ‘peer-reviewed’, but additionally as ‘reproduced’ when someone went ahead and checked the data during the peer-review process. Currently, a serious limitation in usual peer-review processes is “that the primary data and other supplementary material such as documentation source code are usually not available. The results of the paper are thus not reproducible. When I review such a paper, I usually have to trust the authors on a number of issues: that they have described the process of achieving the results as accurate as possible, that they have not left out any crucial pre-processing steps and so on.”
Irreproducible Biomedical Research
Elizabeth Iorns Explains the Reproducibility Initiative: An interview on why results of many peer-reviewed studies just do not hold up to scrutiny, and how the Reproducibility Initiative can improve replication standards. For example, Iorns suggests: “The problem could be solved by having independent labs replicate key experimental data that is published as an update to the original study.”
Several articles further discuss Iorns’ Reproducibility Initiative mentioned above, for example Initiative gets $1.3 million to verify findings of 50 high-profile cancer papers, and Reproducibility Project: Cancer Biology.
Reproducibility hits mainstream I: The Economist
In two articles, How science goes wrong and Trouble at the lab, the Economist discusses reproducibility. It states “The false trails laid down by shoddy research are an unforgivable barrier to understanding.” The main problems are: (1) scholars ignore the “statistical power” of their studies, (2) a bias favours the publication of findings that show ‘something new’, (3) a lot of research is poorly thought through, (4) peer reviewers are uninterested, (5) journals need to lay out new standards. The Economist also criticizes that “replication is hard and thankless”. Here’s an answer to the Economist pieces: Why the Economist is wrong about science reproducibility. It states that “Science doesn’t have a scientist problem, science has a communication problem.”
Reproducibility hits mainstream II: LA Times
The LA Times also picked up on the reproducibility challenge in Science has lost its way, at a big cost to humanity, stating that “researchers are rewarded for splashy findings, not for double-checking accuracy”.
Spoof paper and open access journals
Who’s Afraid of Peer Review?: A spoof paper concocted by Science reveals little scrutiny or review at many open-access journals. I’m not sure if this goes for open access journals only – but the article is interesting!
Check several sources before replicating a study
You can’t read just one: Reproducibility and multiple sources: “Funding agencies, publishers and tenure and promotion committees still value original work more highly than verification work.” A librarian gives students tips on how to engage in replication by relying on more than one source.
No More Storytelling
Against storytelling of scientific results: When we tailor our reporting of model selection, plots and data choice, we tell stories that fit our message. This hinders transparency and reproducibility.
All that jazz
Keynote speech on SlideShare: Goble, Results may vary: What is reproducible? Why do open science and who gets the credit?. Lean back and enjoy!
Genomics paper on reproducibility
Assessing the validity and reproducibility of genome-scale predictions is a paper in which the authors respond to recent replication scandals. “We (…) introduce a statistical method for planning validation experiments that will obtain the tightest reproducibility confidence limits, which, for a fixed total number of experiments, returns the optimal number of replicates for the study.”
More ‘best of’ collections
… are here.