Best of replication & data sharing: Collection 5 (Nov 2013)

These are the best pieces I came across in the last month on replication, reproducibility & data sharing. Collection #5 (Nov 2013).

280 hours to reproduce a paper in computational biology

Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. From the abstract: “How easy is it to reproduce the results found in a typical computational biology paper? Either through experience or intuition the reader will already know that the answer is with difficulty or not at all. In this paper we attempt to quantify this difficulty by reproducing a previously published paper for different classes of users (ranging from users with little expertise to domain experts)… Quantification is achieved by estimating the time required to reproduce each of the steps in the method described in the original paper.” The result: 280 hours. The article also describes what the original paper did.

Psychologists strike a blow for reproducibility

A large international group set up to test the reliability of psychology experiments has successfully reproduced the results of 10 out of 13 past experiments. The consortium also found that two effects could not be reproduced. Read the full article by Ed Yong on Nature.com.

Scientists reward authors who report their own errors

Doing the right thing: Scientists reward authors who report their own errors, says study: Usually, authors don’t self-retract their papers even when they find errors. Why? They need citations. A new study across disciplines shows that citation rates decline less sharply after self-reported retractions than after retractions that were not self-reported. Is this an incentive to report your own errors?

Thesis advisor requires reproducibility

Will you join my committee?: A researcher agrees to supervise research (in the U.S.: to serve on a student’s committee) only if the student agrees to make all thesis results reproducible.

The One With Lego

Research Roles Through Lego: A workshop has researchers re-create a given Lego model without detailed instructions. Why? To let them experience the frustration caused by non-reproducible results.

Maximum transparency I

Git/GitHub, Transparency, and Legitimacy in Quantitative Research: Keep your data and models instantly updated (and published) via GitHub – this is the maximum transparency and reproducibility one can achieve. “Maintaining your research project on GitHub confers advantages beyond the social desirability of the practice and the technical benefits of using a revision control system. Making your research publicly accessible in this manner makes it considerably easier to replicate, meaning that, all else equal, more people will build on your work, leading to higher citation counts and impact.” See also a package to publish your workflow in R here.
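
As a concrete illustration (not from the linked post; file names and values are hypothetical), here is a minimal Python sketch of one small piece of such a workflow: stamping each saved result with the current Git commit, so that a published figure or table can be traced back to the exact code version that produced it.

```python
# Minimal sketch: record provenance (Git commit + timestamp) alongside results.
# Assumes the script runs inside a Git repository; names and values are illustrative.
import json
import subprocess
from datetime import datetime, timezone


def current_commit() -> str:
    """Return the hash of the Git commit the analysis was run from."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()


def save_with_provenance(results: dict, path: str) -> None:
    """Write results together with a small provenance record."""
    record = {
        "results": results,
        "provenance": {
            "git_commit": current_commit(),
            "generated_at": datetime.now(timezone.utc).isoformat(),
        },
    }
    with open(path, "w") as f:
        json.dump(record, f, indent=2)


if __name__ == "__main__":
    save_with_provenance({"effect_size": 0.42}, "results.json")
```

Committing and pushing the result file along with the code then makes the whole chain public.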

Maximum transparency II

Retracing Steps: Similar to the idea of GitHub (above), researchers can join an open-source computational platform, called Synapse, which enables seamless collaboration among scientific teams (providing them with the tools to share data, source code, and analysis methods on specific research projects or on any of the 10,000 datasets in the organization’s massive data corpus). Key to these collaborations are tools embedded in Synapse that allow for everything from data “freezing” and versioning controls to graphical provenance records—delineating who did what to which dataset, for example.

P-values and reproducibility

Weak statistical standards implicated in scientific irreproducibility: The plague of non-reproducibility in science may be mostly due to scientists’ use of weak statistical tests, as shown by an innovative method developed by statistician Valen Johnson at Texas A&M University in College Station: Johnson argues that the conventional p < 0.05 threshold provides only weak evidence against the null hypothesis and that a stricter threshold such as p < 0.005 would be more appropriate. Here’s the paper [pdf].
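
For a rough sense of why p ≈ 0.05 counts as weak evidence, here is a small Python sketch using the classical Sellke–Bayarri–Berger bound (a related, better-known result, not Johnson’s own uniformly most powerful Bayesian tests): it caps the Bayes factor against the null at 1 / (-e · p · ln p).

```python
# Illustration only: the Sellke–Bayarri–Berger bound on the Bayes factor,
#     BF_max(p) = 1 / (-e * p * ln p)   for 0 < p < 1/e,
# gives an upper limit on how much evidence a given p-value can provide
# against the null. This is a classical result, not Johnson's method.
import math


def max_bayes_factor(p: float) -> float:
    """Upper bound on the Bayes factor against the null for p-value p."""
    if not 0 < p < 1 / math.e:
        raise ValueError("bound only holds for 0 < p < 1/e")
    return 1.0 / (-math.e * p * math.log(p))


for p in (0.05, 0.01, 0.005):
    print(f"p = {p}: evidence against the null is at most ~{max_bayes_factor(p):.1f} : 1")
```

At p = 0.05 the odds against the null are at most about 2.5 : 1, which helps explain why results that barely clear the conventional threshold so often fail to replicate.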

 

More ‘best of’ collections

… are here.
