It took three years to replicate the economics paper “Growth in a Time of Debt” by Carmen Reinhart and Kenneth Rogoff. During those three years, the paper and its errors were widely cited in the field and heavily relied on by politicians. Why did no one find out earlier? Because the data were not online. Such delays in cross-checking work can be prevented. By using the natural sciences as a model, journals in political science and economics can adopt a better replication policy. The key point: journals should require authors to provide their data set when submitting a manuscript.
The Reinhart-Rogoff replication scandal shook up the academic world. A spreadsheet error and some other methodological mistakes led to three years of citing wrong results – by academics and politicians alike. The problem of inadequate reproducibility in political science and economics is not new. For years, scholars have been pushing for data sharing enforced by journals.
Political Science and Economics: the current status
However, the current status is still problematic. In political science, only 18 of 120 journals have a replication policy, as a recent paper shows. It seems that not much has changed since the ‘Symposium on Replication in International Studies Research’ demonstrated the lack of reproducibility in IR and politics.
In economics, an analysis of nearly 500 scholars’ webpages shows that 89.14% neither have a data and code section nor indicate whether and where their data are available (Andreoli-Versbach and Mueller-Langer 2013). To deal with this problem, the University of Cambridge recently created the Open Economics Working Group to push for a better replication culture.
Key to change: journal replication policies
The key to change is the journals, as I have argued before here and here. Journals can enforce better reproducibility and enhance the quality of scholarly work. Currently, the few journals that have a replication policy mostly ask authors to upload their data somewhere, and trust them to do so. My students in the replication workshop and I have often requested such data and not always received them; when data did arrive, it was frequently with considerable delay.
Such delay can be highly problematic, as the Reinhart-Rogoff case shows. Therefore, journals need to rethink their replication rules.
I propose that data must be submitted with the manuscript, and be released on the date the article first appears online.
Currently, this does not seem to be the case.
Natural sciences as a model
How can this work in practice? We should look to the natural sciences as a model for reproducibility. In many fields, such as pharmacy or medical research, errors can cost lives – so their policies are more advanced. Here is how, for example, Nature (guideline) and Science (guideline) ensure replication:
- The author uploads data to an external dataverse before submission and gets an accession number.
- The data remain blocked and are not yet public; the peer reviewers do not see the data.
- Then, when the paper is accepted, the journal prints the accession number for the replication data in the first footnote, and the author ‘tells’ the dataverse to release the data on the publication date.
The key is the use of external, independent dataverses, so that authors do not have to maintain their own webpages. In the natural sciences, several well-established repositories serve this role.
I should stress that the data only become public when the article is accepted and published. Authors do not have to worry that someone might access their data before their work is published, and peer reviewers would not be expected to reproduce the work. The main goal is to force authors to clean up their data beforehand.
The main benefit for the journal is quality assurance, i.e., maintaining its reputation for publishing only reproducible results. The main benefit for the community is that researchers can build on results faster.
An independent dataverse in Political Science
An independent dataverse exists already: the Harvard Dataverse. The service is open to all scientific data worldwide, free, and accepted in the community as a reliable source of data. Journals could create a ‘group space’ there, as the American Journal of Political Science (AJPS) has done. The AJPS policy is: “If a manuscript is accepted for publication, the manuscript will not be published unless the first footnote explicitly states where the data used in the study can be obtained for purposes of replication and any sources that funded the research. All replication files must be stored on the AJPS Data Archive on Dataverse.” (Guideline | manual for authors)
The way I understand it, though, AJPS does not require that data be uploaded upon submission (but correct me if I’m wrong). It nonetheless seems the closest to the natural-sciences model so far.
[Update: The Journal of Peace Research requires authors to upload their data with the final version of the article to their webpage or an archive (guideline), once they receive confirmation that the paper is accepted. Otherwise it will not publish the article. Thank you Krishna Vadlamannati for this info.]
Just after finishing this blog post I found this at the New York Times: Replicating Research: Austerity and Beyond, by Nancy Folbre, economist at the University of Massachusetts, Amherst.