Trying to replicate a published paper can be challenging. Often, authors do not remember their variable codings, model specifications, or where their files are. My students in the Cambridge Replication Workshop have just finished their final assignments – here are their ‘horror’ stories.
The Cambridge Replication Workshop 2013 was a pilot project at the Social Sciences Research Methods Centre. Over eight weeks, PhD students replicated a paper of their choice from their field. In the shared class Dropbox, one of them started naming his R code files ‘nightmare1.R’, ‘nightmare2.R’ and so on, because after four weeks of corresponding with the original author he was still unable to recode the outcome variable. In class, we spent quite a bit of time discussing who had already written to which author, and how to get more information out of them.
Here are some stories of students who found that their papers were very difficult to reproduce.
Data sources were a mystery
“[T]he authors did not properly cite their data. They gave their general sources (…); however, they did not give the actual databases that were used for their analysis. As a result, I had to re-collect all of the data for the variables they used ‘by hand’, fill in missing variables with suitable substitutes (…) and format my own database. This took a very large amount of time to do.”
Coding mistakes uncovered
“I found four coding mistakes admitted to by the author. They did not change the results in her bivariate association tests, but did weaken the value of the descriptive data for the small n group of arbitration cases she claimed to be representing.”
Codings were not described
“[In the original paper], some important methodologically details are excluded, particularly in terms of recoding. For instance, in the first section of the paper various graphs are produced (…). [A]ctually trying to replicate the exact pattern in the plot has proven immensely difficult, owing largely to [the author’s] poorly described conversions of columns of the dataset to a binary format, and lack of mention of how he dealt with results which did not fit into this binary formatting.”
P-values were off
“I didn’t get the same p-values (…), although most are in the same direction of significance. [The author] hasn’t responded about the discrepancies, so I guess I’ll just carry on.”
Authors did not respond
“[I had] the problem of getting data sets and/or getting in touch with authors who usually dont have time/or dont want cause they have screwed up with the data.”
If you are now put off, remember these 7 good reasons to reproduce work. At least three of my students are planning to publish their replications once they are polished.