Integrating replication and reproducibility into social science education is crucial. In a pilot project earlier this year, I set up the Cambridge Replication Workshop. In the course, students from different disciplines have eight weeks to replicate a published paper. For anyone planning to design a similar course, I collected feedback from my teaching assistants (Aiora Zabala, Chris Bentz and Vaishali Mahalingam). Part 1 of 3: Challenges of replication in class.
The Cambridge Replication Workshop, which took place in spring 2013, was a pilot project at the Social Sciences Research Methods Centre. Over eight weeks, PhD students replicated a paper in their field. They attended weekly 2-hour sessions consisting of a short lecture that guided them through the replication steps and a practical session in which several TAs helped them solve problems in R (see syllabus, handouts and assignments here). Without the TAs, this pilot project could never have succeeded. To help future workshops, here at Cambridge and elsewhere, I summarize their feedback on the typical challenges that came up during the workshop.
Inadequate data availability
The TAs found that through replication, students became aware of the problems and pitfalls of data collection, analysis and presentation. When replicating some of the papers, students were confronted with the following problems:
- Getting datasets from authors was not always easy, and in some cases the original database could not be found at all.
- Some datasets had not been clearly organised by the original authors.
- Only in a few cases were students able to contact the author.
- Even when authors replied, they did not always remember the details of their own analyses.
- In one case, the student gathered the data from the original sources (official statistical bureaus), but was not able to find one of the dependent variables.
- Some data were partly wrong or presented in an ambiguous manner.
Opaque methods description in papers
Even when data were available, students sometimes struggled to understand how the authors derived their arguments from the data.
- Statistical models were often not fully specified and remained opaque.
- Replicating a study sometimes involves much more than running similar code in a different program or using a slightly changed model; it can mean reconstructing the study right down to its theoretical reasoning.
When students got different results
In the practical parts of the weekly sessions, the TAs found that students' results often differed from those in the original paper, sometimes slightly and sometimes radically.
- For the students, it was a challenge to understand exactly how the original authors had dealt with missing data in their study, or which variables they had used.
- Some of the students were beginners in statistics. They were confused by the diverging results and not always confident in their own, since these differed from published work that had gone through peer review.
- Students would then say in class discussions, “Maybe I’m doing something wrong…”
[See also TA feedback on why R is best for replication and which students should join a replication course.]