More and more funders and journals require data management plans and public access to all types of research data. At the same time, many researchers struggle to balance transparency against legal and ethical obligations. Following on from part I of this blog post, what are some simple guidelines on how to share sensitive data?
Last year, Danish researchers published web-scraped data from the online dating platform OkCupid, including usernames, location, age, political or religious opinions and sexual preferences.
In a paper accompanying the data release, the researchers merely stated: “Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form.”
This caused a huge controversy because people could actually be identified; they had not consented to being part of this research when signing up to the site; and OkCupid had not been asked for permission. The data set was deleted.
This does not mean you can’t publish sensitive data related to human subjects – but you have to handle them differently and be extra cautious.
Much research data – even sensitive data – can be shared ethically and legally if researchers employ strategies of informed consent, anonymisation and controlling access to data. (UK Data Service)
What is sensitive data?
We need to distinguish between personal and sensitive data. According to the Data Protection Act 1998, sensitive data can include information on a person’s race, ethnic origin, political opinion, religious or similar beliefs, health, sexual life, (alleged) offences and court proceedings, trade union membership and so on. This data itself is not problematic and can be shared.
But problems arise when the same data set contains information that could identify which individuals this data refers to. Personal data includes records or other information that on its own, or linked with other data or information can reveal the identity of an actual living person.
So the problem really lies in publishing personal data together with the sensitive information. When you collect data from individuals that is both personal and sensitive, the data needs to be processed fairly and lawfully. By UK law, research data collection must
- be relevant and not excessive (don’t ask participants more questions than absolutely necessary)
- be related to the stated purpose (do not gather information for other projects than those agreed on the consent form or information sheet)
- not be kept longer than necessary (make sure to tell participants how long and for what purpose you are keeping this data)
- inform subjects about the use, storage and transfer of data (ensure they know how and why their data is kept secure)
So can you share sensitive data?
None of these regulations mean that you cannot share sensitive data. They also don’t mean that you have to destroy the data after x years – even though many universities’ ethics committee forms (similar to IRB in the US) state this in their guidelines. Nor do they mean that your data can only ever be used for one study and not be used again by other researchers. Most importantly, they do not mean that the data cannot be uploaded to a data archive. In fact, a report on data sharing practice has found that the UK data laws are “commonly cited as a reason not to release information when it may be perfectly legitimate to do so.” (Thomas and Walport 2008).
Unfortunately, many researchers working with personal and sensitive data state that transparency cannot apply to their work; but they are often misinformed (sometimes even by their own university’s guidelines).
What the law does mean
The UK Data Service clarifies that even sensitive data can be shared if suitable procedures and precautions are taken. For example, consent forms for interviews or regarding participants in studies should allow for data sharing, whilst also protecting the confidentiality of participants. Most research data obtained from participants can be successfully shared without breaching confidentiality.
Sensitive research data can be shared ethically if valid consent was given by participants (informed, voluntary, with capacity), and identifiable information is anonymised. You have to separate personal from sensitive information. This is usually done by removing direct identifiers (names, postcodes, pictures etc.) and removing or coarsening indirect identifiers, which can identify people when linked together (e.g. workplace, occupation, salary, age etc.).
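The anonymisation step described above can be sketched in a few lines of Python. This is only a minimal illustration, not a complete anonymisation procedure: the column names, the example record and the choice to band ages into ten-year groups are all assumptions made for the example, and a real data set will need its own list of direct and indirect identifiers.

```python
# Hypothetical column names; adapt these to your own data set.
DIRECT_IDENTIFIERS = {"name", "postcode", "picture"}


def band_age(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise(record):
    """Return a copy of one participant record with direct identifiers
    dropped and the indirect identifier 'age' coarsened into a band."""
    cleaned = {key: value for key, value in record.items()
               if key not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        cleaned["age"] = band_age(cleaned["age"])
    return cleaned


# Made-up example record, for illustration only.
participants = [
    {"name": "Participant A", "postcode": "AB1 2CD", "age": 34,
     "occupation": "teacher", "political_opinion": "green"},
]

shared = [anonymise(person) for person in participants]
print(shared)
```

The sensitive field (here, political opinion) stays in the shared version; what is removed or coarsened is the information that could link it to a person. Other indirect identifiers, such as workplace or salary, may need similar treatment depending on the study.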
Data can also be retained for a longer period of time, or even indefinitely, if necessary. It is a common misunderstanding that data has to be deleted after a certain number of years. The key is that participants must be informed about how long you are keeping the data, where it is stored, and for what purpose (e.g. for replication purposes). Data can be used for more than one project, but only if the purpose of collection is reasonably broad and clearly stated in the consent form or (survey) information sheet. Of course, you should not mislead participants here.
Also, you can upload data to a secure archive (UK Data Service, Harvard Dataverse, Qualitative Data Repository) if you manage safe access options properly.
What to do when sharing is not an option?
In some cases you cannot fully anonymise the data. For example, it may still be possible to identify participants indirectly from the information even after you have removed IDs. Or in other cases you may never get any participants to sign consent forms unless you promise that you will not share any data, even if anonymised (think about interviews in sensitive research areas e.g. human rights, crime, health etc).
In those cases, I would recommend that you secure the data by uploading them to a trustworthy repository, but blocking access. You can then decide who gets to see this data and in which form (maybe only collaborators will have access). By uploading this data and blocking it, you still signal that you take transparency seriously.
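To judge whether participants could still be identified indirectly after IDs have been removed, one rough check is to count how many records share each combination of indirect identifiers. The sketch below borrows the idea of k-anonymity, which this post does not itself discuss; the function name, the field names, the example records and the threshold `k=3` are all assumptions chosen for illustration.

```python
from collections import Counter


def risky_combinations(records, quasi_identifiers, k=3):
    """Return each combination of indirect identifiers that is shared by
    fewer than k records, together with the number of records sharing it."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up records, already stripped of names and other direct identifiers.
records = [
    {"workplace": "Hospital A", "occupation": "nurse", "age": "30-39"},
    {"workplace": "Hospital A", "occupation": "nurse", "age": "30-39"},
    {"workplace": "Hospital A", "occupation": "nurse", "age": "30-39"},
    {"workplace": "Hospital A", "occupation": "surgeon", "age": "50-59"},
]

print(risky_combinations(records, ["workplace", "occupation", "age"]))
```

In this made-up example the single surgeon aged 50-59 at Hospital A is unique in the data, so that record could point to one person even without a name attached. Combinations flagged this way are candidates for further coarsening or for restricted access rather than open sharing.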
Data sharing is not the only way to work transparently
There are three kinds of transparency: (1) data transparency, (2) analytic transparency, and (3) production transparency. Thus, you can ensure transparency of your work to a certain degree even if you don’t share the data set itself.
- Production Transparency: Go beyond merely saying that the data ‘is confidential.’ You can say what information is included, how the data was collected, and how another researcher could follow your steps of data collection. You should also say why anonymisation is still not possible.
- Analytic transparency: Describe what particular evidence, or type of evidence, supports particular claims in your analysis, even if you cannot fully release all data. For example, you could give a text fragment from an interview that will not reveal the identity of the interviewee instead of publishing the full transcript. More info on how to share non-numerical data is given by the Qualitative Data Repository.
By providing details about your research process in this way, you strengthen trust in your inferential claims and choices of data and methodology (see also Lupia/Elman 2014; Moravcsik 2014).
Janz, Nicole & Figueiredo, Dalson (2017, March 13). Workshop: The Gold Standard of Reproducible Research. Retrieved from osf.io/2fqnw (slides, examples, handouts)
UK Data Service: “Legal and ethical issues” https://www.ukdataservice.ac.uk/manage-data/legal-ethical
Qualitative Data Repository, A GUIDE TO SHARING QUALITATIVE DATA. Available from: https://qdr.syr.edu/guidance/sharingdata
Kirilova, D. & Karcher, S., (2017). Rethinking Data Sharing and Human Participant Protection in Social Science Research: Applications from the Qualitative Realm. Data Science Journal. 16, p.43. DOI: http://doi.org/10.5334/dsj-2017-043
Moravcsik, A. (2014). Transparency: The Revolution in Qualitative Research. PS: Political Science & Politics, 47(1), 48-53. doi:10.1017/S1049096513001789
Lupia, A., & Elman, C. (2014). Openness in political science: Data access and research transparency. PS: Political Science & Politics, 47(1), 19-42. doi:10.1017/S1049096513001716
Christensen, Garret (2016). Manual of Best Practices in Transparent Social Science Research https://github.com/garretchristensen/BestPracticesManual
Open Science Framework. Transparency and Openness Promotion (TOP) Guidelines. https://cos.io/top/
Markowetz, Florian (2015). Five selfish reasons to work reproducibly. Genome Biology 16:274. doi:10.1186/s13059-015-0850-7