Clinical Trial Data
Moving Toward the Curation of Data from Clinical Trials
Annotations by Katherine Akers
Definition of Project
Clinical trials are research studies that test the safety and efficacy of specific interventions (e.g., therapeutic drugs, medical devices or procedures, behavioral therapies) in human populations. Although the US National Library of Medicine has hosted ClinicalTrials.gov, a publicly accessible registry of clinical trials, since 2000, more recent initiatives are pushing investigators to share patient-level data collected during clinical trials (as opposed to the summary data reported in journal articles) with other investigators. This bibliography includes scholarly articles that present arguments for the societal benefits of clinical trial data sharing; describe emerging systems for preparing, storing, and governing access to clinical trial datasets; and discuss special considerations in the curation of clinical trial data, such as the need to de-identify datasets to protect the privacy of trial participants and to utilize data use agreements to minimize the risks associated with release of data to secondary users. A literature search was performed using PubMed and Google Scholar. Included articles were published in the last 10 years, were written by prominent authors or organizations, and/or appeared in leading medical journals.
Annotations
Bierer, B. E., Li, R., Barnes, M., & Sim, I. (2016). A global, neutral platform for sharing trial data. New England Journal of Medicine, 374(25), 2411-2413. http://dx.doi.org/10.1056/NEJMp1605348
This article provides a high-level description of Vivli, a new, state-of-the-art platform for hosting and facilitating discovery of and access to patient-level data from all types of clinical trials in all areas of health care. Vivli, which is hosted by the Multi-Regional Clinical Trials Center of Brigham and Women’s Hospital and Harvard University, connects existing clinical trial data sharing platforms (which are largely fragmented by funding agency or disease/condition) and directly accepts deposits of clinical trial datasets from researchers who have no other available platform. Vivli promises to provide detailed metadata (manually at first, but automatically via machine-based methods in the future) about trials and trial participants to allow for granular searching. A standard data file structure will allow secondary users to aggregate multiple datasets to perform new analyses. In addition, all datasets will undergo the same anonymization procedures (confirmed by statistical sampling), and the same data use agreements prohibiting secondary users from attempting to re-identify participants or disclosing data to others will apply to all data requests. Based on this article’s description, Vivli appears to be a carefully designed system for storing clinical trial datasets, preparing them for sharing through adequate de-identification of participants, and governing access by secondary users. It is not clear from this article, however, whether Vivli employs sound preservation practices to ensure long-term access to clinical trial data.
Doshi, P., Goodman, S. N., & Ioannidis, J. P. A. (2013). Raw data from clinical trials: Within reach? Trends in Pharmacological Sciences, 34(12), 645-647. http://dx.doi.org/10.1016/j.tips.2013.10.006
Written from the perspective of the pharmaceutical research community, this article describes recent initiatives of pharmaceutical companies, trade associations, and US and European government agencies related to increasing access to participant-level data from clinical trials of pharmaceutical agents. For instance, GlaxoSmithKline (GSK) announced that it would provide access to its clinical trial data with data requests mediated by an independent review panel. Importantly, however, the authors of the article express reservations about several limitations that GSK places on data access, such as not providing access to data from trials of the “off-label” use of drugs and only providing access to participant-level data on a remote server. The authors also critique the US Food and Drug Administration’s “intention to begin releasing patient-level data to third party researchers to further its public health mission” (p. 645) only after removing the “data’s link to a specific product, study, or application” (p. 647). That is, without knowing which trial generated the data or which drug was tested, the data might be of little use to secondary users.
Hrynaszkiewicz, I., & Altman, D. G. (2009). Towards agreement on best practice for publishing raw clinical trial data. Trials, 10(17), 1-5. http://dx.doi.org/10.1186/1745-6215-10-17
This article, the oldest in this bibliography, argues for the potential value of sharing patient-level data from clinical trials and outlines the challenges entailed in this endeavor. The biggest challenge identified is the need to ensure the confidentiality of protected health information and the anonymity of trial participants described in the dataset, which is made more difficult by the lack of consensus and appropriate guidance for researchers preparing clinical trial data for broader sharing. This article identifies ethics committees, funding agencies, journal editors and publishers, and clinical trial researchers as the primary stakeholders in this endeavor but also briefly mentions institutional repositories and data archives as adding value to the process of clinical trial data sharing. Thus, although this article does not explicitly speak to data curation, it can be read as suggesting an important role for data curators with expertise in de-identification practices.
Hrynaszkiewicz, I., Norton, M. L., Vickers, A. J., & Altman, D. G. (2010). Preparing raw clinical data for publication: Guidance for journal editors, authors, and peer reviewers. Trials, 11(9). http://dx.doi.org/10.1186/1745-6215-11-9
In this article, the authors provide practical guidance for sharing clinical trial data with other researchers. They define a “dataset” (the minimum amount of information needed to reproduce the summary data reported in a journal article or other publication); lay out 28 direct or indirect personal identifiers that may need to be removed from a dataset in consultation with an independent expert; emphasize the importance of preparing a clean (e.g., no errors or missing values) and well-annotated dataset (e.g., sufficient description of the dataset so that others can understand its content and context); and recommend saving files in universally convertible formats (e.g., delimited text files) and employing a technique (e.g., file naming convention or metadata element) that permits file versioning for ongoing trials. Access restrictions and data reuse agreements are briefly mentioned. Although not explicitly speaking to the process of data curation, this article touches upon several steps in the data curation lifecycle and would serve as a good introductory “how to” guide for curators needing to prepare clinical trial datasets for archiving and future access.
Lo, B. (2015). Sharing clinical trial data: Maximizing benefits, minimizing risk. Journal of the American Medical Association, 313(8), 793-794. http://dx.doi.org/10.1001/jama.2015.292
In one of the more recent articles in this bibliography, the author makes the case that “the issue now is no longer whether to share clinical trial data but instead which specific data to share, when the data should be shared, and with what controls and safeguards” (p. 793), all issues that are germane to potential curators of clinical trial data. The article describes a report issued by the Institute of Medicine (IOM; now known as the National Academy of Medicine) recommending practices for minimizing the risks of sharing patient-level data and ensuring that such data sharing occurs in a fair manner from the perspective of the primary investigators, secondary data users, and trial participants. These IOM recommendations are similar to the International Committee of Medical Journal Editors’ proposed guidelines for clinical trial data sharing described by Taichman et al. (2016), indicating that multiple organizations are converging on standard practices in this area. The IOM identified “infrastructure, technological, and workforce, and sustainability challenges to achieving this vision”, implying their recognition of the importance of rigorous data curation systems and skilled data curators.
Mello, M. M., Francer, J. K., Wilenzick, M., Teden, P., Bierer, B. E., & Barnes, M. (2013). Preparing for responsible sharing of clinical trial data. New England Journal of Medicine, 369(17), 1651-1658. http://dx.doi.org/10.1056/NEJMhle1309073
This article is a thorough review of recent government, academic, and commercial initiatives to increase the sharing of clinical trial data; the benefits and risks of clinical trial data sharing; and legal and regulatory implications. The article also makes several recommendations for designing and implementing a formal system for clinical trial data sharing. Most interestingly, the article lays out four potential models for an “ethically sound data-sharing system” (p. 1655): (1) investigators fully release control of the data, making them open and accessible to anyone; (2) investigators maintain control of the data, and they or their intermediaries perform analyses requested by other researchers; (3) investigators release control of the data to their funding agency, which controls data access; or (4) investigators release control of the data to a trusted intermediary (that is not the funding agency or employing research institution), which controls data access. Although each model has benefits and drawbacks, it is the fourth model that might best enable sound data curation practices and benefit from skilled data curators.
Rathi, V., Dzara, K., Gross, C. P., Hrynaszkiewicz, I., Joffe, S., Krumholz, H., . . . Ross, J. S. (2012). Sharing of clinical trial data among trialists: A cross sectional survey. BMJ, 345, e7570. http://dx.doi.org/10.1136/bmj.e7570
Unlike most other articles in this bibliography, this article describes an empirical study of clinical trial researchers’ opinions on and experiences with sharing patient-level clinical trial data. The authors administered a survey eliciting responses from 317 corresponding authors of articles published in six leading medical journals. They found that 88% of respondents supported clinical trial data sharing, with most believing that “in principle, sharing de-identified data through a data repository should be mandatory” (p. 3). However, only 18% of respondents indicated that their funder required data sharing, with fewer than half of these respondents reporting compliance with this requirement. Therefore, there appears to be a large gap between the perceived value of clinical trial data sharing and actual actions taken toward data sharing. The most commonly reported concerns about clinical trial data sharing were that secondary users might misinterpret or improperly analyze the data. These results suggest that clinical trial data curators could facilitate responsible data sharing by helping create detailed and comprehensive documentation of datasets (to minimize chances of misinterpretation) and by requiring and evaluating secondary data analysis plans (to minimize chances of improper analysis).
Ross, J. S., Lehman, R., & Gross, C. P. (2012). The importance of clinical trial data sharing: Toward more open science. Circulation: Cardiovascular Quality and Outcomes, 5, 238-240. http://dx.doi.org/10.1161/CIRCOUTCOMES.112.965798
In addition to discussing the benefits of clinical trial data sharing, this article highlights three initiatives promoting clinical trial data sharing in the area of cardiovascular research. Two of these initiatives are interesting from a data curation perspective. The first interesting initiative is the Biological Specimen and Data Repository Information Coordinating Center (BioLINCC). Still in existence today, this repository now holds datasets generated by 198 clinical studies sponsored by the National Heart, Lung, and Blood Institute. Potential data users can see descriptions of the datasets (i.e., metadata) and download accompanying documentation (e.g., data dictionary, data use guidelines) but must formally request access to the data by completing a form indicating why they are requesting the data, how the data will be analyzed, and what information security practices will be employed. The second interesting initiative is the Yale University Open Data Access (YODA) Project, which is another repository of clinical trial data. Similar to BioLINCC, potential data users can see descriptions of datasets but must request access by providing a research proposal, completing data use agreement training, and meeting other requirements. These initiatives suggest the need for data curators who are highly skilled in describing datasets, developing policies for restricted data access and reuse, and mediating the transfer of data between parties.
Taichman, D. B., Backus, J., Baethge, C., Bauchner, H., de Leeuw, P. W., Drazen, J. M., . . . Wu, S. (2016). Sharing clinical trial data — A proposal from the International Committee of Medical Journal Editors. New England Journal of Medicine, 374(4), 384-386. http://dx.doi.org/10.1056/NEJMe1515172
In this highly influential article, editors of several leading medical journals and a director from the US National Library of Medicine represent the International Committee of Medical Journal Editors in formally announcing proposed requirements for the sharing of clinical trial data as a condition of publication in peer-reviewed medical journals. Specifically, the proposed requirement is that authors of clinical trials must share patient-level data underlying the summary results described in their journal article within 6 months after its publication. Another proposed requirement is that authors must include a description of the data sharing plan provided during the clinical trial registration process (e.g., registration at ClinicalTrials.gov) within the submitted manuscript. This editorial does not specify examples of suitable repositories for depositing clinical trial data, which would have been helpful to others (although different journals might specifically recommend different repositories). Nonetheless, this push by editors of leading medical journals promises to make the sharing of raw clinical trial data a routine part of the research dissemination process.
Terry, S. F., & Terry, P. F. (2011). Power to the people: Participant ownership of clinical trial data. Science Translational Medicine, 3(69), 69cm3. http://dx.doi.org/10.1126/scitranslmed.3001857
This article presents an argument for why clinical trial participants, rather than clinical trial investigators, should have ownership over their data. The crux of the authors’ argument is that clinical trial participants who are empowered to make decisions about whether to share their health data will opt to share them with the goal of helping others and improving societal health. Although the article does not directly address the curation of clinical trial data, the possibility that clinical trial participants would own their data and decide whether they are shared or kept confidential has major implications for potential curators of clinical trial data, who may need to take on the added responsibility of ensuring that archived datasets made accessible to others contain only records from individuals who have consented to data sharing.