Data Integrity in Medical Research


Data Integrity in Medical Research: Principles, Policies, and Practices

Author: Rebecca Ehrick

Introduction

There are few fields in which data integrity is more important than biomedical research, where falsification of data—or even mere carelessness—can literally become a matter of life and death. Despite general acknowledgement of the importance of data integrity in the biomedical domain, the FDA has issued hundreds of warning letters to manufacturers and research organizations for failing to meet regulatory standards.

This bibliography examines the academic literature on standards of data integrity in the medical research field: how those standards have been maintained, where and why failures have occurred, and new technologies and opportunities for improving data integrity and trust. The primary focus is data integrity as it pertains to the safety and efficacy testing of pharmaceuticals and medical devices, but the bibliography also draws on works whose principles apply to the wider field of medical research. Sources were drawn from the academic journals indexed in the Medline database created by the National Library of Medicine, with a primary focus on literature published within the last five years (2015-2020). Several sources are retrospective in nature and therefore cover a longer time span.

Annotations

Dijkers, M. P. (2019). A beginner's guide to data stewardship and data sharing. Spinal Cord, 57(3), 169-182. https://doi.org/10.1038/s41393-018-0232-6
The author explains the concept of "open data" and its potential benefits, as well as some of the challenges involved in making this data available in a responsible way. He then presents a framework for sharing medical research data according to the "FAIR" principles—Findable, Accessible, Interoperable, and Reusable. The paper emphasizes the importance of ensuring that all data has strong metadata that enables it to be found and clear provenance metadata that attributes it to its source, two principles of particular importance for ensuring data integrity. This article also includes a helpful list of twenty-five resources, including repositories, organizations, and software, to facilitate data stewardship and sharing. These resources, as well as the methods and principles examined in this paper, will be useful to a broader audience than the spinal cord injury researchers for whom it was originally intended.


Imran, M., Hlavacs, H., Haq, I. U., Jan, B., Khan, F. A., & Ahmad, A. (2017). Provenance based data integrity checking and verification in cloud environments. PLoS One, 12(5), e0177576. https://doi.org/10.1371/journal.pone.0177576
Cloud data storage presents new challenges for ensuring that data has not been altered from its original form. One possible solution to this problem is through the implementation of "data provenance," meaning that the series of actions performed on the original data is recorded in order to track its history and highlight any suspicious behavior. The authors of this paper elaborate on the architecture of their provenance and integrity trackers, and how they compare with existing schemas for maintaining data integrity. They also assess the limitations of the system that they propose.
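The following is a minimal sketch of the general idea behind provenance-based integrity checking, not the authors' architecture; every name in it is illustrative. Each action performed on a dataset is appended to a log entry containing a fingerprint of the data and the hash of the previous entry, so that any retroactive alteration of the data or its history breaks the chain.

  import hashlib, json, time

  def digest(payload):
      return hashlib.sha256(payload).hexdigest()

  def append_provenance(log, actor, action, data):
      entry = {
          "actor": actor,                      # who performed the action
          "action": action,                    # e.g. "created", "cleaned"
          "timestamp": time.time(),
          "data_digest": digest(data),         # fingerprint of the data state
          "prev_hash": log[-1]["entry_hash"] if log else None,
      }
      entry["entry_hash"] = digest(json.dumps(entry, sort_keys=True).encode())
      log.append(entry)
      return entry

  def verify_chain(log):
      prev = None
      for entry in log:
          body = {k: v for k, v in entry.items() if k != "entry_hash"}
          recomputed = digest(json.dumps(body, sort_keys=True).encode())
          if entry["prev_hash"] != prev or entry["entry_hash"] != recomputed:
              return False                     # chain broken: data or log altered
          prev = entry["entry_hash"]
      return True

  log = []
  append_provenance(log, "lab_tech_01", "created", b"raw assay readings")
  append_provenance(log, "analyst_02", "cleaned", b"cleaned assay readings")
  assert verify_chain(log)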


Khin, N. A., Francis, G., Mulinde, J., Grandinetti, C., Skeete, R., Yu, B., Ayalew, K., Cho, S.-J., Fisher, A., Kleppinger, C., Ayala, R., Bonapace, C., Dasgupta, A., Kronstein, P. D., & Vinter, S. (2020). Data integrity in global clinical trials: discussions from joint US Food and Drug Administration and UK Medicines and Healthcare Products Regulatory Agency Good Clinical Practice Workshop. Clinical Pharmacology and Therapeutics, 108(5), 949–963. https://doi.org/10.1002/cpt.1794
The globalization of clinical trials and the rise of new platforms and methods for sharing data necessitate evolution in the regulations and ethical standards that govern biomedical research. This has specific ramifications where data integrity is concerned, and a joint Good Clinical Practice (GCP) Workshop was convened in 2018 between the US FDA and the UK Medicines and Healthcare Products Regulatory Agency to address this topic. This paper looks at data integrity from the point of view of regulatory agencies, who are tasked with auditing clinical research bodies, and describes effective tools and methods for meeting regulatory requirements such as audit trails, proper study blinding with documentation of randomization, and robust data management procedures that adhere to ALCOA standards (see entry for "Data Integrity: History, Issues, and Remediation of Issues" for more information on ALCOA). Detailed procedures and best practices are laid out that delineate the roles of sponsors, Contract Research Organizations, and other stakeholders throughout the data lifecycle. Good data management practices at all points in the research process are vital for ensuring accuracy, compliance, risk mitigation and ultimately the safety and reliability of the products being evaluated.


Mascha, E. J., Vetter, T. R., & Pittet, J.-F. (2017). An appraisal of the Carlisle-Stouffer-Fisher method for assessing study data integrity and fraud. Anesthesia and Analgesia, 125(4), 1381-1385. https://doi.org/10.1213/ANE.0000000000002415
Automated systems offer promising new ways to discover fraud and other data integrity failures in raw datasets. However, the authors of this paper strike a cautionary note against over-reliance on rule-based algorithms to detect these issues. They examine the approach of Dr. John Carlisle, who applied statistical methods to identify unlikely data in an examination of more than 5,000 randomized trials published in anaesthesia and other medical journals. When the combined P value calculated by Carlisle for a trial was either very high or very low, this was taken as an indicator of potential data error or fraud. The authors argue, however, that the assumptions Carlisle made going into this endeavor are almost certainly untrue for some of the studies flagged as suspicious, which calls the reliability of his approach into question. For instance, study variables are treated as independent even when they may be correlated, which can produce an appearance of imbalance where none exists. While Carlisle's method may be adequate for an initial screening of data for errors and fraud, it is not sufficient to diagnose these anomalies. The authors suggest the judicious use of multiple methods and tools, such as Benford's Law and the Central Limit Theorem, to comprehensively assess data reliability, quality, and integrity.
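As a rough illustration of the kind of screening Carlisle's approach performs (not his actual procedure; the data and thresholds below are hypothetical), baseline-comparison P values from a trial can be combined with Stouffer's method and flagged when the combined value is implausibly extreme. As Mascha et al. stress, such a flag is a prompt for closer review, not a diagnosis of error or fraud.

  from scipy.stats import norm

  def stouffer_combined_p(p_values):
      """Combine P values via Stouffer's Z-score method (assumes independence)."""
      z = sum(norm.ppf(1 - p) for p in p_values) / len(p_values) ** 0.5
      return 1 - norm.cdf(z)

  baseline_p = [0.51, 0.48, 0.55, 0.47, 0.52]   # hypothetical baseline comparisons
  combined = stouffer_combined_p(baseline_p)
  if combined < 0.001 or combined > 0.999:       # independence assumption may not
      print("flag trial for closer review")      # hold, so treat as screening only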


Miksa, T., Simms, S., Mietchen, D., & Jones, S. (2019). Ten principles for machine-actionable data management plans. PLoS Computational Biology, 15(3), e1006750. https://doi.org/10.1371/journal.pcbi.1006750
The ten principles laid out in this article provide a useful framework for digital curators as they develop a Data Management Plan (DMP) to accompany a research proposal. If a DMP is treated as a generic, low-priority document, it will not be useful in supporting data management activities. The authors stress the integral role of data management in today's research environment, where the findability and reusability of data are key, and provide guidance for creating DMPs that will not only meet the needs of the present, but have the flexibility to adapt to changes in the future. These principles take into account the interests of all stakeholders in a project, and recommend using tools like controlled vocabularies and a common data model to maximize accessibility, reusability and transparency of data. These principles would be of greatest utility when implemented on an institutional level.
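To illustrate what "machine-actionable" means in practice (the field names below are hypothetical and not drawn from any particular standard), a DMP can be expressed as structured data with controlled-vocabulary values rather than free prose, so that tools can validate and act on it automatically.

  # Hypothetical machine-actionable DMP fragment expressed as structured data.
  dmp = {
      "title": "Sample clinical dataset DMP",
      "dataset": [{
          "license": "CC-BY-4.0",              # controlled vocabulary, not free text
          "personal_data": "yes",              # enumerated: yes / no / unknown
          "preservation_period_years": 10,
          "repository": "https://example.org/repository",  # hypothetical URL
          "metadata_standard": "Dublin Core",
      }],
  }

  # A tool can check the plan automatically instead of a human re-reading prose.
  assert dmp["dataset"][0]["personal_data"] in {"yes", "no", "unknown"}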


Navale, V., & McAuliffe, M. (2018). Long-term preservation of biomedical research data. F1000Research, 7, 1353. https://doi.org/10.12688/f1000research.16015.1
Navale and McAuliffe apply the Open Archival Information System (OAIS) model to address the challenges inherent in preserving biomedical research data in the long term. They recommend establishing data quality controls as part of the data management plan to ensure authenticity and reliability. Data integrity is protected through systems that incorporate provenance metadata collected at the point of data generation; this matters because provenance is difficult to establish after the fact. The authors also emphasize that systems must anticipate the legal, ethical, and technical issues that may arise throughout the extended lifecycle of the data, and establish comprehensive plans to address these issues before they arise.
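A minimal sketch of what capturing provenance at the point of generation might look like (illustrative only, not the authors' system): a fixity value and basic provenance fields are recorded the moment a data file is produced, so that its authenticity can be verified at any later stage of the archival lifecycle.

  import hashlib, json, datetime, pathlib

  def record_provenance(path, instrument, operator):
      data = pathlib.Path(path).read_bytes()
      record = {
          "file": path,
          "sha256": hashlib.sha256(data).hexdigest(),        # fixity check value
          "generated_by": instrument,                        # e.g. assay platform ID
          "operator": operator,
          "created_utc": datetime.datetime.utcnow().isoformat(),
      }
      # Store the sidecar record alongside the data in the archive package.
      pathlib.Path(path + ".provenance.json").write_text(json.dumps(record, indent=2))
      return record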


Papadatos, G., Gaulton, A., Hersey, A., & Overington, J. P. (2015). Activity, assay and target data curation and quality in the ChEMBL database. Journal of Computer-Aided Molecular Design, 29(9), 885-896. https://doi.org/10.1007/s10822-015-9860-5
Publicly available databases provide an ever-growing wealth of quality data to be used and analyzed by researchers in all scientific fields. This paper takes a closer look at one of these databases: ChEMBL, a database of chemical structures and bioactivity data extracted from the medicinal chemistry literature. The observations and recommendations, however, are more broadly applicable to the curation of research data. Building on previous work addressing data integrity and curation in ChEMBL, the authors summarize the most common sources of error in a table (delineating experimental, data extraction, author, and user errors). They then describe the current methods used for data correction, standardization, and annotation within ChEMBL, which combine manual and automated processes, and outline plans for streamlining these processes. They also note that applying stricter standards to data submitted for ingestion would reduce the burden on curators in the future.


Pezoulas, V. C., Kourou, K. D., Kalatzis, F., Exarchos, T. P., Venetsanopoulou, A., Zampeli, E., Gandolfo, S., Skopouli, F., De Vita, S., Tzioufas, A. G., & Fotiadis, D. I. (2019). Medical data quality assessment: On the development of an automated framework for medical data curation. Computers in Biology and Medicine, 107, 270-283. https://doi.org/10.1016/j.compbiomed.2019.03.001
Ensuring that datasets contain quality, harmonized data is essential for making that data useful in the future. Many papers have suggested metrics for assessing data quality “without…the development and evaluation of a computational framework” (Pezoulas et al., 2019, Introduction). In this case study, the authors develop an automated framework for assessing data quality in Python and apply it to a dataset of clinical data relating to Sjogren's Syndrome. The results confirm that their algorithms were able to detect outliers as well as missing, duplicated, or invalid data, demonstrating that automated systems can be a useful tool for evaluating and ensuring data quality and validity. However, one should keep in mind the caveat presented in "An appraisal of the Carlisle-Stouffer-Fisher method for assessing study data integrity and fraud" by Mascha et al. (2017), and not rely entirely on one method to assess the integrity of a dataset.
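A minimal sketch of the kinds of checks such a framework automates is shown below; the file, column names, and ranges are hypothetical, and this is not the authors' actual implementation.

  import pandas as pd

  df = pd.read_csv("patients.csv")             # hypothetical clinical dataset

  report = {
      "missing": df.isna().sum().to_dict(),                        # missing values per field
      "duplicates": int(df.duplicated(subset="patient_id").sum()), # duplicated records
      "invalid_age": int((~df["age"].between(0, 120)).sum()),      # out-of-range values
  }

  # Simple outlier screen: values more than 3 standard deviations from the column mean.
  numeric = df.select_dtypes("number")
  report["outliers"] = ((numeric - numeric.mean()).abs() > 3 * numeric.std()).sum().to_dict()
  print(report)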


Rattan, A. K. (2018). Data Integrity: History, Issues, and Remediation of Issues. PDA Journal of Pharmaceutical Science and Technology, 72(2), 105-116. https://doi.org/10.5731/pdajpst.2017.007765
This paper addresses how the principles of data integrity in the medical research domain are codified in statutes and regulations enforced by the United States FDA. Digital curators have an important role to play in developing systems that adhere to both the letter and the spirit of these regulations, which will both mitigate risks and lead to better results. The author recommends tools and methods of self-auditing to ensure data accuracy and seeks to enhance understanding of the impact of data integrity on business risk, safety, and regulatory compliance. The core aspects of data integrity outlined here are represented by the acronym "ALCOA" (later expanded to "ALCOA Plus"); a sketch of an audit record illustrating these attributes follows the list:
  • Attributable—the provenance of the data can be tracked
  • Legible—records are clear and unambiguous
  • Contemporaneous—actions are documented as soon as possible after they are performed
  • Original—the original record rather than a copy; for digital data, this means retaining all the metadata needed to reconstruct the event
  • Accurate—free of errors
  • ALCOA Plus adds "enduring, available, accessible, complete, consistent, credible, and corroborated" to the list of expectations.
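As a rough illustration only (the field names are hypothetical and not taken from the cited paper or from any regulation), an electronic audit-trail record that reflects these attributes might look like the following:

  import hashlib, datetime

  def alcoa_record(user, action, raw_value, source_file):
      return {
          "user": user,                                            # Attributable
          "recorded_utc": datetime.datetime.utcnow().isoformat(),  # Contemporaneous
          "action": action,                                        # Legible, human-readable
          "value": raw_value,                                      # Accurate, as observed
          "source": source_file,                                   # Original / provenance
          "checksum": hashlib.sha256(raw_value.encode()).hexdigest(),  # tamper evidence
      }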


Rogers, C. A., Ahearn, J. D., & Bartlett, M. G. (2020). Data Integrity in the Pharmaceutical Industry: Analysis of Inspections and Warning Letters Issued by the Bioresearch Monitoring Program Between Fiscal Years 2007-2018. Therapeutic Innovation & Regulatory Science, 54(5), 1123-1133. https://doi.org/10.1007/s43441-020-00129-z
This review addresses a wide scope of data integrity issues cited in FDA inspections and warning letters, not all of which fall within the scope of this bibliography. However, it does draw attention to issues of interest to the medical research data curator. The second most common type of violation found in the FDA warning letters reviewed was documentation issues. Inability to provide requested information and improper recording or storage of data were also cited by the FDA in many cases, emphasizing the importance of proper data curation for regulatory compliance. See "Data Integrity: History, Issues, and Remediation of Issues" (Rattan, 2018) for further examination and guidance with regard to FDA regulations and expectations.


Steinwandter, V., & Herwig, C. (2019). Provable Data Integrity in the Pharmaceutical Industry Based on Version Control Systems and the Blockchain. PDA Journal of Pharmaceutical Science and Technology, 73(4), 373-390. https://doi.org/10.5731/pdajpst.2018.009407
Ensuring that the provenance of research data is recorded and preserved is one of the core principles of data integrity in the medical research field, and it presents a particular challenge. One method for certifying the authenticity of this provenance metadata utilizes the same blockchain technology that powers cryptocurrencies such as Bitcoin. This paper examines the flow of data in a typical experimental process, highlighting the points in that process where data could be altered or manipulated. By adding the Ethereum blockchain to the toolkit used in the curation of medical research data, the authors argue, manipulation can be prevented outright or at least detected after the fact.
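As a minimal sketch of the underlying idea (not the authors' implementation), each version of a dataset can be fingerprinted and chained to the previous version; it is these digests that would be anchored in a public blockchain transaction, where they can no longer be silently rewritten.

  import hashlib

  def version_fingerprint(data, prev_fingerprint=None):
      h = hashlib.sha256()
      if prev_fingerprint:
          h.update(prev_fingerprint.encode())   # chain to the previous version
      h.update(data)
      return h.hexdigest()

  v1 = version_fingerprint(b"raw instrument output")
  v2 = version_fingerprint(b"cleaned dataset", v1)
  # v1 and v2 are the values that would be written to the blockchain; re-deriving
  # them later from the stored files reveals any undocumented change.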