Metadata Quality Assurance

Metadata Quality Assurance in Health Science Institutional Repositories
Vasiliki Mitsiopoulos

Definition of Project

This project sought to identify highly useful resources that explore metadata quality assurance standards for health science institutional repositories and to draw out generalized best practices for metadata in these local institutional systems. The bibliography gathers resources that assess the metadata qualities supporting core functions such as discovery, accessibility, provenance, and authentication. Further resources in the bibliography focus on creating a general model that can be used across repositories. Publications from the last twenty years were evaluated in order to capture both foundational and current information on metadata practices. Databases such as Library & Information Science Source, Library, Information Science & Technology Abstracts (LISTA), and Library & Information Science Abstracts (LISA) were used to find articles. The keywords used to find the articles in this bibliography were mainly “institutional repository”, “health sciences”, “medical”, and a variation of “metadata” or “metadata standards”; these terms were combined with Boolean operators to sift for relevant articles. Selection was based on the relevance of the information to this topic. Some articles are specific to the health sciences, which this bibliography focuses on, but more general articles and studies were also included because they demonstrate important metadata quality standards for institutional repositories, and their findings are transferable to this more specific discipline.

Annotations

Akers, K. G., Read, K. B., Amos, L., Federer, L. M., Logan, A., & Plutchak, T. S. (2019). Announcing the Journal of the Medical Library Association’s data sharing policy. Journal of the Medical Library Association, 107(4), 468-471. https://jmla.pitt.edu/ojs/jmla/article/view/801

Akers et al. explain the changes put forth by the Journal of the Medical Library Association (JMLA) requiring manuscript authors to submit their de-identified associated data to a repository, along with a “Data Availability Statement”, at the time of the article’s submission. The article is an in-depth look at new standards put forth by one of the largest medical library associations in the world and shows the importance of making data freely accessible to other researchers. The overarching standards provided by the JMLA detail what constitutes data from a research project and how the data should be organized and made accessible. Data is defined as any underlying material that contributed to the results of the research. The data must then be shared in an open digital or institutional repository that provides a unique identifier for each data set, and the main text of the article must contain a “Data Availability Statement” detailing where the data can be found, along with a hyperlink to that location. The authors indicate that the Medical Library Association’s data sharing policy can be used as a framework for health science institutional repositories to create standardized metadata practices across the discipline. Recording the presence of a Data Availability Statement in the metadata can help organize articles within an institutional repository so that research with raw data included is easy to surface, which would facilitate further research with that data.
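
The JMLA policy does not prescribe a repository schema, but a minimal sketch of how an institutional repository record might flag these elements, with hypothetical field names, is shown below:

```python
# Minimal sketch (hypothetical field names) of an institutional repository
# record that captures the data sharing elements described above: a Data
# Availability Statement and a unique identifier for the shared dataset.

article_record = {
    "title": "Example article title",
    "authors": ["Author, A.", "Author, B."],
    "journal": "Journal of the Medical Library Association",
    "data_availability": {
        "has_statement": True,  # the article text includes a Data Availability Statement
        "statement": "De-identified data are available in an open repository.",
        "dataset_identifier": "https://doi.org/10.xxxx/example",  # placeholder identifier
        "repository": "Example institutional repository",
    },
}

def has_shared_data(record: dict) -> bool:
    """Return True if the record points to openly shared underlying data."""
    availability = record.get("data_availability", {})
    return bool(availability.get("has_statement") and availability.get("dataset_identifier"))

print(has_shared_data(article_record))  # True
```

A repository could use a check like this to surface the articles that include reusable raw data.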


Antell, K., Foote, J. B., Turner, J., & Shults, B. (2014). Dealing with data: Science librarians’ participation in data management at Association of Research Libraries institutions. College & Research Libraries, 75(4), 557-574. https://crl.acrl.org/index.php/crl/article/view/16374

This article opens with an important note: in 2011 the National Science Foundation (NSF) began requiring anyone applying for funding to submit a data management plan for their research data. The mandate's stipulations cover the types of data that will be collected, the metadata standards that should be used, and the policies for reuse of the data by other researchers. This lays important groundwork and sets the expectation that researchers who receive federal funding will publish their data. The National Institutes of Health (NIH) has implemented similar guidelines. The scientific community is the largest recipient of grant funding, and these mandates not only promote open data but also promote proper metadata standards and set expectations for proper data management within the repositories that hold this data. It is important that this work be done in tandem with university librarians, as they are the experts in information organization and metadata standards.


Heidorn, P. B. (2011). The emerging role of libraries in data curation and E-science. Journal of Library Administration, 51(7-8), 662-672. doi:10.1080/01930826.2011.601269

Metadata for large-scale data curation and e-science needs to be more structured than it was when researchers worked alone. As various funding agencies implement data management mandates, more research will have data available for other people to access for future study. As massive amounts of data are uploaded into repositories, libraries take on the role of creating consistent metadata standards for better findability. Subject terms need to be mapped and indexed to help users find useful research and data. This article details how libraries can use the Digital Curation Cycle (DCC) method but need to work alongside researchers to properly curate their data.
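
As a generic illustration of the subject-term indexing described here (not a system from the article), a repository can build a simple inverted index from subject terms to records so that a search on a term retrieves every matching item:

```python
# Generic illustration: build an inverted index from subject terms to record
# identifiers so that a subject search returns all matching records.
# The records and terms are made-up examples, not data from the article.

from collections import defaultdict

records = {
    "rec-001": {"title": "Clinical trial dataset", "subjects": ["Oncology", "Clinical trials"]},
    "rec-002": {"title": "Survey of nursing staff", "subjects": ["Nursing", "Clinical trials"]},
}

subject_index: dict[str, set[str]] = defaultdict(set)
for rec_id, rec in records.items():
    for term in rec["subjects"]:
        subject_index[term.lower()].add(rec_id)  # normalize terms before indexing

print(sorted(subject_index["clinical trials"]))  # ['rec-001', 'rec-002']
```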


Lee, D. J., & Stvilia, B. (2017). Practices of research data curation in institutional repositories: A qualitative view from repository staff. PloS One, 12(3), e0173987. doi:10.1371/journal.pone.0173987

Open research data is becoming increasingly important to the scientific community, and practices for research data curation are being tailored to meet the needs of institutional repositories (IRs). This qualitative study investigates how institutions are curating data within their own repositories and how that work compares to the standard Data Curation Cycle (DCC) model. The authors found that IR curation activities were even more detailed than the DCC model, with added steps such as consulting with researchers to validate their metadata against the specific institutional repository's requirements. This study deepens our understanding of how individual institutions curate the data they deem worth keeping in their repositories. It is helpful to see the additional work that goes into curating an institutional repository, and it is important to consider this when discussing the effort it takes to maintain quality metadata in an institutional repository.


Mering, M. (2019). Transforming the quality of metadata in institutional repositories. The Serials Librarian, 76(1-4), 79-82. https://www.tandfonline.com/doi/full/10.1080/0361526X.2019.1540270

This article explores the necessary metadata components for faculty works included in the University of Nebraska-Lincoln’s institutional repository, which held 99,500 items at the time of publication. An important component discussed is the consistency of author metadata and the linking of IR records to author identifiers, such as ORCID identifiers or email addresses. Author names can appear differently from article to article, so having a unique identifier that ties together all of an author’s work is key to distinguishing authors. As institutional repositories become more common, it is important to revisit the metadata standards being used and reassess priorities for better preservation of a local institution’s publications.
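
A minimal sketch of the kind of author disambiguation described here, assuming a hypothetical local table of name variants mapped to ORCID identifiers (the article does not describe a specific implementation):

```python
# Hypothetical example: group IR records by ORCID identifier rather than by the
# name string, so that different name forms resolve to the same author.
# The lookup table and records are illustrative, not data from the article.

from collections import defaultdict

orcid_lookup = {
    "Smith, Jane": "0000-0000-0000-0001",  # placeholder ORCID values
    "Smith, J.": "0000-0000-0000-0001",
    "Smith, Jane A.": "0000-0000-0000-0001",
}

records = [
    {"title": "Paper one", "author_name": "Smith, Jane"},
    {"title": "Paper two", "author_name": "Smith, J."},
]

works_by_author: dict[str, list[str]] = defaultdict(list)
for rec in records:
    orcid = orcid_lookup.get(rec["author_name"], "unidentified")
    works_by_author[orcid].append(rec["title"])

print(dict(works_by_author))
# {'0000-0000-0000-0001': ['Paper one', 'Paper two']}
```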


Palavitsinis, N., Manouselis, N., & Sanchez‐Alonso, S. (2014). Metadata quality in digital repositories: Empirical results from the cross‐domain transfer of a quality assurance process. Journal of the Association for Information Science and Technology, 65(6), 1202-1216. doi:10.1002/asi.23045

This study tested the transferability of a previously created Metadata Quality Assurance Certification Process (MQACP), originally developed for learning repositories in an educational context, against an institutional repository and a digital cultural repository. The empirical data showed that when this quality assurance process is implemented, content providers gain a better understanding of metadata and the quality of the metadata they produce increases. The study showed promising results for the transferability of the process and continues a worthwhile conversation about testing metadata quality to set a standard for institutional repositories. Implementing quality assurance processes increases metadata quality, which in turn increases findability and accessibility in institutional repositories. Though it can be a costly process, in both funds and time, there is a clear benefit to assessing metadata quality and creating a foundation for new research datasets in these settings.
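
The MQACP itself is an organizational process rather than a piece of software, but one of the simplest quality measures used in this literature, completeness against a required element set, can be sketched as follows (the element list and records are illustrative assumptions):

```python
# Illustrative sketch (not the MQACP itself): measure the completeness of
# repository records against the elements a repository deems required.

REQUIRED_ELEMENTS = ["title", "creator", "date", "subject", "identifier", "rights"]

def completeness(record: dict) -> float:
    """Fraction of required elements that are present and non-empty."""
    filled = sum(1 for element in REQUIRED_ELEMENTS if record.get(element))
    return filled / len(REQUIRED_ELEMENTS)

sample_records = [
    {"title": "Dataset A", "creator": "Lab X", "date": "2014", "identifier": "doi:10.xxxx/a"},
    {"title": "Dataset B", "creator": "Lab Y", "date": "2014", "subject": "Public health",
     "identifier": "doi:10.xxxx/b", "rights": "CC-BY"},
]

scores = [completeness(r) for r in sample_records]
print(scores)                     # [0.6666666666666666, 1.0]
print(sum(scores) / len(scores))  # average completeness across the sample
```

Tracking a measure like this before and after a quality assurance round is one way to show whether the process is actually improving the metadata.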


Park, J., & Tosaka, Y. (2010). Metadata creation practices in digital repositories and collections: Schemata, selection criteria, and interoperability. Information Technology and Libraries, 29(3), 104-116. doi:10.6017/ital.v29i3.3136

An important piece of establishing quality metadata practices is evaluating and implementing the most useful metadata schemas. As this study shows, MARC is the most widely used schema, with the related AACR2 and LCSH standards enhancing MARC records. Problems arise in metadata interoperability because record providers are hesitant to share their records with other service providers. A key component of quality metadata standards is record transparency. Many institutional repositories hold their own locally created metadata, but if more institutions shared their metadata, the result would be more robust records and better findability.
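
One common way repositories address this kind of interoperability is a field crosswalk between schemas. The sketch below maps a few standard MARC fields to generic descriptive labels; the mapping choices are illustrative assumptions rather than anything prescribed in the article:

```python
# Illustrative crosswalk from a few MARC fields to generic descriptive labels,
# showing how locally created records could be exposed in a simpler, shareable
# form. The mapping is intentionally simplified.

MARC_TO_GENERIC = {
    "245$a": "title",        # MARC 245 $a: title statement
    "100$a": "creator",      # MARC 100 $a: main entry, personal name
    "650$a": "subject",      # MARC 650 $a: topical subject heading
    "520$a": "description",  # MARC 520 $a: summary note
}

def crosswalk(marc_record: dict) -> dict:
    """Convert a flat dict of MARC field/subfield keys into generic labels."""
    generic: dict[str, list[str]] = {}
    for tag, value in marc_record.items():
        label = MARC_TO_GENERIC.get(tag)
        if label:
            generic.setdefault(label, []).append(value)
    return generic

example = {"245$a": "Health data curation", "100$a": "Doe, J.", "650$a": "Medical informatics"}
print(crosswalk(example))
# {'title': ['Health data curation'], 'creator': ['Doe, J.'], 'subject': ['Medical informatics']}
```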


Popkin, G. (2019). Data sharing and how it can benefit your scientific career. Nature, 569(7756), 445-447. doi:10.1038/d41586-019-01506-x

With inconsistent, or nonexistent, data sharing policies, researchers and scientists lack a foundation for understanding how or why to share their raw data. Popkin discusses various individuals’ attempts to collect research data and curate it in institutional repositories, but the field lacks consistent expectations and standards. Open access is driving large quantities of freely accessible research data, and researchers are trying to navigate the plethora of storage options available and to determine what counts as worthwhile data. Researchers also often lack expertise in data curation and metadata, which illustrates the need for partnerships between institutional repository curators and scientific researchers. This article shows the importance of this conversation and that these issues are current and pervasive as open access continues to grow.


Read, K., Athens, J., Lamb, I., Nicholson, J., Chin, S., Xu, J., . . . Surkis, A. (2015). Promoting data reuse and collaboration at an academic medical center. International Journal of Digital Curation, 10(1), 260-267. http://www.ijdc.net/article/view/10.1.260/397

At New York University’s (NYU) Department of Population Health within the School of Medicine, a growing need was identified for a repository that cataloged “large, externally funded datasets” with descriptive metadata and access instructions so that faculty and researchers could find and use them. From the beginning, there was also an interest in publishing internal datasets produced by NYU’s researchers. The creation of the Data Catalogue brought on the need to create a unique metadata schema that accommodates both external and internal datasets. This model is distinctive in its goals and framework, and it could serve as a model for other institutions looking to create or augment their institutional repositories with health science datasets. The article also presents the Data Catalogue’s plans to pursue outreach that boosts its use and to curate more internal datasets from the university’s researchers. The marketing aspect of creating an institutional repository is important to consider in order to boost visibility.
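
The article describes the schema’s goals rather than its full specification, so the sketch below only illustrates the general idea of a single record type that covers both externally licensed and internally produced datasets; all field names are hypothetical:

```python
# Hypothetical sketch of a dataset record that can describe either an external,
# licensed dataset or an internal dataset produced by university researchers.
# Field names are illustrative assumptions, not the actual Data Catalogue schema.

from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    title: str
    description: str
    source: str                    # "external" or "internal"
    subjects: list[str] = field(default_factory=list)
    access_instructions: str = ""  # how faculty and researchers can obtain the data
    local_contact: str = ""        # for internal datasets, the responsible researcher

external_example = DatasetRecord(
    title="Example licensed claims dataset",
    description="Large externally funded dataset licensed by the institution.",
    source="external",
    subjects=["population health"],
    access_instructions="Request access through the library's data services team.",
)

internal_example = DatasetRecord(
    title="Example survey dataset",
    description="De-identified survey data produced by a university research group.",
    source="internal",
    subjects=["epidemiology"],
    access_instructions="Contact the listed researcher for a data use agreement.",
    local_contact="researcher@example.edu",
)

print(external_example.source, internal_example.source)
```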


Tenopir, C., Allard, S., Frame, M., Birch, B., Baird, L., Sandusky, R., . . . United States Geological Survey. (2015). Research data services in academic libraries: Data intensive roles for the future? Journal of eScience Librarianship, 4(2), e1085. doi:10.7191/jeslib.2015.1085

This study investigated the level of research data services offered by academic libraries and how library staff evaluated their own library's response to this growing field. The authors discuss how academic libraries are important community partners for research data services and one of the forces driving the growth of open data in research. The study is enlightening because, although the field is growing, many of the participating libraries lacked plans to offer these new services. Based on this study, it is important to remember that there is a gap between industry changes and the services actually offered. It is also notable that the libraries currently offering research data services were overwhelmingly those with more than $50 million in annual external funding, compared with those that had less. Large changes like creating a data management infrastructure are expensive and intensive undertakings that require outside stakeholders to see the importance of these plans.