Image Metadata for Genealogy

Making the leap from cultural institutions to common practice: A guide to implementing metadata projects for large-scale digital image collections

cc2.0 BY-ND-NC-SA Carady DeSimone; in partial fulfillment of INF 7920, MSLIS & GCAA, School of Information Science, Wayne State University.

Project Proposal

The literature included in this bibliography is instrumental in creating a grassroots, crowdsourced metadata generation project which I would like to pursue when feasible. The outcome of this project would be twofold. First, the project itself will generate a large influx of metadata-encoded images commonly associated with genealogical studies, improving resources and access for future researchers. Second, the setup and implementation of such a project can provide support, reference, and potential validation for crowdsourcing in general. Specifically, this body of literature will serve as a resource to help create a metadata workflow within cultural heritage practices that is friendly to crowdsourcing and service learning. This guide will be beneficial to any students, educators, GLAM staff, or other institutions interested in developing their own crowdsourcing project, particularly those that include elements of metadata and/or digital images.

The metadata workflow and schema design will prepare hobbyist and professional genealogists to both harvest and encode metadata related to images, whether those images are portraits or photographic representations of textual documents. Relevant metadata can identify individuals within group photos and capture handwritten text that is incompatible with OCR. Metadata tagging at the user/creator level is a far more efficient process than retroactive metadata additions to previously curated collections. As genealogy is still teetering on the cusp of a formal ‘study,’ it is only through embarking on challenges such as this that we may raise this multidisciplinary community into a true academic field of study.

In the interest of feasibility and manageability, my project will be narrowed in scope to focus on a specific, definable group of individuals. The Filles du Roi, a well-documented group of French immigrants to Canada in the 1600s, would not only provide a manageable and clearly delimited case study, but also offer a wealth of documentation, interesting bilingual challenges, and a built-in cohort of eager Citizen Historians and Citizen Archivists: French Canadian (and French Canadian-descended) genealogists. Future extensions of this project will establish a workflow and schema for metadata projects supported by crowdsourced volunteers and/or service learning opportunities.

The annotations below are organized alphabetically by author's surname; however, I have chosen to lead with the article titles for aesthetics.

Annotations

Metadata for Name Disambiguation and Collocation.

Beall, J. (2010). Future Internet, 2(1), 1-15. doi:10.3390/fi2010001

This article illustrates the common “homonym” and “synonym” problems within searches, both of which are likely to arise within the population of Filles du Roi descendants. Beall notes that these “variant name” issues cause both “unwanted and missed documents” in search results (p. 2). Beall illustrates many common variations of names, even briefly touching on non-Roman characters. A large majority of the population in New France shared relatively few given names (particularly the prefixing of Marie to most women’s names), which greatly complicates genealogical searches. Additionally, the spelling of French Canadian surnames can vary throughout supporting documents, making it extremely difficult to maintain the identity of an individual. Successful creation of a name disambiguation system, either computer-assisted via metadata linkage or via a manual name thesaurus, will allow for a higher level of precision in searches, leading to fewer items requiring human error checking or repeated searching. Streamlining data this way would also increase the reward side of the effort-to-reward ratio, as volunteers would be able to concentrate more on the “fun” side of tasks (see Ridge, 2013, below).
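The manual name thesaurus mentioned above could be sketched roughly as follows. This is a minimal illustration, not Beall's method: the thesaurus entries, the function names (fold, build_index, canonical_surname, given_name_keys), and the rule for the Marie prefix are all assumptions made for the example, and a real project would draw its variants from the source records.

```python
import unicodedata

# Hypothetical thesaurus: each canonical surname maps to spelling variants.
# The entries are illustrative only and are not taken from any cited source.
SURNAME_THESAURUS = {
    "Hebert": ["Hébert", "Heber"],
    "Gauthier": ["Gautier", "Gaultier"],
}


def fold(name):
    """Strip accents and case so that 'HÉBERT' and 'Hebert' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().strip()


def build_index(thesaurus):
    """Map every folded spelling (canonical or variant) to its canonical form."""
    index = {}
    for canonical, variants in thesaurus.items():
        for form in [canonical, *variants]:
            index[fold(form)] = canonical
    return index


def canonical_surname(name, index):
    """Return the canonical surname, or None when the spelling is unknown."""
    return index.get(fold(name))


def given_name_keys(given):
    """Return search keys for a given name, with and without a leading 'Marie'.

    Assumed rule for this sketch: 'Marie-Anne' should also be findable as 'Anne'.
    """
    parts = fold(given).replace("-", " ").split()
    keys = {" ".join(parts)}
    if len(parts) > 1 and parts[0] == "marie":
        keys.add(" ".join(parts[1:]))
    return keys
```

A lookup such as canonical_surname("Gaultier", build_index(SURNAME_THESAURUS)) collapses variant spellings into one heading (the synonym problem). Unknown spellings return None, so they can be routed to a volunteer for review rather than guessed.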

Reference services to incarcerated people, Part II: Sources and learning outcomes.

Drabinski, E., & Rabina, D. (2015). Reference & User Services Quarterly, 55(2), 123. doi:10.5860/rusq.55n2.123

As part of a year-long project, LIS students engaged in a service learning project in support of NYPL’s incarcerated reference program. Part II of the authors’ report focuses on the methodology and student response to this opportunity. This project clearly illustrates the benefits of service learning for communities, students, and disadvantaged populations. It also supports an ethical defense of service or experiential learning, arguing that no exploitation occurs because all parties involved benefit from the arrangement. Overall, the authors note a definite success in terms of “meeting the course and program learning objectives while providing a rewarding experience for students” (p. 130). Any instructor or professor desiring to incorporate service learning into their curriculum or syllabi can draw concrete support from this report. Additionally, it highlights the success of service learning projects in general.

NOTE: Part II of this project was published as a stand-alone article; however, curious readers are encouraged to also review Part I for more in-depth background.

Providing Metadata for compound digital objects: Strategic planning for an institution’s first use of METS, MODS, and MIX.

Dulock, M., & Cronin, C. (2009). Journal of Library Metadata, 9(3-4), 289-304. doi:10.1080/19386380903405199

The authors’ reported experiences with METS and MIX metadata schemas provide a wonderful framework for future metadata projects, levelling out the learning curve inherent in exploring new technologies. Starting from absolute zero in some cases (to the point of hiring a Computer Science grad student for coding), the team of six were able to build “metadata capture into the automated process of digital imaging,” streamlining a process many institutions are currently struggling with (p. 291). It is also imperative that an academic study have clearly outlined and identified object parameters such as Dulock & Cronin’s: if not for accessibility reasons, then for project manageability! This further reinforces my decision to limit my project to a specific population group for feasibility and practicality.

The pleasure principle: The power of positive affect in information seeking.

Fulton, C. (2009). Aslib Proceedings - New Information Perspectives, 61(3), 245-261. doi:10.1108/00012530910959808

This exploratory study illustrates the motivations reported by amateur genealogists pursuing individual research projects. The author examines aspects of socialization, leisure, and positive affect with regard to information-seeking behaviors. Overall, the article shows that intrinsic, not external, motivating factors are what tend to drive participation in genealogy. Fulton examines the pleasure that becomes inherent in research for these individuals, backing an association with lifelong learning and information as a pursuit of leisure. Furthermore, Fulton’s description of her sample group and the behaviors therein indicates that genealogists may be highly receptive to service learning or crowdsourced opportunities such as the Filles du Roi project.

A suggested taxonomy of genealogy as a multidisciplinary academic research field.

Herskovitz, A. (2012). Journal of Multidisciplinary Research, 4(3), 5-21. Full-Text PDF

This exploratory proposal seeks to establish genealogy with a capital G: to raise the collective study of kinship networks and family history to the level of other official -ologies. Herskovitz outlines the current requirements of an “academic discipline”: a focus, unique research methods, significant market demand, professionals in the field, terminology or technical language, “a theory, concepts, and a body of literature” collectively, professional journals, and institutional manifestations (p. 7). Of these eight requirements, Herskovitz claims that genealogy lacks only an organized body of knowledge and an official institution, which tend to go hand-in-hand. This article helps to establish Genealogy as a “multidisciplinary academic field” while simultaneously calling for more academic participation (p. 15). Overall, Herskovitz encourages a grassroots effort in support of publications concerning genealogy as an academic discipline.

Multilingual metadata for cultural heritage materials.

Matusiak, K. K., Meng, L., Barczyk, E., & Shih, C. (2015). The Electronic Library, 33(1), 136-151. doi:10.1108/el-08-2013-0141

This case study from the University of Wisconsin-Milwaukee Library explores the benefits and challenges of effectively “providing metadata records that reflect the language of the originating community” (p. 136). The authors conclude that such complicated indexing undertakings are “only possible in small projects…of unique collections of cultural heritage materials” and not feasible for extension to large, diverse collections (p. 149). Additionally, this project illustrates the flexibility and customizability of Dublin Core fields for working with both non-standard objects and non-standard text. This article also provides wonderful context for handling multilingual metadata, as well as the challenges and rationale of such projects. The authors are critical of contemporary machine-assisted translation and strongly favor the human alternative; certainly, even the best AI translation still benefits greatly from human revision.
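As a rough illustration of how Dublin Core fields can carry more than one language, the sketch below repeats an element once per language and tags each copy with xml:lang. It is not the record structure used by Matusiak et al.: the wrapper element name (record), the helper build_record, and the sample French and English values are assumptions made for the example. Only the Dublin Core element namespace and the xml:lang attribute come from the published standards.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"       # Dublin Core element set namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace that defines xml:lang
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build a simple record from (dc_element, language_code, text) tuples.

    Repeating an element with a different xml:lang value keeps the
    original-language text and its translation side by side.
    """
    record = ET.Element("record")  # hypothetical wrapper element for this sketch
    for element, lang, text in fields:
        node = ET.SubElement(record, "{%s}%s" % (DC_NS, element))
        node.set("{%s}lang" % XML_NS, lang)
        node.text = text
    return record


# Hypothetical sample values for a photographed parish register page.
sample = build_record([
    ("title", "fr", "Acte de mariage"),
    ("title", "en", "Marriage record"),
    ("type", "en", "Image"),
])
xml_text = ET.tostring(sample, encoding="unicode")
```

Keeping both language versions in the same record lets a search interface match either one, at the cost of asking volunteers to supply (or at least review) the translation.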

From Tagging to Theorizing: Deepening Engagement with Cultural Heritage through Crowdsourcing.

Ridge, M. (2013). Curator: The Museum Journal, 56(4), 435-450. doi:10.1111/cura.12046

This article provides extremely helpful concepts and definitions for crowdsourcing projects, as well as background information on the concepts of citizen scientists, citizen historians, and citizen archivists. Ridge’s discussion of potential micro-tasks, many of which include metadata, is a very clear outline for successful crowdsourcing projects. Ridge further explores concepts from the International Game Developers Association regarding engagement, recruitment, and retention through the concept of scaffolding, where micro-tasks or skills build user confidence through a gradual increase in difficulty. Ridge also touches on intrinsic motivations and external rewards, reiterating the personal and community motivations that drive crowdsourced projects (see Fulton, 2009, above). Additionally, sponsorship of or collaboration with such projects may increase community involvement and awareness for the participating institution. Strongly in favor of this form of engagement, Ridge also cites a number of successful examples of crowdsourcing community projects that have accomplished “specific, shared, and substantial goals” (p. 436).

Image embedded metadata in cultural heritage digital collections on the web.

Saleh, E. I. (2018). Library Hi Tech, 36(2), 339-357. doi:10.1108/lht-03-2017-0053

Saleh selected over 600 images from four major repositories (Europeana, The Commons, the World Digital Library, and a selection of Arab national libraries) and found that only “28.5 percent of analyzed images contained metadata” (p. 339). This is a shocking call to action for librarians, archivists, and community members to redouble their efforts to promote, encourage, and enforce metadata inclusion with digital images. Saleh’s statistical analysis shows a strong leaning towards the Dublin Core (or DC-expanded) schema and its elements. This is a second strong indication of the usage of metadata in general and of DC specifically, and it supports modelling a controlled vocabulary on DC standards, particularly for image metadata. Encouraging metadata inclusion through user-generated content would assist in preserving much more information than the digital images alone.
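A project could run a similar spot check on its own image files before and after a tagging campaign. The sketch below is a deliberately crude heuristic and not Saleh's methodology: it only scans raw bytes for the identifier that opens an Exif segment and for the opening tag of an XMP packet, so it can miss other metadata formats and can be fooled by coincidental byte sequences. The function names are assumptions made for the example, and a dedicated metadata tool would be more reliable for real work.

```python
EXIF_MARKER = b"Exif\x00\x00"  # identifier that opens an Exif segment in a JPEG
XMP_MARKER = b"<x:xmpmeta"     # opening tag of an embedded XMP packet


def embedded_metadata_kinds(data):
    """Return the kinds of embedded metadata whose markers appear in the bytes."""
    kinds = set()
    if EXIF_MARKER in data:
        kinds.add("exif")
    if XMP_MARKER in data:
        kinds.add("xmp")
    return kinds


def share_with_metadata(files):
    """Return the fraction of files (given as bytes) with at least one marker."""
    files = list(files)
    if not files:
        return 0.0
    tagged = sum(1 for data in files if embedded_metadata_kinds(data))
    return tagged / len(files)
```

In practice each item would be read with open(path, "rb").read(). The resulting fraction is only a rough indicator of how many images in a batch carry any embedded metadata at all.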

Setting up crowd science projects.

Scheliga, K., Friesike, S., Puschmann, C., & Fecher, B. (2016). Public Understanding of Science, 27(5), 515-534. doi:10.1177/0963662516678514

This investigation into the how and why of citizen-science projects provides legitimacy and guidelines to community-sourced projects. One of the key concepts explored by the authors is the differentiation between crowdsourcing, crowd science, and citizen science. Of particular note, two of the 12 organizations that made it to the survey’s final sample were genealogical projects, reinforcing a growing connection between hobbyist family historians and academic or professional archivists. The study concludes with suggestions for implementing successful crowd science projects, such as allowing divergent interests, drawing on existing communities (e.g., genealogists), and broadening the variety of tasks available to the crowd (e.g., tagging). In particular, these suggestions support an alignment between genealogy hobbyists, citizen science, and crowd science, and they provide a cohesive method of analysis for these projects. The authors also acknowledge that projects built on crowd science can range anywhere from “good scientific practice” to outright exploitation; see Van Hooland et al. (2011), below, for a more specific exploration of this dichotomy.

Between commodification and engagement: On the double-edged impact of user-generated metadata within the cultural heritage sector.

Van Hooland, S., Rodríguez, E. M., & Boydens, I. (2011). Library Trends, 59(4), 707-720. doi:10.1353/lib.2011.0011

This article discusses the ethical and social implications of crowdsourcing metadata tags specifically within cultural contexts. Adopting a dialectical lens, the authors explore the pros and cons of crowdsourced metadata use within cultural heritage institutions. The authors provide a basis of commentary that can both attack and defend the usage of crowdsourcing (and citizen science) by cultural heritage institutions. Despite the dichotomy illustrated, the authors posit that crowdsourcing, when practiced ethically, can help clarify user interpretation, encourage multiple perspectives, and increase community engagement. Overall, the authors seem more concerned about “the transformation of cultural goods and services into marketable products” than the process of curating user-generated tags (p. 709). This article serves as a cautionary reminder of the fine lines emerging between engagement/exploitation, user/consumer, and archive/market. Any projects moving forward under the premise of combining user-generated metadata with existing collections should pay keen attention to all potential stakeholders and social implications.

