Curation and Preservation of Electronic Mail
Jonathan Kirkwood
INF 7920
Digital Curation and Preservation
Definition of Project
This project began with a narrowly defined intent: to describe the technical details that various scholars have researched and published concerning the curation and preservation of emails. Numerous articles, chapters, essays, monographs, and reports discuss this topic, but only as part of a larger discussion of digital curation and preservation; publications devoted specifically to preserving email are considerably fewer. This required widening the project, which in turn revealed the complexity of managing email preservation. The technical aspects are complicated enough, and a variety of courses of action are available, but there are also matters of ethics and law, privacy, culture, funding, staffing, and research. Appreciating all of these is necessary to understand the curation and preservation of emails.
Annotations
Alberts, I. and Vellino, A. (2016). Assisting the appraisal of e-mail records with automatic classification. Records Management Journal, 26 (3), 293-313. DOI: 10.1108/RMJ-02-2016-0006.
Alberts and Vellino argue that email is now one of the main sources of recordkeeping challenges for organizations, and that the difficulty grows when emails must be appraised, managed, and retrieved for use. A greater problem is that, because of how email systems work, employees end up performing appraisal as much as archivists do. The authors cite studies showing that employees take far less care with appraisal, largely because they lack the motivation to sort emails manually in the manner of a physical filing system. The authors propose automatic classification as the solution and support their thesis with a two-phase study. In the first phase, classifiers were trained to assist participants with manually labelling their own emails. In the second phase, emails were classified automatically. The principal fields of the emails were imported into a single CSV file to serve as metadata, with each column derived from a set of lexical and nonlexical appraisal criteria that participants regularly used over the course of their workdays. The authors found that the second phase helped already busy employees manage their emails, which showed promise, though they urged caution despite the seemingly positive outcome. Emails have business value, and that concept requires a common understanding among all stakeholders in an organization.
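The step of importing the principal fields of each email into a single CSV file can be illustrated with Python's standard library. This is a minimal sketch, not the authors' pipeline: the column set (Date, From, To, Cc, Subject) and the extra HasAttachment flag are assumptions chosen for illustration, since the study derived its columns from the participants' own appraisal criteria. The function names message_row and emails_to_csv are hypothetical.

```python
import csv
import io
from email import message_from_string
from email.message import Message

# Hypothetical column set: common header fields standing in for the
# "principal fields" the study exported. Not the authors' actual columns.
FIELDS = ["Date", "From", "To", "Cc", "Subject"]


def message_row(msg: Message) -> dict:
    """Build one CSV row of header metadata for a parsed message."""
    row = {name: msg.get(name, "") for name in FIELDS}
    # One nonlexical feature: whether any MIME part is marked as an attachment.
    row["HasAttachment"] = any(
        part.get_content_disposition() == "attachment" for part in msg.walk()
    )
    return row


def emails_to_csv(raw_messages: list[str]) -> str:
    """Flatten raw RFC 5322 message strings into a single CSV document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS + ["HasAttachment"])
    writer.writeheader()
    for raw in raw_messages:
        writer.writerow(message_row(message_from_string(raw)))
    return buffer.getvalue()


# Usage with a single invented message.
sample = (
    "Date: Mon, 1 Feb 2016 09:00:00 +0000\n"
    "From: a@example.org\n"
    "To: b@example.org\n"
    "Subject: Budget draft\n"
    "\n"
    "Figures to follow.\n"
)
print(emails_to_csv([sample]))
```

Missing headers become empty cells rather than errors, so messages with incomplete metadata still produce a row that a classifier or an appraiser can review.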
Blouin, F. (2010). “Email as Archives: You have to have it before you worry about it.” J.L. & Arts, 34, 43 – 47.
Francis Blouin expressed concern about how university archives will preserve emails. Such archives serve three purposes: they are sources of information, they provide evidence of the university's reputation, and they hold evidence of what individuals at the university accomplished. That a university such as the University of Michigan produces an estimated 12 million emails a day is cause for considerable concern. Blouin bluntly admitted that he saw no reason to save all 12 million from any given day; if anything, the less saved, the better. He advocated effective management that separates the ephemeral from the valuable, and any technology designed for preserving email must be built with that in mind. A more pressing issue for Blouin was not technological but human. Administrators often assume that it is safer to destroy or shred records than to save them and regret it later. Meanwhile, the pace of technological change has led to a situation where employees, rather than archivists, select what to preserve. Regardless, preserving emails remains necessary to preserving institutional memory.
Chabikwa, Samuel, Nengomasha, Catherine T. & Sigauke, Delight T. (2016). “Management of email as electronic records in state universities in Zimbabwe: Findings and implications for the National Archives of Zimbabwe.” Journal of the Eastern and Southern African Regional Branch of the International Council on Archives, 35, 14 – 29.
The authors brought to light that while the administrative functions of state universities in Zimbabwe rely on email for official business, the universities sorely lack the means to manage and preserve it. The authors attempted to address this by adapting, with email in mind, the records management framework proposed by Roper and Miller in their 1999 text The management of public sector records: principles and context: 1. Identification of records; 2. Intellectual control of records; 3. Provision of access to records; 4. Physical control of records. The authors proposed: 1. Generation and receipt of business emails in official email accounts; 2. Organization and classification of emails according to subject, importance, and relevance; 3. Maintenance and use of the email system; 4. Disposal of emails by server erasure, deletion, or preservation through printing or exporting the data. They found challenges in the inability to differentiate which emails to keep or destroy, the lack of legislation or public sector-wide policy, an absence of records management competencies in general, and a lack of coordination in handling paper and electronic records. They recommended professional training and capacity development for networks, the development and implementation of policies, and technological infrastructure to handle the preservation of emails. Their recommendations did not mention funding, perhaps because they expected it to amount to little or nothing. They appear to have had in mind a policy that would help colleagues operate in a particular manner.
Cocciolo, A. (2016). “Email as cultural heritage resource: appraisal solutions from an art museum context.” Records Management Journal, 26 (1), 68 – 82. DOI: 10.1108/RMJ-04-2015-0014.
Anthony Cocciolo argued that email has proven itself a mechanism for communicating unformed thoughts between and among individuals and is the natural successor to the written letter. Despite noting how rapidly emails can be produced, Cocciolo further argued that email collections have the potential to capture the emergence of thoughts and decisions, illustrating how and why events developed as they did. Yet building email collections faces a multitude of challenges, including but not limited to personal privacy, liability, legal and IT departments wanting to delete all emails, archivists lacking appraisal and selection methodologies, and costs. For Cocciolo, the key to effective preservation of emails was effective appraisal. He served as a consultant to an unnamed art museum in New England for the year-long duration of a grant awarded to fund the museum's effort to preserve significant emails. Preserved emails would then be available for public use after 25 years. That said, the goal was to preserve only significant emails, not all of them. While Cocciolo wrote favorably of automated classification, the museum's small staff and limited resources led it to manual appraisal and selection. The most effective means would be to exploit the social networks between individuals. Following appraisal and selection, employees used Microsoft Outlook to archive the emails. For this museum, Cocciolo found the built-in tools in Outlook to be the best means of preserving emails.
Kim, S., Sinn, D., & Syn, S. (2011). “Personal records on the web: Who’s in charge of archiving, Hotmail or archivists?” Library & Information Science Research, 33, 320 – 330. DOI: https://doi.org/10.1016/j.lisr.2011.02.004.
The authors researched both emails and blogs, noting that most free email and blog services are not designed for long-term preservation. While blogs make up a portion of the article, the authors often wrote about preserving emails and blogs in tandem, frequently mentioning both in the same sentence. Their concern should come as no surprise, given that the closure of a service means losing its data, which can happen on short notice. Data can also be lost when a service changes to a fee-based business model or when account names change. The authors noted that while institutions in many instances have legal requirements to maintain correspondence as public records or for audit purposes, individuals are far more vulnerable. The authors examined this problem with four research questions that guided the development of a 36-question survey issued in 2008. The questions were: 1. How does the public use commercially provided email and blog services to keep personal records? 2. What strategies do email and blog users take when archiving their records on the web? 3. What do users want or need to assist them with preserving those personal records? 4. What does the behavior of the public suggest to information professionals? Nearly 350 participants were surveyed, and nearly all used email for personal communication. The authors found that email and blog users shared numerous similarities, notably that they acknowledged the risk of losing content but complained that they lacked the tools to preserve it. The study also surveyed 130 information professionals and found no difference between them and the public in awareness of data loss, though the professionals preserved their data more frequently.
Owen, C. (2010). “‘Three Little Words’: Is E-Mail Unmanageable?” Archival Issues, 32 (1), 33 – 45. https://www.jstor.org/stable/41102170.
Owen centered his article on the argument that a well-developed record retention schedule is the backbone of a successful records management program. He examined the effort of the Public Records Division (PRD) of the Kentucky Department of Libraries and Archives to change the statewide retention schedule for routine correspondence in order to better manage email. The change in policy was set to affect some 3,000 offices at the state and local levels of government. He explicitly referred to the change in wording as minor, though he implicitly admitted throughout the article that it would have considerable consequences. The PRD sought to impose a flat two-year retention period on all routine correspondence, including email. Those consequences may not have been lost on the State Archives and Records Commission (SARC), a seventeen-member body with representatives from the three branches of state government. The intent of the PRD was clear from the start, but SARC and various state agencies remained wary. The problem of email at first appeared to be technological: the proliferation of desktop computers and email accounts generated a great deal of email. Further examination revealed it to be more of a social problem. State employees appeared at least implicitly aware that they were the creators and destroyers of records, which led some to think it best to retain all emails, including those unrelated to state business. Owen concluded that no changes were made and that SARC, the state, and the public at large remain unable to accept that email is not immune from conventional records management.
Prom, C. (2011). “Preserving Email.” Retrieved from Digital Preservation Coalition. DOI http://dx.doi.org/10.7207/twr11-01.
Prom correctly noted that only a handful of institutions have developed and implemented policies to preserve emails. Despite this, he asserted that the essential elements for such a program are in place. He argued that the greatest success can be found in understanding the technical aspects of preserving emails, such as analyzing how current email systems are used, giving priority to long-term access over the minimum retention period for records, and developing documentation and policies. Because he wrote a report, Prom was able to touch on multiple areas of email preservation beyond the technical, such as legal and ethical dilemmas. However, he primarily examined the technical challenges impeding email preservation and how to address them. None of the challenges he mentioned was unique; they were those found in most articles, though discussed at greater length. The same was true of his recommendations for combating those challenges. In one departure from his peers, however, he devoted some analysis to factors that facilitate email preservation. Emails mirror the traditional business memo, which means their importance can be assessed quickly. Messages can also be sent across different email services, meaning that more than one service may hold them.
Schmidt, L. (2011). “Preserving the H-Net Email Lists: A Case Study in the Trusted Digital Repository Assessment.” The American Archivist, 74 (1), 257 – 296. https://www.jstor.org/stable/23079009.
Schmidt wrote the article partly as a summary of a two-year project to assess and improve the preservation of academic email lists in H-Net: Humanities and Social Sciences Online, and partly because up to 2011 there had been no formally documented research into preserving multiple email lists. Her article dealt largely with the technical aspects of working to preserve more than twenty years of academic discourse, and there were plenty of problems to surmount before the project ended. Among them were a lack of backups and of archival copies to ensure authenticity, security loopholes, little or no documentation, and no file format migration strategy. Addressing these required establishing fixity, developing a succession plan to ensure continuity of operations in case of disaster, automatically harvesting metadata, improving backups and creating archival copies, and creating documentation. Although she did not describe it this way, Schmidt effectively undertook a full-spectrum effort to address the problems she faced. Numerous individuals were also involved, and they understood her intent and the importance of addressing the problems she found. Both were critical to achieving success.
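Establishing fixity generally means recording a checksum for each preserved file and recomputing it later to detect loss or alteration. The sketch below assumes SHA-256 and a simple in-memory manifest keyed by relative path; the function names sha256_of, build_manifest, and verify_manifest are hypothetical, and nothing here describes the procedures H-Net actually adopted.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large list archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose checksum changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

In practice the manifest would be written out and stored alongside the archival copies, so that a later audit can compare the files against checksums recorded when they were ingested.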
Steele, J. (2010). “Preserving history, preserving privacy: E-mail, archival ethics, and the law.” Archival Issues, 32 (2), 99 – 109. https://www.jstor.org/stable/41756681.
Steele argued that managing emails is among the most difficult tasks in the management of electronic records. But emails are now an important means of exchanging ideas, making decisions, and creating records, so it is natural that archivists want to preserve them. The proliferation of emails, however, makes it difficult to appraise them individually, a problem compounded by inadequate funding and staffing. Steele further elucidated various technological issues with preserving emails, but his main purpose was to examine the legal and ethical aspects of privacy, which, according to him, are infrequently studied. Breach of privacy is a major concern due to the volume of email correspondence and American cultural expectations of privacy. Steele identified three objectives. First, understanding privacy means looking at the history of privacy in the United States, acknowledging that no comprehensive law governs the right to it, and recognizing that advances in technology have clouded legal and ethical interpretations. Second, suggesting reasonable obligations for archivists in upholding privacy that reflect moral principles, though he noted that this can make negotiating privacy tricky. Third, conducting privacy audits of email accounts to ascertain whether they contain sensitive information, reflect official business, and have attachments.
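The three questions in Steele's privacy audit (sensitive information, official business, attachments) can be illustrated with a small screening function. Everything here is an assumption for illustration: the single pattern for the shape of a US Social Security number, the hypothetical institutional domain example.edu standing in for "official business", and the rule that a MIME part marked with an attachment disposition counts as an attachment. A real audit would use institution-specific rules and human review; the names AuditResult and audit_message are hypothetical.

```python
import re
from dataclasses import dataclass
from email import message_from_string
from email.utils import getaddresses

# Hypothetical screening rules, for illustration only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # shape of a US Social Security number
OFFICIAL_DOMAIN = "example.edu"  # hypothetical institutional domain


@dataclass
class AuditResult:
    sensitive: bool        # plain-text body matches the sensitive-data pattern
    official: bool         # sender or recipient uses the institutional domain
    has_attachments: bool  # at least one MIME part is marked as an attachment


def audit_message(raw: str) -> AuditResult:
    """Flag one raw message against the three hypothetical audit questions."""
    msg = message_from_string(raw)
    body_parts = []
    has_attachments = False
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            has_attachments = True  # attachments are counted, not scanned
        elif part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            body_parts.append(payload.decode(charset, errors="replace"))
    body = "\n".join(body_parts)
    addresses = getaddresses(msg.get_all("From", []) + msg.get_all("To", []))
    official = any(addr.lower().endswith("@" + OFFICIAL_DOMAIN) for _, addr in addresses)
    return AuditResult(
        sensitive=bool(SSN_PATTERN.search(body)),
        official=official,
        has_attachments=has_attachments,
    )
```

Returning flags rather than deleting or redacting anything keeps the final decision with the archivist, in line with Steele's emphasis on archivists' ethical obligations.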
Zhang, Jane (2015). “Correspondence as a documentary form, its persistent representation, and email management, preservation, and access.” Records Management Journal, 25 (1), 78 – 95. DOI 10.1108/RMJ-03-2014-0015.
Jane Zhang sought both to construct a systematic method of thinking about correspondence as a documentary form and to examine the evolution of electronic mail with regard to its storage and preservation. Above all, she wanted email understood as a documentary form. In pursuing the first aim, she focused on the development of correspondence beginning in Colonial America and succinctly explained that typewriting gradually replaced handwriting and that indexes and vertical filing cabinets were the necessary results of voluminous amounts of correspondence. She managed to connect the first point with the second, if tenuously. Drawing on other sources, she correctly argued that email mirrors the traditional business memorandum. From there she developed levels of control for any email system, such as the automatic date and time, the names and addresses of senders and recipients, the subject (which she considered optional), and the mailbox, by which she meant folders created by users. The folders are a point of concern among archivists, as they can be improperly managed. Effectively preserving emails requires standards and neutral systems that will not interfere with the authenticity of the properties of emails. Zhang also noted that because digital records can be easily altered, authenticity is a significant requirement. She argued that the addresses, names of organizations, dates and times, subjects, confidentiality, and attachments are the properties that must be preserved. Zhang provided fewer details on discoverability and access on the grounds that few email archiving projects have reached the point of being publicly available online.