Data Sets

From SIS Wiki
Revision as of 13:14, 14 November 2013 by Dr Beaudoin (Talk | contribs)

Annotated Bibliography - Policies Surrounding Data Sets by Meg Anthony, Christopher Bonadio, and Adrienne Radzvickas


Arms, C. R. (2000). Keeping memory alive: Practices for preserving digital content at the National Digital Library Program of the Library of Congress. RLG Diginews, 4(3). Retrieved from http://worldcat.org/arcviewer/1/OCC/2007/08/08/0000070519/viewer/file505.html#feature1

Source: RLG diginews via Worldcat
Annotation:
This paper describes the practices of the American Memory project at the Library of Congress; it is not intended as a statement of recommended best practice. It covers the institutional context and provides a conceptual framework in which to work. It also offers suggestions for storage, metadata, capture, and quality review to support long-term preservation of content digitized from analog sources.

Bercic, B. (2005). Protection of personal data and copyrighted material on the Web: The cases of Google and Internet Archive. Information & Communications Technology Law, 14(1), 17-24.

Source: Index to Legal Periodicals
Annotation:
This brief article addresses problems with personal privacy and intellectual property rights associated with search engines and Internet archives, specifically Google and the Internet Archive, mainly within the jurisdiction of the European Union. The author is concerned that data collected and stored by indexing databases may infringe upon users’ privacy and ownership rights to their personal information. In particular, the author believes search engines and Internet archives should have to seek permission from the owners of personal information before indexing and caching pages, rather than doing so unless the owner specifically forbids it. In other words, responsibility for the collection of personal information should also lie with Google and the Internet Archive, not just with the owner of that information. I like this article because it addresses privacy issues at two widely used services.

Data Archiving and Network Services. (2013). Data seal of approval: Guidelines version 2. Data Archiving and Network Services: The Netherlands. Retrieved from https://assessment.datasealofapproval.org/guidelines_52/pdf/.

Source: Bibliography of Waaijers, L. & van der Graaf, M. (2011). Quality of research data, an operational approach. D-Lib Magazine, 17(1/2).
Annotation:
The Data Archiving and Network Services organization provides an online assessment that research data archives can use to have their data quality certified. This document provides 16 guidelines required for the assessment, plus questions that institutions can use to self-assess their compliance. This document provides clear, helpful guidelines for institutions developing data-quality policies, even if they don’t intend to apply for certification.

Data Documentation Initiative Alliance. (2009). DDI 3.1. Retrieved from http://www.ddialliance.org/Specification/DDI-Lifecycle/3.1/.

Source: Bibliography of Waaijers, L. & van der Graaf, M. (2011). Quality of research data, an operational approach. D-Lib Magazine, 17(1/2).
Annotation:
The Data Documentation Initiative (DDI) Alliance has developed an XML metadata standard “for describing data from the social, behavioral, and economic sciences.” It contains descriptive metadata for the following components: the metadata document, the study, the files, data items, and related materials. This document contains the specifications of the metadata standard. The Resources page of the Web site also contains a “Getting Started” document that provides a recommended process for using the standard. This document is useful for institutions that want to provide more extensive metadata than the DataCite Metadata Schema supports.

Digital Curation Centre & Australian National Data Service. (2010). How to appraise & select research data for curation. Retrieved from http://www.dcc.ac.uk/sites/default/files/documents/How%20to%20Appraise%20and%20Select%20Research%20Data.pdf.

Source: Digital Curation Centre. (2013). How-to guides. In the Digital Curation Centre Web site. Retrieved from http://www.dcc.ac.uk/resources/how-guides.
Annotation:
This document provides criteria for developing a research data selection policy. It includes seven types of criteria to consider: “relevance to mission,” “scientific or historical value,” “uniqueness,” “potential for redistribution,” “non-replicability,” “economic case,” and “full documentation.” For each type of criteria, it includes points to consider in appraisal. This document would be useful for institutions developing research data selection policies.

Ferullo, D. (2004). Major copyright issues in academic libraries: Legal implications of a digital environment. Journal of Library Administration, 40(1-2), 23-40.

Source: LISA
Annotation:
This paper is an excellent overview of copyright issues affecting libraries in general and digital libraries in particular. The article would be a good place to start for any librarian looking for information on copyright law, a type of intellectual property law, especially within the context of digital copyright. First, the author discusses the Digital Millennium Copyright Act (DMCA), passed in 1998 in part to prohibit the manufacture of devices that can circumvent protection technology, as well as noteworthy cases affected by the DMCA. Next, she defines the Sonny Bono Copyright Term Extension Act, which extended the term of copyright from 50 to 70 years after the author’s death. Ferullo also mentions the Technology, Education, and Copyright Harmonization (TEACH) Act, which applies mainly to distance education. Finally, the author distinguishes among three types of copyright law exemptions: the education exemption, the fair use exemption, and an exemption for libraries.

Fujinaga, I., & Riley, J. (2003). Recommended best practices for digital image capture of musical scores. OCLC Systems and Services, 62-69. Retrieved from http://search.proquest.com.proxy.lib.wayne.edu/docview/57607747?accountid=14925

Source: LISA
Annotation:
While best practices, standards, and guidelines have emerged for printed and textual materials, the complexities of digitizing specialized resources such as musical scores have received less attention. Based on guidelines from institutions such as the National Archives and Records Administration (NARA) and the National Initiative for a Networked Cultural Heritage (NINCH), this article presents some best practices for the digitization of musical scores. It discusses how to define the purpose of an imaging project and offers recommendations on capture methods and file formats.

Garofalo, D. A. (2011). Tips from the trenches. Journal of Electronic Resources Librarianship, 389-391. doi:http://dx.doi.org/10.1080/1941126X.2011.627810

Source: LISA
Annotation:
Libraries are finding that they need to develop new acquisition methods and policies for obtaining digital content. This brief article offers resources to assist with these challenges, discussing workflow, management of electronic sources, and selection guidelines.

Green, A., Macdonald, S., & Rice, R. (2009). Policy-making for research data in repositories: A guide. Retrieved from http://www.disc-uk.org/docs/guide.pdf.

Source: Bibliography in Harvey, R. (2010). Deciding Which Data to Keep. Digital Curation. New York: Neal-Schuman Publishers.
Annotation:
This document provides detailed, comprehensive points to consider in developing policies for archiving research data. It covers the entire lifecycle for archived data sets. It also contains references to other useful documents. This document would be useful to anyone developing a detailed policy document for a research data archive.

Hahn, K. (2009). Achieving the full potential of repository deposit policies. Research Library Issues: A Bimonthly Report from ARL, CNI, and SPARC, (263), 24-32. Retrieved from http://search.proquest.com.proxy.lib.wayne.edu/docview/57722543?accountid=14925

Source: LISA
Annotation:
As the number of independent repositories grows, so too does the need for greater sharing, harvesting, and coordination among them. Based on a January 2009 meeting, this article discusses key questions about funder-imposed deposit requirements, policies, and copyright licensing. Using PubMed as a model, it examines strategies and plans of action for the future of repository services.

International DOI Foundation. (2012). The DOI handbook. Retrieved from http://www.doi.org/hb.html.

Source: Bibliography of Paskin, N. (2005). Digital Object Identifiers for scientific data. Data Science Journal, 4(1), 1-9. Retrieved from http://www.doi.org/topics/050210CODATAarticleDSJ.pdf.
Annotation:
The data set archival literature consistently recommends the use of Digital Object Identifier (DOI) names to identify research data sets. (For example, see the description of the Identifier element in the DataCite Metadata Schema as a DOI name.) This document provides a well-written, but very detailed and comprehensive description of the DOI system. Chapter 2, Numbering, is especially useful because it provides a detailed description about how a DOI name is constructed. It is recommended to read Paskin (2005) first as an overview and this document second for more detailed information.
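
As a rough illustration of the construction described in the Numbering chapter, a DOI name consists of a prefix (the directory indicator "10" followed by a registrant code) and a suffix, separated by the first forward slash. The following is a minimal sketch in Python, not an official tool; the function names are hypothetical, and the example DOI name is the one cited in the Joint (2006) entry of this bibliography.

```python
def split_doi_name(doi_name: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first forward slash."""
    prefix, slash, suffix = doi_name.strip().partition("/")
    if not slash or not suffix:
        raise ValueError(f"not a DOI name (no prefix/suffix pair): {doi_name!r}")
    if not prefix.startswith("10."):
        raise ValueError(f"DOI prefix should begin with '10.': {prefix!r}")
    return prefix, suffix


def resolver_url(doi_name: str) -> str:
    """Build a resolver link in the dx.doi.org style used in this bibliography."""
    prefix, suffix = split_doi_name(doi_name)
    return f"http://dx.doi.org/{prefix}/{suffix}"


# DOI name taken from the Joint (2006) entry in this bibliography.
prefix, suffix = split_doi_name("10.1108/00242530610689310")
print(prefix)  # 10.1108
print(suffix)  # 00242530610689310
print(resolver_url("10.1108/00242530610689310"))
```

The split happens at the first slash only, because a suffix may itself contain slashes.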

Joint, N. (2006). Legal deposit and collection development in a digital world. Library Review, 55(8), 468-473. Retrieved from doi:http://dx.doi.org/10.1108/00242530610689310

Source: LISA
Annotation:
In a comparative overview of the challenges of legal deposit in national collections, this study describes the management issues that distinguish print from digital legal deposits and their preservation. The article calls for deeper investigation into how to implement collection development strategies and preserve web-specific content.

Laakso, M., & Bjork, B. (2013). Delayed open access: An overlooked high-impact category of openly available scientific literature. Journal of the American Society for Information Science and Technology, 64(7), 1323-1329. Retrieved from http://search.proquest.com.proxy.lib.wayne.edu/docview/1429844559?accountid=14925

Source: LISA
Annotation:
This quantitative study identifies peer-reviewed journals during 2011 and collects data regarding embargo lengths and citation rates in order to understand the effect of delayed open access (OA) on the availability of cited articles. This study found that though delayed open access might seem like an oxymoron, journals with delayed OA have higher citation rates than immediate OA journals. The results strongly suggest that open access journals start indexing delayed journals and embargo periods in order to have more complete article-level coverage for future study of OA journals.

Lynch, C. (2004). Preserving digital documents: Choices, approaches and standards. Law Library Journal, 96(4), 609-617. Retrieved from http://search.proquest.com.proxy.lib.wayne.edu/docview/57617904?accountid=14925

Source: LISA
Annotation:
There are many guidelines and standards for digitizing and preserving objects, but things get more complex when dealing with born-digital objects. As more born-digital research becomes interactive, authoritative versions of these documents become difficult to save and reproduce. Migration and emulation are two strategies for preservation, but as this paper points out, there is no single correct strategy. Ultimately, it will take careful consideration to make the best choices in the interest of the objects and the repository’s needs.

Maher, W. (2010). Symposium: Digital archives: Navigating the legal shoals: If only we could reach the shoals: Barriers to archives digitization. Columbia Journal of Law and the Arts, 34(1), 5-14.

Source: LegalTrac
Annotation:
This article is written in a very “chatty” style, using the metaphor of a ship preparing to embark on a long journey, probably a very rough and “rocky” one at that. I like this article because the author first defines the term “archive”: “published and unpublished—currently a very problematic term of limited utility—documentary material created on behalf of both organizations and by individuals for personal reasons.” Maher also outlines the responsibilities of archivists in general and explains how current intellectual property rights issues, the “rocky shoals,” relate to digital archiving in particular, with reference to some specific examples.

McDermott, A. (2012). Copyright: Regulation out of line with our digital reality? Information Technology and Libraries, 31(1), 7-20.

Source: Library Literature & Information Science Full Text
Annotation:
The author provides a brief overview of U.S. Copyright Law, especially as it relates to librarians and digital library services, and lists four of the biggest “obstacles” to librarians and library patrons: The Digital Millennium Copyright Act; the Sonny Bono Copyright Term Extension Act; automatic copyright registration, meaning any “creative works” are copyrighted at the moment of creation, including family vacation photos on a hard drive; and orphan works. The author suggests that copyright laws be taught to budding professional librarians in MLIS programs and more librarians should get involved in the Creative Commons movement. I chose this article because it addresses copyright law and digital rights management in the context of library and information sciences.

McMillen, D. (2004). Privacy, confidentiality, and data sharing: Issues and distinctions. Government Information Quarterly, 21, 359-382.

Source: Library & Information Sciences
Annotation:
The content of this article could really apply equally to analog and digital collections, or databases, used by government agencies. However, I chose this article for those interested in understanding the importance of privacy and confidentiality policies surrounding personal data sets. The author provides a brief history of privacy legislation between 1970 and 2000, including a discussion of the Privacy Act of 1974 and the Financial Services Modernization Act, which, among other things, states that businesses cannot disclose nonpublic information to third party vendors without customers’ consent. Next, he reviews data-sharing legislation from the 105th Congress through the 107th Congress. Overall, the author suggests that respondents, or the public, should be well-informed of how any information collected on them will be used by any government agency.

Mutula, S. (2011). Ethics and trust in digital scholarship. The Electronic Library, 29(2), 261-276.

Source: LISA
Annotation:
This “conceptual paper” addresses issues of trust and ethics in the digital research environment, including e-learning environments. Specifically, the article discusses what characteristics a digital or e-learning environment must have for a particular community of users to trust it enough to use it. Mutula reviews various cross-disciplinary models of trust, such as the “user acceptance model,” the “integrated trust building generic model,” and models dealing with “service quality” and “user satisfaction.” This article might be useful for anyone developing policies for a digital repository; archivists in particular can see which features must be designed into their digital archives, or any e-environment, to build trust among the user community.

Nath, S., Sridhara, B., Joshi, C., & Kumer, P. (2008). Intellectual property rights: Issues for creation of institutional repositories. DESIDOC: Journal of Library and Information Technology, 28(5), 49-55.

Source: Library Literature & Information Science Full Text
Annotation:
The authors suggest librarians and research institutions should embrace the Open Access (OA) movement to combat strict licenses and high prices set by publishers. One “road” to Open Access is self-archiving, also known as the Green Road, such as depositing articles in Institutional Repositories (IRs)—defined as “web-based databases for scholarly material.” IRs are one way an institution can “showcase” its research achievements; additionally, IRs capture research material that may not be accommodated in traditional journals, such as datasets and video/audio files. Once it is decided to use an IR, the hosting institution must decide on content guidelines, metadata schemas, appropriate repository software (e.g. Fedora, DSpace, etc.), and must make sure there is no infringement of intellectual property rights of content submitted to the IR. I included this article because research institutions are choosing to use IRs and Creative Commons licensing to disseminate research more frequently.

National Institutes of Health. (2003). NIH data sharing policy and implementation guidance. National Institutes of Health: Bethesda, Maryland. Retrieved from http://grants1.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm.

Source: Bibliography of Buckland, M. (2011). Data management as bibliography. Bulletin of the American Society for Information Science and Technology, 37(6), 34-37.
Annotation:
In the United States, the National Institutes of Health (NIH) requirements for research data sharing for those seeking an NIH grant, along with requirements by the National Science Foundation (NSF), started the discussion about archiving and sharing research data. This document defines the requirements for a data-sharing plan and provides example descriptions of plans. It also contains a link to a Data Sharing Workbook that provides detailed examples of data-sharing plans developed by various institutions. This document is a good starting point for those beginning the discussion of how and why to share data.

National Science Foundation. (n.d.). Dissemination and sharing of research results. National Science Foundation: Arlington, Virginia. Retrieved from http://www.nsf.gov/bfa/dias/policy/dmp.jsp.

Source: Bibliography of Buckland, M. (2011). Data management as bibliography. Bulletin of the American Society for Information Science and Technology, 37(6), 34-37.
Annotation:
In the United States, the National Science Foundation (NSF) requirements for research data sharing for those seeking an NSF grant, along with requirements by the National Institutes of Health (NIH), started the discussion about archiving and sharing research data. This document defines the requirements for a data-sharing plan. It also provides links to detailed guidance documents for specific scientific areas, such as biological sciences or geosciences. This document is a good starting point for those beginning the discussion of how and why to share data.

Open Archives Initiative. (2008). Object reuse and exchange. Retrieved from http://www.openarchives.org/ore/.

Source: Bibliography of Hourdé, J. (2012). Advancing the practice of data citation: A to-do list. Bulletin of the American Society for Information Science and Technology, 38(5), 20-22.
Annotation:
This set of documents describes an alternative to using a DOI to represent multiple sets of data. It explains how to define structural metadata that identifies an aggregate resource and its component resources as a resource map. It provides instructions and examples for constructing resource maps in RDF/XML, RDFa, and Atom XML. This method is especially useful for research data because the data for one project often reside in multiple data sets.

Oppenheim, C. (2011). Legal issues for information professionals X: Legal issues associated with cloud computing. Business Information Review, 28(1), 25-29.

Source: LISA
Annotation:
The author discusses legal issues associated with cloud computing. Surveys the author reviewed suggest key concerns of users of cloud computing services are corruption of their data and data protection, including privacy and copyright issues. The author mentions these issues in addition to problems related to data processing and transfer. If a cloud computing service is located in France and the user is a United States citizen, under what jurisdiction does the service fall: the European Union or the United States? Furthermore, when the author reviewed some cloud computing standard contracts, he found most suppliers do not accept liability for the loss of data and, overall, that users are really not told what happens to their data. The author provides a series of questions users should ask cloud computing suppliers. I chose this article because cloud computing services are an increasingly popular means of storing and accessing data.

Paskin, N. (2005). Digital Object Identifiers for scientific data. Data Science Journal, 4(1), 1-9. Retrieved from http://www.doi.org/topics/050210CODATAarticleDSJ.pdf.

Source: Bibliography of Mayernik, M. (2012). Data citation initiatives and issues. Bulletin of the American Society for Information Science and Technology, 38(5), 23-28.
Annotation:
This article provides an overview of the Digital Object Identifier (DOI) system, what a DOI name is, and how a DOI name can serve as an identifier for one or more data sets. It then discusses how two scientific projects are using DOI names as identifiers for data sets. This article provides a succinct, if sometimes confusing, overview of the use of DOI names as identifiers for data sets. However, it does not make general recommendations about how to construct DOI names for scientific data. It is recommended to read this article first as an overview and then read International DOI Foundation (2012) second for more detailed information.

Sale, A. (2006). Comparison of content policies for institutional repositories in Australia. First Monday, 11(4). Retrieved from http://search.proquest.com.proxy.lib.wayne.edu/docview/57631636?accountid=14925

Source: LISA
Annotation:
This study offers a comparative analysis of seven Australian universities that have institutional repositories for authored research. Universities with higher repository output also had policies requiring deposit. Universities that applied only a voluntary policy on research articles, without author support, had lower output, a result consistent with international data. The evidence from this study encourages the Australian Department of Education, Science, and Training (DEST) to require that all postprint research be deposited in an institutional repository.

Sims, R. M. (2003). Obligations and aftercare: The depositor and service standard agreements. Journal of the Society of Archivists, 24(2), 215-221. Retrieved from http://search.proquest.com.proxy.lib.wayne.edu/docview/57609510?accountid=14925

Source: LISA
Annotation:
Friendly negotiations with depositors, and potential depositors, demand close attention to detail and objectivity. While not all depositors may be easy to get along with, maintaining and developing these relationships is vital for any archive. The skills used in serving depositors are mainly learned on the job from senior colleagues, but this article encourages mindfulness about how we interact with depositors and explain service agreements.

Stamatoplos, A. (2005). Digital archiving in the pharmaceutical industry. Information Management Journal, 39(4), 54-56,59. Retrieved from http://search.proquest.com.proxy.lib.wayne.edu/docview/57631402?accountid=14925

Source: LISA
Annotation:
Pharmaceutical companies retain a large number of digital records. From payroll to research, it is vital for them to preserve their digital data. Addressing accessibility, selectivity, fidelity, and compliance, this article provides a checklist of what every records manager should consider when preserving digital assets.

Starr, J., Ashton, J., Barton, A., Elliott, J., Jacquemot-Perbal, M., Karjalainen, M., ... Ziedorn, F. (2013). DataCite metadata schema for the publication and citation of research data. Retrieved from http://schema.datacite.org/meta/kernel-3/doc/DataCite-MetadataKernel_v3.0.pdf.

Source: Bibliography of Hourdé, J. (2012). Advancing the practice of data citation: A to-do list. Bulletin of the American Society for Information Science and Technology, 38(5), 20-22.
Annotation:
The DataCite metadata schema was designed to support citations of data in scholarly literature, but also contains expanded elements to use for description and discovery of research data sets. This document defines the Metadata Schema elements. It provides a clear description of which elements are mandatory or optional, a definition of each element, the number of times that each element can be used in one record, allowed values, and examples. It also provides a list of controlled values for some of the elements. This document provides an excellent starting point for determining what metadata elements to use for a data set.
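
To illustrate the mandatory/optional distinction, the sketch below checks a record for missing mandatory properties. It assumes, as best recalled and to be verified against the schema document itself, that version 3.0 treats Identifier, Creator, Title, Publisher, and PublicationYear as mandatory. The dictionary keys, the function name, and the sample record (including its DOI name) are hypothetical and are not part of the schema or of any DataCite software.

```python
# Assumed mandatory properties of DataCite Metadata Schema 3.0;
# confirm against the schema documentation before relying on this list.
MANDATORY_PROPERTIES = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
)


def missing_mandatory(record: dict) -> list[str]:
    """Return the mandatory properties that are absent or empty in a record."""
    return [name for name in MANDATORY_PROPERTIES if not record.get(name)]


# Hypothetical record; every value here is invented for illustration.
record = {
    "identifier": "10.1234/example-dataset",
    "creators": ["Example, A."],
    "titles": ["Example survey data"],
    "publisher": "Example University Data Archive",
    "publicationYear": "2013",
    "subjects": ["library science"],  # an optional property
}

print(missing_mandatory(record))  # []

incomplete = {key: value for key, value in record.items() if key != "publisher"}
print(missing_mandatory(incomplete))  # ['publisher']
```

A repository could run a check like this at deposit time, before a record is registered for citation.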

Wu, H., Chou, C., Ke, H., & Wang, M. (2010). College students’ misunderstandings about copyright laws for digital library resources. The Electronic Library, 28(2), 197-209.

Source: Library Literature & Information Science Full Text
Annotation:
This research article investigates students’ misconceptions of copyright laws and licensing agreements related to web content, particularly digitized scholarly publications provided by universities. First, the authors conducted a focus-group interview with four librarians, followed by a questionnaire given to a total of 109 students: undergraduates, postgraduates, and doctoral students. Based on the surveys, students had the following misunderstandings: “digital resources should be shared,” even if the person you are sharing the data with is not a student at the same institution; “all educational use was fair use,” meaning that downloading any digital material at any time and in any size was okay as long as you were paying tuition; and, as many assumed, once a digital resource is downloaded, it becomes the student’s property, meaning they can transmit, reproduce, and distribute the resource anytime, anywhere. The authors suggest that laws and agreements relating to digital materials should be taught to students to encourage more informed decisions. I chose this article because it may guide librarians as they develop policies to help resolve some of these problems with end-users’ usage of digital content. Students are not necessarily downloading resources illegally on purpose.

Zimmerman, D. (2007). Living without copyright in a digital world. Albany Law Review, 70, 1375-1397.

Source: LegalTrac
Annotation:
The author of this article makes the case that current, or traditional, copyright laws do not serve the needs of cyberspace resources. She identifies four “general strategies” for using the Internet to publish or disseminate digital content: Naysayers, Locksmiths, Subverters, and Explorers. Naysayers are authors who stay away from distributing their creations over the Internet, mainly because they do not believe current copyright laws protect digital resources very well. Locksmiths do disseminate some of their content online but do so with caution, usually by publishing it with some sort of digital rights management technology to maintain some copyright control over their resources. Subverters do use digital copyright protection, but usually in the form of Creative Commons licenses or a GNU General Public License. Explorers are people who rely less on official copyright laws. They disseminate resources their “own way,” without help from the “formal legal regime set out in the Copyright Act,” although not illegally. The author does not state which “strategy” is best, but does hope the Internet continues to provide a place for experimentation with these various approaches.