Understanding Web Archiving through the Internet Archive's Wayback Machine
Author: Johana Flores
Definition of Project
To better understand what web archiving is, this project uses the Internet Archive's (IA) Wayback Machine as an example of how web archiving works and of the benefits and setbacks that come with it. The main questions addressed in the annotations are: What is web archiving? What is the Wayback Machine? Why do we need it? And what might the future of web archiving look like? Because this topic has a strong technological component, the articles presented below were published within the past 10 years so that the most up-to-date information could be included wherever possible. Items were selected based on whether their content helped answer these questions and on how the information was researched, analyzed, and presented.
Annotations
AlNoamany, Y., AlSum, A., Weigle, M. C., & Nelson, M. L. (2014). Who and what links to the Internet Archive. International Journal on Digital Libraries, 14(3), 101-115. doi:10.1007/s00799-014-0111-5.
The IA Wayback Machine is considered to be the oldest and largest web archive, but there seems to be limited research on its user trends: How do users end up browsing the Wayback Machine? What are they looking for? Where do they come from? How do web pages link to the IA? To better understand these trends, this study analyzed human and robot web access logs from February 2012. The analysis concluded that the main reason people use the IA is that they cannot find what they are searching for on the live web; around "65% of the requested archived pages no longer exist on the live web" (p. 101). It also found that English is the most requested language, closely followed by other European languages. This paper is unique in that it covers user trends associated with archived web content. User trends can guide current and future projects by clarifying what users need and by showing which features to prioritize when creating web archives. They can also help explain where live web content is lacking and why web archiving is needed.
Andersen, H. (2013). A website owner's practical guide to the Wayback Machine. Journal on Telecommunications & High Technology Law, 11(1), 251-276.
Courts in most federal jurisdictions have approved the admissibility of screenshots, which can be useful when presenting evidence at trial. That said, the Wayback Machine only includes screenshots that web crawlers have captured at random times, so a screenshot from a specific point in time might not be available. It is also important to keep in mind that not all screenshots retain full functionality, which may prove problematic in legal proceedings. This article uses past court cases to explain how the law applies to web archiving and the different outcomes that can result. It also helps the reader understand how web archiving works and the benefits and difficulties of using Wayback Machine screenshots in legal proceedings.
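The point about specific dates can be made concrete. The Wayback Machine's playback addresses place a 14-digit timestamp (year, month, day, hour, minute, second) between https://web.archive.org/web/ and the original address, and the archive answers with the capture closest to that moment rather than guaranteeing an exact match. The short Python sketch below only builds such a request address; the helper name wayback_url is hypothetical, and which capture is actually shown depends on what the crawlers happened to collect.

```python
from datetime import datetime


def wayback_url(original_url: str, when: datetime) -> str:
    """Hypothetical helper: build a Wayback Machine address for the capture nearest `when`."""
    # 14-digit timestamp: YYYYMMDDhhmmss
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


print(wayback_url("http://example.com/", datetime(2012, 2, 1)))
# https://web.archive.org/web/20120201000000/http://example.com/
```

Because the archive returns the nearest available capture, the date displayed on the archived page should be checked before that screenshot is relied on as evidence of what a site looked like on a particular day.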
Belovari, S. (2017). Historians and web archives. Archivaria, 83(1), 59-79.
Over the years the Internet has become an integral part of our daily lives, and many aspects of life are represented online in one way or another. Primary and secondary records are generated at increasing rates, and many of them have no counterparts outside the Internet. Even so, not enough effort is being made in web history, which Belovari defines as the study of the "Web as a site of lived experiences" (p. 60). Although a massive amount of data is being archived, relatively few modern historians are analyzing it. Belovari suggests that some of the reasons might include the "small number of contemporary history programs, the lack of ‘organizational homes'…., lack of training and skill sets" (pp. 61-62). There is a lot of focus on web archiving initiatives and the future accessibility of records, but not enough on what happens to these records once they become available. Who is interpreting them, and what approach is being taken to interpret them? This article, which analyzes twenty-one well-known web archives, emphasizes the importance of interpreting the Internet and lists issues that future web historians might face. The public might focus on one particular time, screenshot, or URL, but who is going to focus on the bigger picture?
Costa, M., Gomes, D., & Silva, M. J. (2017). The evolution of web archiving. International Journal on Digital Libraries, 18(3), 191-205. doi:10.1007/s00799-016-0171-9.
Web archiving initiatives started in 1996, and since then many organizations have taken on the responsibility of preserving parts of the Internet. Even so, the amount of data being archived does not compare to the amount being published. Costa, Gomes, and Silva conducted two surveys, in 2010 and 2014, that focused on web archiving initiatives and how they have grown since 1996. This article describes the evolution of web archiving, how it continues to improve, and projects taking place around the world.
Davis, R. C. (2016). The future of web citation practices. Behavioral & Social Sciences Librarian, 35(3), 128-134. doi:10.1080/01639269.2016.1241122.
For nearly two decades, citing web pages in scholarly articles has been the norm. Citing sources is supposed to allow readers to follow up on studies or claims and to answer questions that arise from the reading. Unfortunately, over time many of these citations have fallen victim to what Davis terms 'reference rot': they have become "broken links or point to a page that no longer contains the content the author originally cited" (p. 128). Davis further examines how three citation styles approach web sources and how reference rot has changed some of their guidelines. She also examines how this phenomenon has affected different academic disciplines; from 2005 to 2012, scholarly articles in the science, technology, and mathematics domain showed reference rot rates between 70% and 80% (p. 129).
Kahle, B., & Vadillo, A. P. (2015). The internet archive: An interview with Brewster Kahle. 19: Interdisciplinary Studies in the Long Nineteenth Century, 2015(21), 1-15. doi:10.16995/ntn.760.
The manifesto Archiving Internet (1996), written by IA founder Brewster Kahle, explains the initial reasoning behind the creation of the IA and its aim to preserve digital history. The interview itself covers some of the technology behind web archiving and future possibilities for the IA. This article is useful for explaining why the IA was created and for comparing its beginnings with where it is today. Readers should keep in mind, however, that this account comes entirely from the IA's creator and employees.
Lueck, T. (2014). Internet Archive: Digital Library of Free Books, Movies, Music, and Wayback Machine/The Internet Archive Companion. American Journalism, 31(2), 299–301.
The article focuses on the IA mobile application, but it gives a good general overview of the inner workings of the Wayback Machine, which is described as a "web search component of the IA" that allows users to view screenshots of websites dating back to the mid-'90s (p. 299). The application allows access to materials in the public domain, including "web, video, live music, audio, and texts, along with scrolls of the archives headlines and users' latest commentary" (pp. 299-300). In 2013, the IA was estimated to receive around 3 million visitors per day. Learning about the different Wayback Machine features helps in understanding what the Wayback Machine is.
Perdue, K. (2016). Bringing our internet archive collection back home: A case study from the University of Mary Washington. Code4Lib Journal, (31), 6.
From 2010 to 2014, the University of Mary Washington (UMW) partnered with the Lyrasis Mass Digitization Collaborative (MDC) to upload UMW's own collection to the IA. The collection, which consists of all university publications, including the student newspaper and yearbook, covers almost 100 years of UMW's history. UMW also created its own interface, Eagle Explorer, which facilitates access to the collection and allows full-text search. The project was motivated by constant requests from patrons wanting to find mentions of a specific person, event, building, etc. Before this project, patrons had to go through the exhausting process of consulting an index (if one was provided) that covered only certain years. Overall, it has been a successful project with room for further improvement. This article helps fill in some of the data missing from Szydlowski's (2010) article and further emphasizes that there is a need for local web archiving, but that such work comes with its own challenges. UMW's project would not have been completed without proper funding and organizations willing to collaborate.
Sampath Kumar, B. T., & Prithviraj, K. R. (2015). Bringing life to dead: Role of Wayback Machine in retrieving vanished URLs. Journal of Information Science, 41(1), 71-81. doi:10.1177/0165551514552752.
This study collected and analyzed 1,700 articles, containing 5,698 cited URLs, published in three Indian LIS conference proceedings from 2001 to 2010. Over time, only 49.91% of the URLs remained active, while 50.09% were no longer functioning, indicating that the share of functioning URL citations decreases as articles age. After the missing URLs were checked against the Wayback Machine, the share of retrievable URLs rose from 49.91% to 79.08%; with the help of web archiving, the authors were able to obtain data that had been considered lost. This study demonstrates that web archiving can be an extremely useful tool for recovering missing URL citations.
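The percentages above can be turned into approximate counts with some simple arithmetic. The Python sketch below back-calculates counts from the rounded percentages reported in the annotation, so its figures may differ by one or two from the study's own tables; the share of dead links recovered is a derived number, not one stated in the annotation.

```python
# Figures as reported for Sampath Kumar and Prithviraj (2015).
TOTAL_URLS = 5698
ACTIVE_SHARE = 0.4991         # share of cited URLs still live
AFTER_WAYBACK_SHARE = 0.7908  # share retrievable once the Wayback Machine is consulted

live = round(TOTAL_URLS * ACTIVE_SHARE)                # approx. 2,844 live URLs
retrievable = round(TOTAL_URLS * AFTER_WAYBACK_SHARE)  # approx. 4,506 retrievable URLs
dead = TOTAL_URLS - live                               # URLs not working on the live web
recovered = retrievable - live                         # dead URLs found in the archive

print(live, dead, recovered)
# 2844 2854 1662
print(f"{recovered / dead:.1%} of dead URLs recovered")
# 58.2% of dead URLs recovered
```

Read this way, the rise from 49.91% to 79.08% means that the Wayback Machine recovered a little more than half of the URLs that had stopped working.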
Szydlowski, N. (2010). Archiving the web: It's going to have to be a group effort. The Serials Librarian, 59(1), 35-39. doi:10.1080/03615260903534908.
This article complements Costa, Gomes, and Silva's (2017) article: it highlights some of the challenges that come with web archiving and notes that parts of the Internet are still not being preserved. It advocates for more local initiatives so that local web content can also be preserved. Libraries could face many challenges in the future because of current web archiving limitations. For example, government documents are of great value to many library collections, but as of 2010 they were not being prioritized in web archiving projects such as the Wayback Machine. Some of the information in this article is particularly useful because it focuses on local efforts. However, it lacks the concrete data that other studies include in their findings. It also does not consider or acknowledge the challenges that local libraries could face if they were to try web archiving initiatives on their