Pieces of the Puzzle: Information Retrieval in Arabic Digital Libraries
Shatha Baydoun (LIS 7900: Winter 2017)
Information Retrieval (IR)
Al Dayel, A., & Ykhlef, M. (2013). Arabic users’ attitudes toward web searching using paraphrasing mechanisms. International Research Journal of Computer Science and Information Systems, 2(2), 34-39. Retrieved from [1]
This article explores information retrieval (IR) in the context of Arabic search engines generally and query paraphrasing specifically. The authors examine the complexity of Arabic: the language is polysemous and has many metamorphic stem words. Arabic also has three forms (classical, modern, and colloquial), which creates a lexicon three times larger than that of English. Hence, formulating queries and searches within the Arabic lexicon is not an easy task. The methodology tested three components: search engine usability, search engine effectiveness, and query paraphrasing. This article is one of the few studies that explore web searching from a user perspective rather than from the perspective of language or indexing. The research concluded that Arabic search engines do not feature automatic query paraphrasing. Furthermore, most Arabic users (like other non-English speakers) rarely use Arabic-specific search engines.
Hmeidi, I., Shehhadeh, H., & Almodawar, A. (2014). Comprehensive study on information retrieval: Arabic document indexing. Research Journal of Science and Technology, 6(2), 79-86. Retrieved from [2]
This article presents a historiography of seminal IR works that shaped Arabic indexing from 2005 to 2012. The central premise is that very few works deal with “Arabic Text indexing.” As a result, the authors provide a comprehensive survey to further research in the area. The methodology of the paper is a comparative analysis of 14 seminal works along with the specific indexing variables proposed by their authors. Each article synopsis includes an index type (phrase or term), statistical computation, and model (e.g., vector). In addition, each synopsis includes a data-set description and an evaluative measure (e.g., recall and/or precision). Lastly, each description reports the experimental results of the proposed indexing methodologies.
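For readers unfamiliar with the evaluative measures mentioned in this annotation, the following is a minimal sketch of how precision and recall are usually computed for a single query. The function name and the document identifiers are illustrative assumptions and do not come from the surveyed works.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: document IDs returned by the system
    relevant:  document IDs judged relevant for the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: four documents retrieved, three judged relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of retrieved documents that are relevant, while recall is the share of relevant documents that were retrieved, so an indexing method can score well on one measure and poorly on the other.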
Ménard, E. (2009). Images: Indexing for accessibility in a multi-lingual environment— Challenges and perspectives. The Indexer, 27(2), 70-76. Retrieved from [http://www.ingentaconnect.com/content/index/tiji/2009/00000027/00000002/art00006 ]
This article is older than the others and does not mention Arabic IR specifically. Yet it is well-articulated prose that delineates the dilemmas of multi-lingual indexing. Online images lack full-text data; therefore, careful consideration must be given to their description. Ménard argues that images can be indexed through either controlled or uncontrolled vocabulary. The former includes AAT, ULAN, or TGM and often produces “more consistent terminology.” Yet controlled vocabulary can be restrictive and can reduce the accuracy and effectiveness of the search. Uncontrolled vocabulary, on the other hand, offers “greater variability” in perceptual attributes and structural relationships. The indexer can use both methods to index an image; however, Ménard does not recommend this.
Meryem, H., Ouatik, S. A., & Lachkar, A. (2014). A novel method for Arabic multi-word term extraction. International Journal of Database Management Systems, 6(3), 53-67. [doi:10.5121/ijdms.2014.6304]
This complex article explores the various ways Arabic terms can be extracted and the impact of this extraction on information retrieval (IR). Arabic Multiword Terms (AMWTS) are strings of words in text that can be extracted either linguistically or statistically. Using either method on its own causes problems for the system. For example, the linguistic method relies on part-of-speech (POS) information, which means that the system cannot separate nouns from adjectives. Instead, the authors propose a system that combines linguistic and statistical algorithms to improve AMWTS extraction. Ultimately, understanding these dynamics allows programmers to design improved search and browse functions.
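The general idea of combining a linguistic filter with a statistical score can be sketched as follows. This is not the authors' algorithm: the POS patterns, the use of pointwise mutual information (PMI), the `min_count` threshold, and the transliterated, hand-tagged toy tokens are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical POS patterns for two-word term candidates (an assumption,
# not taken from the article): noun + noun and noun + adjective.
PATTERNS = {("NOUN", "NOUN"), ("NOUN", "ADJ")}


def extract_terms(tagged_tokens, min_count=2):
    """Return (bigram, PMI) pairs that pass both filters, highest PMI first.

    tagged_tokens: list of (word, pos_tag) pairs in running-text order.
    """
    words = [word for word, _ in tagged_tokens]
    unigram_counts = Counter(words)
    bigrams = list(zip(tagged_tokens, tagged_tokens[1:]))
    bigram_counts = Counter((a[0], b[0]) for a, b in bigrams)

    scored = {}
    for (w1, t1), (w2, t2) in bigrams:
        # Linguistic step: keep only bigrams matching an allowed POS pattern.
        if (t1, t2) not in PATTERNS:
            continue
        # Statistical step: drop rare pairs, then score the rest with PMI.
        pair = (w1, w2)
        if bigram_counts[pair] < min_count:
            continue
        p_pair = bigram_counts[pair] / len(bigrams)
        p_w1 = unigram_counts[w1] / len(words)
        p_w2 = unigram_counts[w2] / len(words)
        scored[pair] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Toy, hand-tagged, transliterated input (illustrative only).
tagged = [
    ("maktaba", "NOUN"), ("raqmiyya", "ADJ"), ("fi", "PREP"),
    ("madina", "NOUN"), ("kabira", "ADJ"), ("wa", "CONJ"),
    ("maktaba", "NOUN"), ("raqmiyya", "ADJ"), ("jadida", "ADJ"),
]
terms = extract_terms(tagged)
print(terms)
```

In this toy input, only the pair ("maktaba", "raqmiyya") survives both steps: "madina kabira" matches a POS pattern but occurs once, and "raqmiyya jadida" fails the pattern filter. A real system would need an Arabic POS tagger and a much larger corpus for the statistics to be meaningful.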
Tosic, V., & Lazarevic, S. (2010). The role of libraries in the development of cultural tourism with special emphasis on Bibliotheca Alexandrina. UTMS Journal of Economics, 1(2), 107-114. Retrieved from [3]
Created in October 2002, the Bibliotheca Alexandrina is an important cultural and educational center. Designed as a tribute to the Library of Alexandria, it is multifunctional and serves national, regional, and international users. Additionally, the library is part of the World Digital Library and has undertaken extensive digitization. This has made the library an important center of cultural tourism. The library offers tours in various languages together with school trips and special presentations. It also has an Antiquities Museum, a Travel Unit, and multilingual staff members. This article views digital libraries not only as cultural centers with academic value but also as economic forces that, if managed properly, can improve local and national economies.
Arabic Digital Libraries: Case Studies
Bilal, D., & Bachir, I. (2007). Children’s interaction with cross-cultural and multilingual digital libraries. II. Information seeking, success, and affective experience. Information Processing & Management, 43(1), 65-80. [doi:10.1016/j.ipm.2006.05.008]
This seminal work is cited multiple times in the literature. It reports findings on Arabic-speaking children’s use of the International Children’s Digital Library (ICDL). The library is international, multicultural, and multilingual. The holistic study builds on earlier research on children’s information behavior (e.g., their use of Yahooligans). It also makes use of qualitative and quantitative observations of children at Bibliotheca Alexandrina. Since the study involves young children (ages 6-10), various forms and paperwork were needed. The children were given three tasks, after which they were interviewed in Arabic. A review of the article makes clear why this work is so influential. First, the subject is unique: it studies Arabic speakers who are also children. Second, the research methodology is well constructed and well articulated. Third, the study examines children’s information behavior, and its follow-up questions explore the children’s perceptions and feelings.
Boujdad M’kadem, A., & Nieuwenhuysen, P. (2010). Digital access to cultural heritage material: Case of the Moroccan manuscripts. Collection Building, 29(4), 137-141. [doi:10.1108/01604951011088862]
This article is a plea for digitizing old Moroccan manuscripts dating from the 17th century. These manuscripts include a wide array of religious and literary writings held in private collections. The appeal of the article is that it presents the various challenges to digitization in countries that lack the infrastructure to support it. Here, digitization is viewed as a preservation effort. The authors argue that the manuscripts suffer from dust mites, termites, humidity, and temperature variations; therefore, they need to be protected and preserved. In addition, the authors provide a “well-defined” collection development policy that attempts to curtail any impediments to digitization. For example, there is a detailed discussion of Moroccan copyright laws. The methodology of the study included a paper or online questionnaire answered by graduate and undergraduate history students from the Department of History at the University of Abdelmalek Essaadi. Interestingly, most of those surveyed still preferred physical access and the “fruitful human interaction” with private collectors. This is where cultural sensitivities and values can alter the landscape of digital libraries.
Krätli, G. (2016). Between quandary and squander: A brief and biased inquiry into the preservation of West African Arabic manuscripts: The state of the discipline. Book History, 19(1), 399-431. [doi:10.1353/bh.2016.0012]
This detailed article reports on efforts to digitize Arab-Islamic artifacts in sub-Saharan Africa. Here, history and library science intersect to provide a rich history of preservation that dates to 1898. One of the interesting things about the article is that it contextualizes digitization within the larger historical currents of orientalism, colonialism, and nationalism. For example, the author notes the “pattern of manuscript disappearance and reappearance” that was prevalent during French colonial rule. The author mentions several digital efforts in the region, such as OMAR (Oriental Manuscripts Resource), the Timbuktu Manuscript Project, and Aluka. These efforts have brought “West African manuscript culture from the colonial era of despoliation and denial to the age of digital discovery, content delivery, and linked data.” Unfortunately, the author concludes that these overarching projects are often too vague in planning, design, and result. Thus, Krätli recommends small and distinctive digitization projects, which are more likely to succeed.
Ramadan, E. (2006). Designing and evaluating an integrated digital library system for the National Oil Corporation in Libya. World Digital Libraries, 5(1), 51-73. Retrieved from [4]
This article reviews the digitization project undertaken by the National Oil Corporation (NOC) in Libya. This special library digitizes research, technical, and production reports. In addition, NOC has a fully functional bilingual (Arabic and English) interface that supports browsing and searching in both languages. The purpose of the work is to encourage digital libraries in the Arab world. The author therefore suggests that there is no need for an extensive infrastructure; rather, Arabic libraries can use low-cost and readily available open source software such as Greenstone. Ramadan’s view on digitization differs from Krätli’s (see the entry above) and therefore provides a different perspective on Arabic digitization projects.
Ramadan, E. (2016). Evaluating the usability of Alwaraq’s digital library interface. International Research: Journal of Library and Information Science, 6(4), 1-8. Retrieved from [5]
This brief article reports on a usability study of an Arabic digital library known as Alwaraq [6]. According to the author, Alwaraq was created in 2000 and includes manuscripts from Arabic and Islamic writers. Like most digital libraries, Alwaraq emphasizes research and academic scholarship. LIS undergraduate students from the University of Benghazi completed a survey questionnaire with a Likert scale to assess the usability of the website. The questions gauged efficiency, effectiveness, visual appearance, and learnability, among other criteria. The study concluded that Alwaraq had visual shortcomings in font size and type (too small), along with unclear and unreadable text (too blurry). This article provides a specific example of an Arabic digital library that has major design flaws, even though it has been around for years.