Reading Traditional Sources with Nontraditional Methods

    NB: this paper was presented at Digital Humanities and Islamic & Middle Eastern Studies, Brown University, Providence, RI (October 24-25, 2013); the video recording of the presentation is available at www.islamichumanities.org > Day One (presentation timestamp 2:48:00; Q&A: 3:51:30); the entire paper is also available as a PDF.


    All models are false, but some are useful
    George E. P. Box

    Why Models?

    The advent of digital humanities has brought the notion of “big data” into the purview of humanistic inquiry. Humanists now have access to huge corpora that open research possibilities that were unthinkable a decade or two ago. However, working with corpora requires a rather different approach, one more characteristic of the sciences than of the humanities. Namely, one has to be transparent and explicit about how data are extracted and how they are analyzed. Text-mining techniques rely on explicit algorithms because these help trace mistakes, correct them and, ultimately, improve results.1 Analytical procedures for studying extracted data rest on explicit algorithms for the same reason. As a way of constructing algorithms, modeling is part and parcel of developing complex computational procedures.

    Working with big data also requires a different kind of modeling. Opting for the breadth of data, we have to give up the richness of details. Close reading, to which humanists are most accustomed, becomes impossible.2 Working with big data, one cannot maintain the nuanced complexity of details that became the hallmark of close reading as an approach. Instead of relying on complex textual evidence and reading between the lines, one has to work with relatively simple textual markers (essentially, words or simple phrases) that are treated as indicators of large trends. Yet it is through such analysis that we can look into long-term and large-scale processes that will always remain beyond the scope of close reading. The literary historian Franco Moretti dubbed such an approach “distant reading,” explaining “distance” not as an obstacle, but as a specific form of knowledge.3 With an emphasis on fewer elements, which allows us to get a sharper sense of their overall interconnection, we can distinguish shapes, relations, structures. Most importantly, we can trace small changes over long periods of time.

    Modeling is an important part of this approach. With models we simplify reality down to a limited number of factors4 through the analysis of which we hope to get insights into complex processes.5 This simplification is the reason why all models are false. Yet, models are a valuable and powerful tool. They pave the way to improving our understanding of the world. Unlike theories, models are experimental and driven by data. Good models offer invaluable glimpses into the subjects of our inquiry.6 With them we can explore, explain, project. With them we can get a big picture. That is why some models are useful.

    What follows is an attempt to model Islamic élites based on the data from al-Dhahabī’s (d. 748/1348 CE) Taʾrīkh al-islām in order to explore major social transformations that the Muslim community underwent in the course of almost seven centuries of its history. The main types of data used in the model are dates, toponyms,7 linguistic formulae (or, wording patterns), synsets (lists of words that point to a specific concept or entity), and, most importantly, “descriptive names” (sing. nisba).

    The detailed discussion of the main assumptions regarding these types of data, as well as of general issues relevant to the study of Arabic biographical collections, can be found elsewhere.8 Here it is most important to dwell on our assumptions regarding “descriptive names,” which some scholars regard as the most valuable kind of data that literary sources offer to the social historian of the Islamic world, and others as highly problematic. The major problem with nisbas is that it is not always clear what they stand for. For example, if an individual is described in a biographical collection as ṣaffār, does this actually mean that he was involved in “copper smithing”? When our subject is just one particular individual, it is not so difficult to establish the more or less exact meaning of this descriptive name by cross-examining biographies of this individual in other biographical collections. This is particularly easy now, when dozens of electronic texts of biographical collections are just a few mouse clicks away. However, such an approach becomes problematic when this rather time-consuming procedure has to be repeated for dozens of individuals. The approach becomes particularly difficult if our goal is to study some biographical collection in its entirety, since Arabic biographical collections often contain thousands of biographies and most biographies offer multiple descriptive names for the same individual. After a certain threshold it becomes impossible to apply this approach at all. Our source, Taʾrīkh al-islām, is well beyond this threshold. In the analysis that follows, we will deal with a dataset of almost 70,000 nisbas (with about 700 unique ones) that represent about 26,000 individuals over the period of 41-700 AH/662-1301 CE. Working with such a dataset, one cannot possibly know the exact meaning of each and every nisba. At the same time, we do not have any solid foundation to argue that descriptive names are to be treated in a particular manner, or to be discarded altogether. Yet such a dataset is too rare an opportunity for research to ignore simply because we are not entirely sure what all these data mean. This is where modeling offers an optimal solution: we need to start with assumptions and be upfront about them. In what follows, descriptive names will be treated at face value, if only because this is the most logical starting point.9

    The Source: al-Dhahabī’s Taʾrīkh al-islām

    Taʾrīkh al-islām is the largest Arabic biographical collection that includes over 30,000 biographies and covers almost seven centuries of Islamic history. The current dataset includes information on slightly over 29,000 individuals (the first three volumes of Taʾrīkh al-islām are structured differently from the rest of the collection and cannot be studied with the same computational method). Figure 1 shows cumulative biographical graphs and curves based on this dataset. Biographies are grouped into 20 lunar year periods (quantities of biographies for each period are shown at the bottom of the Biographical Graph). The graph is transformed into a curve that smooths out the noise of data, emphasizing larger trends (Smoothed Biographical Curve). Finally, the main curve is the Adjusted Biographical Curve, which is shifted 30 years back in time to reflect “the years of floruit” of the biographees from Taʾrīkh al-islām.

    Figure 1. Cumulative Biographical Curve. The row of numbers shows the quantities of biographies per 20 lunar year periods, beginning with 41-60 AH/662-681 CE and up to 680-700 AH/1282-1301 CE.
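    The grouping, smoothing and shifting steps behind Figure 1 can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually used for the figure: the period boundaries (41-60, 61-80, ...), the centered moving average and its window, the function names and the toy death dates are all assumptions; only the 20-year period length and the 30-year shift come from the text.

```python
from collections import Counter

PERIOD = 20          # length of a period in lunar (AH) years, as in the text
FLORUIT_SHIFT = 30   # years the adjusted curve is moved back, as in the text


def period_start(death_year_ah, first_year=41, period=PERIOD):
    """Return the first AH year of the period containing a death date.

    Assumes periods run 41-60, 61-80, ... (the exact boundaries are a guess).
    """
    return first_year + ((death_year_ah - first_year) // period) * period


def biographical_graph(death_years_ah):
    """Count biographies per period, returning {period_start: count} in order."""
    counts = Counter(period_start(y) for y in death_years_ah)
    return dict(sorted(counts.items()))


def smooth(values, window=3):
    """Centered moving average; the window shrinks at both ends of the series.

    The smoothing method is not specified in the text; this is a stand-in.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def adjusted_curve(graph, window=3, shift=FLORUIT_SHIFT):
    """Smooth the counts and move each point `shift` years back in time."""
    starts = list(graph)
    smoothed = smooth([graph[s] for s in starts], window)
    return {s - shift: v for s, v in zip(starts, smoothed)}


# Toy death dates (AH), invented purely for illustration.
toy_deaths = [45, 58, 61, 70, 79, 80, 95, 101, 119, 120]
graph = biographical_graph(toy_deaths)   # {41: 2, 61: 4, 81: 1, 101: 3}
curve = adjusted_curve(graph)            # keys become 11, 31, 51, 71
```

    Keeping the three steps as separate functions makes it easy to swap in a different smoother or shift and compare the resulting curves.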

    The curve can be split into several periods, each beginning at a point that marks a noticeable turn of the curve. The number of biographies keeps on growing quite rapidly until c. 160/778 CE, which marks the slowing of growth. During c. 270-470 AH/884-1078 CE there is a steady decline. After c. 470/1078 CE the curve starts recovering, reaching its highest point c. 570/1175 CE, after which it keeps on growing but slows its pace by the end of the period, with the second peak being somewhere after 700/1301 CE. For convenience, many of the graphs that follow include the scaled-down cumulative curve and color-coded periods.
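    Dates here are given as AH/CE pairs. The text does not say how the CE years were derived; the sketch below uses the common rule of thumb CE = AH - AH/33 + 622 and rounds up, which is an assumption that happens to reproduce the pairs quoted in this section. The function names are hypothetical.

```python
def ah_to_ce(ah_year):
    """Approximate the CE year for an AH year, rounding up.

    Rule of thumb: CE = AH - AH/33 + 622, i.e. ceil(32 * AH / 33) + 622.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return -(-32 * ah_year // 33) + 622


def dual_date(ah_year):
    """Format a year in the AH/CE style used in the text."""
    return f"{ah_year}/{ah_to_ce(ah_year)} CE"


# Turning points of the cumulative curve mentioned in the text.
for year in (160, 270, 470, 570, 700):
    print(dual_date(year))
# 160/778 CE
# 270/884 CE
# 470/1078 CE
# 570/1175 CE
# 700/1301 CE
```

    A lunar year is about 11 days shorter than a solar one, so any single CE year is only an approximation; exact conversion requires the month and day.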

    Modeling Society

    The individuals whose lives are described in biographical collections were not ordinary people. They were integrated into the life of society to a noticeable degree, somewhere on the scale from noteworthy to extraordinary. Almost every biographical note contains some information on a sphere of life to which its protagonist contributed, and “descriptive names” are the most manageable indicator of this.

    Major studies that use “descriptive names” for analytical purposes split them into categories. Cohen’s classic study10 concentrates primarily on “secular occupations” during the first four centuries of Islamic history. He offered a major division of occupational nisbas (textiles, foods, ornaments/perfumes, paper/books, leather/metals/wood/clay, miscellaneous trades, general merchants, bankers/middlemen) and supplied an extensive appendix with explanations for about 400 nisbas and relevant linguistic formulae. Unfortunately, the nisbas in Cohen’s appendix are not explicitly categorized and—since any categorization involves pushing the boundaries, especially in instances that stubbornly resist classification—the exact scheme remains somewhat unclear.

    Petry’s scheme is built on biographical data from Mamlūk Egypt (1258-1517 CE). Petry divided his subjects into six major, often overlapping occupational groups: executive and military professions, bureaucratic (secretarial-financial) professions, legal professions, artisan and commercial professions, scholarly and educational professions, and religious functionaries. Although explicit classification is not given in the “Glossary of Occupational Terms,” numerous tables provide enough information to form a rather clear idea about the specifics of each category in Petry’s classification scheme.11

    Shatzmiller approached this issue from the much wider perspective of labor in general. Her scheme covers a far greater variety of occupational names and splits the entire society into three major sectors (extractive, manufacturing, services), with each sector having its overlapping subcategories. Shatzmiller offers an explicit categorization of each and every descriptive name.12

    As is the case with any scheme, all three examples are designed to serve specific purposes. Although immensely helpful, none of them are suitable for the purposes of broader analysis: unlike the above-mentioned schemes, the scheme needed here must take into account all meaningful descriptors, not only those that can be classified as “occupations.” In other words, it must consider anything that would allow discerning all potentially identifiable groups so that their evolution could be traced. Some of these descriptors do not pose significant problems; others are so complex that even presenting them as ideal types might be highly problematic.

    The list of “descriptive names” from Taʾrīkh al-islām is based on frequencies and for the moment I consider nisbas that are used to qualify at least ten individuals (slightly over 700 unique nisbas, with their total running up to almost 70,000 instances). My list of descriptive names overlaps only partially with those of Cohen, Petry and Shatzmiller. Figure 2 shows how the categories of “descriptive names” from Taʾrīkh al-islām are interconnected from the individual perspective.
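    The frequency cut-off described above (nisbas that qualify at least ten individuals) can be sketched as follows. The input layout, the function name and the toy records are assumptions made for illustration; only the threshold of ten comes from the text.

```python
from collections import Counter

MIN_INDIVIDUALS = 10  # threshold stated in the text


def frequent_nisbas(individuals, min_individuals=MIN_INDIVIDUALS):
    """Return {nisba: number of individuals} for nisbas at or above the threshold.

    `individuals` maps a (hypothetical) biography id to the list of nisbas
    extracted from that biography. Each nisba is counted once per individual,
    even if it occurs several times in the same biography.
    """
    counts = Counter()
    for nisbas in individuals.values():
        counts.update(set(nisbas))
    return {n: c for n, c in counts.items() if c >= min_individuals}


# Toy input, invented for illustration: 12 people share one nisba, 3 another.
toy = {i: ["al-Anṣārī"] for i in range(12)}
for i in range(3):
    toy[i] = ["al-Anṣārī", "qaṭṭān", "qaṭṭān"]

kept = frequent_nisbas(toy)
print(kept)  # {'al-Anṣārī': 12}
```

    Counting individuals rather than raw occurrences keeps a nisba that is repeated within one long biography from passing the threshold on its own.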

    Figures 2, 4, 3 (order: left to right, top to bottom). Figure 2. Interconnectedness of Descriptive Names from the Individual Perspective. Figure 4. Interconnectedness of Descriptive Names from the Social Perspective. Shifting circles and dashed lines denote the intricate interconnectedness of the three layers of name categories. Figure 3. Hierarchical Connections of the Middle Layer.

    The innermost layer of categories includes tribal, toponymic, ethnic and physical descriptions. These are descriptors over which individuals have the least control, in the sense that no one chooses into which tribe to be born, where to be born, what ethnic group to belong to, and what physical peculiarities to have or suffer from. To a certain degree these descriptions are also acquirable: in the early period being a Muslim meant being affiliated with an Arab tribe; individuals were constantly moving around the Islamic world, changing their toponymic affiliations; physical peculiarities could have resulted from life experience. However, these are only probable (and thus secondary) cases that would usually be piled up on top of primary, “by-birth” descriptions. The first three categories (tribal, toponymic, ethnic) also tend to overlap.

    The middle layer groups “descriptive names” in terms of acquirable qualities: trade, knowledge, position and status. These categories do not rest on the same level, and their connections are better represented in a hierarchical manner (Figure 3). The main gateways to élites were trades (or “secular occupations”) and knowledge[s]. However, practicing some trade alone was almost never enough: biographical collections rarely, if ever, include individuals who were involved exclusively in some specific “secular occupation.” In order to climb up the social ladder, a practitioner of any trade had to start converting his economic capital into social capital; this was most commonly done through acquiring religious knowledge. Knowledge, as specialized training in a specific area that would set an individual apart from the masses, opened ways for acquiring positions and status[es]; it could also allow one to practice trade on a new level, thus improving the individual’s status.

    The outermost layer represents the major sectors to which a person could belong in pre-modern Islamic society: religious, administrative, military, and “civilian.” The term “civilian” is problematic, and is used here essentially as a negative blanket category that encompasses everything that does not clearly belong to the first three sectors. Descriptive names often cross boundaries among these categories and most individuals do not clearly belong to one specific sector, but rather balance among them.

    For our purposes it will be more efficient to invert this scheme so that “descriptive names” are presented from the social perspective (Figure 4). Now, each category contributes to the composition of Islamic society, and every “descriptive name” can be seen as a social role. These roles are likely to receive a centripetal charge from individuals who attempt to expand their influence on society at large; how close they get to the center (i.e., how much social influence they can exercise) would depend on the success of particular individuals and/or historical circumstances that might be favorable to particular groups. Social influence here is understood broadly as a pressure that forces someone to do something that s/he otherwise would not have done; at this point I do not make a distinction between physical threats and social pressures. Clearly, the sword of an amīr, “military commander,” and the word of a shaykh, “religious authority,” are different in their nature, but both may have equally serious societal consequences.

    Figure 5. Nisba Classification Examples (a).

    Figures 5 and 6 should provide a visual clue to how these overlapping categories are used in the classification scheme. In Figure 5: amīr, “governor, commander,” and sulṭān, “sultan,” both belong to the military sector of society. Amīr can be seen primarily as a position, in the sense that there is somebody above who granted this position to a given individual; arguably, this position provides one with a relatively high status. Sulṭān is the apex of the military hierarchy and thus is primarily seen as status with significant influence over all other sectors. Kātib, “scribe,” and wazīr, “vizier, prime minister,” belong to the administrative sector, where the former is a position with potential for social influence, while the latter is the apex of the administrative hierarchy, which gives one significant resources to influence society at large; hence, it is also status.13 Somewhat of an equivalent to amīr, raʾīs, “chief, director,” is a denomination of high status in either the civilian, the religious or the administrative sector (also position in the latter). Ṭabīb, “physician,” stands for special training (knowledge) within the civilian sector, which is also likely to fall into the categories of trade and position, especially after hospitals (sing. bimaristān) become a constant element of the Muslim cityscape.14 Qaṭṭān, “producer or seller of cotton,” and qawwās, “bow-maker,” are both secular occupations (trades) and thus belong to the civilian sector, although the latter, if bows are produced for war-making purposes, may cross into the military one.

    Figure 6. Nisba Classification Examples (b).

    In Figure 6: shaykh, literally “elder,” and imām, “leader,” are the markers of the highest religious status, although in the later period imām also refers to a religious position of “prayer leader” that was only marginally influential in social terms. Faqīh refers to knowledge of Islamic law, whereas social influence is exerted primarily through other roles, such as qāḍī, “judge,” which is always a position, or muftī, “jurisconsult,” which turns into a position in the later period (not graphed). Ḥāfiẓ denotes knowledge of Prophetic tradition and high achievement (status) within this area of religious expertise. Muḥtasib, “market inspector,” is an administrative position with strong religious underpinnings. Last on the list are khaṭīb, “Friday preacher,” and wāʿiẓ, “public preacher.” Both belong to the religious sector, but while the former is always a position, the latter refers to a specific field of religious knowledge that tends to become a position only during the later period.

    Individuals in the Islamic biographical dictionaries usually wear many turbans and are qualified with more than one “descriptive name.” Using the same method, each individual can be represented as a unique constellation of trades, knowledge[s], positions and status[es] that are fitted into the diagram of the four major sectors. Pushing this approach even further, we may try to evaluate how the composition of Islamic élites—and, possibly, society at large—changed over time, although conventional graphs may be more efficient for this task.
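    The idea of an individual as a constellation of roles spread across sectors can be sketched as a simple lookup. The tags below are transcribed from a subset of the examples discussed for Figures 5 and 6; the dictionary layout, the names `NISBA_TAGS` and `constellation`, and the sample individual are assumptions made for illustration, not the classification scheme actually implemented.

```python
# Sector and middle-layer tags transcribed from examples in the text.
# The data layout itself is an assumption.
NISBA_TAGS = {
    "amīr":   {"sectors": {"military"},       "roles": {"position", "status"}},
    "sulṭān": {"sectors": {"military"},       "roles": {"status"}},
    "kātib":  {"sectors": {"administrative"}, "roles": {"position"}},
    "wazīr":  {"sectors": {"administrative"}, "roles": {"position", "status"}},
    "ṭabīb":  {"sectors": {"civilian"},       "roles": {"knowledge", "trade", "position"}},
    "qaṭṭān": {"sectors": {"civilian"},       "roles": {"trade"}},
    "shaykh": {"sectors": {"religious"},      "roles": {"status"}},
    "faqīh":  {"sectors": {"religious"},      "roles": {"knowledge"}},
    "khaṭīb": {"sectors": {"religious"},      "roles": {"position"}},
}


def constellation(nisbas):
    """Combine the tags of all known nisbas of one individual.

    Nisbas missing from the lookup are returned separately so that they
    can be reviewed by hand rather than silently dropped.
    """
    sectors, roles, unknown = set(), set(), []
    for nisba in nisbas:
        tags = NISBA_TAGS.get(nisba)
        if tags is None:
            unknown.append(nisba)
            continue
        sectors |= tags["sectors"]
        roles |= tags["roles"]
    return {"sectors": sectors, "roles": roles, "unknown": unknown}


# A hypothetical scholar who is a jurist, a cotton trader and a Friday preacher.
person = constellation(["faqīh", "qaṭṭān", "khaṭīb", "al-Anṣārī"])
print(sorted(person["sectors"]))  # ['civilian', 'religious']
print(sorted(person["roles"]))    # ['knowledge', 'position', 'trade']
print(person["unknown"])          # ['al-Anṣārī']
```

    Using sets lets one nisba carry several tags at once, which matches the point that descriptive names often cross category boundaries.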

    Looking into Major Sectors

    Introducing the categories of sectors—military, administrative, religious and civilian—I hope to use them as markers of change within the composition of Islamic élites. Society would remain healthier when more social groups are represented in the élites, since a more diverse population will be participating in the [re]negotiation of the rules of the game. This is what the share and the diversity of the civilian sector—with a number of trades, crafts and knowledge[s]—is meant to represent.

    Figure 7. Major Sectors of Islamic Society (as represented in Taʾrīkh al-islām)

    Figure 7 shows the cumulative curves of all four sectors. Although this is still a work in progress and the algorithms for determining the administrative and military sectors still need adjustment, the curves do agree with the major trends that we expect to be confirmed by quantitative analysis.

    The religious sector keeps on growing throughout the period. Occasional fluctuations notwithstanding, it hits the 60% mark by the end of the period. One would expect this number to be higher, but a significant number of individuals participated in the transmission of knowledge without specializing in specific fields of religious learning and thus did not earn relevant nisbas. This, of course, may result from irregularities in naming practices or the lack of verbal patterns in my synsets.

    The civilian sector is at its highest during 300-400 AH/913-1010 CE, when it reaches a 30% share. By the end of the period it goes down to 20%. The number of individuals involved in trades and crafts is about 24-25% at its highest point around 400/1010 CE and goes down to 13-14% by the end of the period.

    The administrative and military sectors are not as significant in terms of numbers, but the representatives of these sectors are in better positions to make the most immediate and most striking impact on society at large. Both sectors keep on growing, but while the growth of the administrative sector is constant, albeit rather slow, the growth of the military sector is quite remarkable, especially after 500/1107 CE. Overall, the share of the military sector may have reached up to 10% during the later periods, which is very significant considering that at some earlier periods this sector is lacking altogether. The administrative sector may have hit the mark of about 8% during the later periods.
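    Sector shares of the kind quoted in this section can be computed per period once every individual has been assigned a set of sectors. The sketch below assumes a hypothetical input of (period, sectors) pairs and uses invented toy records; it is not the algorithm behind Figure 7.

```python
from collections import defaultdict


def sector_shares(records):
    """Return {period: {sector: percentage of that period's individuals}}.

    `records` is an iterable of (period, set_of_sectors) pairs, one per
    individual. A person may belong to several sectors or to none, so the
    percentages of one period need not add up to 100.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for period, sectors in records:
        totals[period] += 1
        for sector in sectors:
            hits[period][sector] += 1
    shares = {}
    for period in sorted(totals):
        shares[period] = {
            sector: round(100 * n / totals[period], 1)
            for sector, n in sorted(hits[period].items())
        }
    return shares


# Toy records, invented for illustration (period labels are start years in AH).
toy_records = [
    (301, {"religious"}),
    (301, {"religious", "civilian"}),
    (301, {"civilian"}),
    (301, set()),
    (681, {"religious"}),
    (681, {"religious", "military"}),
]
shares = sector_shares(toy_records)
print(shares[301])  # {'civilian': 50.0, 'religious': 50.0}
print(shares[681])  # {'military': 50.0, 'religious': 100.0}
```

    Dividing by the number of individuals in the same period is what separates relative from absolute numbers: a sector can grow in head count while its share falls if the period as a whole grows faster.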

    Major Social Transformations

    Figure 8. Individuals with Tribal and Toponymic Nisbas in Taʾrīkh al-islām.

    De-tribalization is one of the most striking processes that the onomastic data show. Islamic society starts as a tribal society with up to 85% of individuals in the earliest periods qualified through tribal affiliations. As the Islamic community grows and spreads over the Middle East and North Africa, the number of individuals with tribal identities rapidly goes down (Figure 8) and by about 350/962 CE only 20-25% of the individuals in the Taʾrīkh al-islām have tribal affiliations. From this point on—perhaps even earlier—tribal affiliations persevere in different capacities: some as dynastic (most prominently, the nisba al-Umawī that spikes again after 350/962 CE in Andalusia), but in most cases as status markers.

    Figure 9. Individuals with nisba al-Anṣārī in Taʾrīkh al-islām. Although al-Anṣār, “The Helpers [of the Prophet],” are not exactly a tribe, this group, being a product of the tribal society of Arabia, in many ways functioned as such.

    Such nisbas as al-Anṣārī (Figure 9) and al-Qurashī (Figure 10) make quite a noticeable comeback. The numbers of al-Anṣārīs (this nisba is particularly frequent in Andalusia as well) begin to grow quite rapidly after 350/962 CE and the number of al-Qurashīs practically skyrockets right after 500/1107 CE. However, even though their absolute numbers are much higher in the later periods, their percentages never reach their early peaks: the highest peak of al-Anṣārīs in the earliest periods is 18.32%, while the highest one in the later periods is only 6.53%; with al-Qurashīs these numbers are 8.42% and 3.31%. Some other tribal nisbas are re-claimed as well, but the overall number of individuals with names that associate them with Arab tribes remains rather low, only briefly going above the 30% mark.

    Figure 10. Individuals with nisba al-Qurashī in Taʾrīkh al-islām.

    Most tribal nisbas display rather distinctive orientations toward the East or the West of the Islamic world. “Late bloomers” are most often oriented toward the West (Figure 11). For example, such nisbas as al-Qaysī (208) and al-Lakhmī (183) feature most prominently in Andalusia (84 al-Qaysīs and 83 al-Lakhmīs); al-Tujībī (127)—in Andalusia (57) and Egypt (46); al-Makhzumī (182)—in Egypt (33);15 al-Saʿdī (191)—in Egypt (50) and Syria (25). But again, the percentages of “late bloomers” never reach those of the earlier periods.

    Figure 11. Western Orientation of Some Tribal “Late bloomers.” NB: Each map has its own scale.

    The change in tribal identities can also be seen through the numbers of unique tribal nisbas per period (Figure 12). In general, they display a similar trend. At its highest the number of unique tribal nisbas fluctuates at around 115 during the period 100-200 AH/719-816 CE. It drops to about 60 by 500/1107 CE and then grows back to about 80—most likely through the re-appropriation of old tribal nisbas that are now used as status markers as well as through the introduction of Turkic and Kurdish tribal identities—but by the end of the main period this number goes down to the 60-70 range.

    Figure 12. Unique tribal nisbas in Taʾrīkh al-islām.
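    Counting unique nisbas per period, the diversity measure behind Figure 12, reduces to collecting a set of names for each period. The triple layout, the category labels and the toy records below are assumptions made for illustration.

```python
from collections import defaultdict


def unique_per_period(records, category):
    """Count distinct nisbas of one category in each period.

    `records` is an iterable of (period, nisba, category) triples; repeated
    occurrences of the same nisba within a period are counted once.
    """
    seen = defaultdict(set)
    for period, nisba, cat in records:
        if cat == category:
            seen[period].add(nisba)
    return {period: len(names) for period, names in sorted(seen.items())}


# Toy records, invented for illustration.
toy_records = [
    (101, "al-Qurashī", "tribal"),
    (101, "al-Anṣārī", "tribal"),
    (101, "al-Qurashī", "tribal"),
    (101, "qaṭṭān", "trade"),
    (501, "al-Qurashī", "tribal"),
]
print(unique_per_period(toy_records, "tribal"))  # {101: 2, 501: 1}
print(unique_per_period(toy_records, "trade"))   # {101: 1}
```

    The same function applies to any category, so it would serve equally for counts of unique trade or religious-specialization nisbas.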

    Militarization. Onomastic data from Taʾrīkh al-islām allow us to take a closer look at the process characterized by Hodgson as “perhaps the most distinctive feature of the Middle Islamic periods.”16 The absolute numbers in Figure 13 (left) show that the military sector of élites begins to grow rapidly after 500/1107 CE; the numbers of amīrs included in the Taʾrīkh al-islām are staggering.17 Geographically, this spike of militarization is clearly visible in Iraq, the Jazīra and Egypt, but in Syria more than anywhere.

    The relative numbers in Figure 13 (right) allow for a more detailed glimpse into how the military were treated by the learned class who composed biographical collections that became sources of al-Dhahabī’s “History.” And the percentages tell a somewhat different story. Interestingly, the turning points of the military curve coincide with those of the cumulative biographical curve. The military curve, however, has three clearly visible sections, or periods. The first section, the early period up until 270/884 CE, shows the decline of the military in Islamic society. This process of de-militarization went hand in hand with de-tribalization, during which the diversity of the Islamic community grew, the ethos changed, and swords and horses were exchanged for pens and donkeys. The year 270/884 CE is the first peak of the cumulative biographical curve: the highest percentage of the learned and the lowest percentage of the military in the Taʾrīkh al-islām.

    Figure 13. The Military Sector in Taʾrīkh al-islām.

    During the middle period of 270-570 AH/884-1175 CE, when the cumulative biographical curve takes a dive and then, after 470/1078 CE, begins to recover, the share of the military in Taʾrīkh al-islām grows slowly. This can be marked as the beginning of [re-]militarization of Islamic élites. Unlike in the early period, however, now the amīrs are not Arab[ian] warriors, but Turkic military commanders.

    After 570/1175 CE, when the cumulative curve recovers and continues growing further, the percentage of military commanders in the élites begins to grow as rapidly as their absolute numbers. This third period shows a successful integration of the military into the élites, and their numbers strongly suggest that religious scholars take even minor commanders seriously.

    Military commanders do a lot to make a place for themselves in the dense social space of the Islamic society: as their biographies show, they build madrasas, hospitals (māristān) and establish other waqf institutions. More and more often they participate in the transmission of knowledge, which scholars report.

    The military—the amīrs themselves and members of their families18—are not the only ones building madrasas and, judging by the frequencies of their mentions, their establishments are not the most prominent. However, they definitely compensate for this in numbers: there are significantly more endowments established by the military than by members of other groups.19 Figure 14 shows the curves of the most frequently mentioned madrasas in Taʾrīkh al-islām. The vizieral al-Niẓāmiyyas and the caliphal al-Mustanṣiriyya feature more prominently. However, their curves strongly suggest that their prime time is over, while “military” madrasas—al-Ẓāhiriyya, al-Amīniyya, al-Nāṣiriyya, al-Nūriyya, al-ʿĀdiliyya and al-Qaymariyya and others—are on the rise.

    Figure 14. Mentions of Most Prominent Madrasas.

    The “Fulān al-dīn” honorifics that in the earlier periods were reserved for religious scholars become very common among the military, while the old pattern of “Fulān al-dawla” practically disappears (see Figure 15).20 It is not entirely clear whether these names are given to the military by religious scholars or if they are self-claimed (most likely both), but the fact that the military are listed under these honorifics in biographical collections implies that at the very least religious scholars endorsed them.

    Figure 15. Patterns of Military Honorific Names: Fulān al-dawla, the most common pattern in the middle period, gets replaced by Fulān al-dīn pattern in the later period.

    Frequencies of such words as khalīfa/amīr al-muʾminīn, sulṭān and amīr in biographies show that the 4th/10th century was the period (Figure 16) when scholarly attention started shifting from caliphs to sulṭāns and amīrs, who were gaining more power and more social presence. This shift in frequencies also neatly marks the end of the period which Hodgson characterized as the High Caliphal Period (in his chronology: c. 692-945 CE),21 and the beginning of the Earlier Middle Islamic Period (in his chronology: c. 945-1258 CE): the era of sulṭāns and amīrs.

    Figure 16. Frequencies of khalīfa, sulṭān, amīr.
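    Word frequencies of this kind can be gathered with small synsets, that is, lists of variants that point to one concept. The sketch below is an assumption-laden illustration: the synsets are written in simplified transliteration, the biographies are invented English snippets, and frequency is expressed as mentions per 100 biographies, none of which is stated in the text. Longer variants are tried first so that amīr al-muʾminīn is counted as a reference to the caliph rather than to an amīr.

```python
import re
from collections import defaultdict

# Hypothetical synsets; a real run would list Arabic-script variants.
SYNSETS = {
    "caliph": ["khalīfa", "amīr al-muʾminīn"],
    "sultan": ["sulṭān"],
    "amir": ["amīr"],
}


def build_matcher(synsets):
    """Compile one regex over all variants, longest first."""
    variant_to_concept = {v: c for c, vs in synsets.items() for v in vs}
    variants = sorted(variant_to_concept, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in variants)
    pattern = re.compile(r"(?<!\w)(" + alternation + r")(?!\w)")
    return pattern, variant_to_concept


def concept_frequencies(biographies, synsets=SYNSETS):
    """Return {period: {concept: mentions per 100 biographies}}.

    `biographies` is an iterable of (period, text) pairs.
    """
    pattern, variant_to_concept = build_matcher(synsets)
    mentions = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for period, text in biographies:
        totals[period] += 1
        for match in pattern.finditer(text):
            mentions[period][variant_to_concept[match.group(1)]] += 1
    return {
        p: {c: round(100 * mentions[p][c] / totals[p], 1) for c in synsets}
        for p in sorted(totals)
    }


# Toy biographies, invented for illustration.
toy_bios = [
    (301, "the khalīfa appointed him; the amīr al-muʾminīn praised him"),
    (301, "he served the amīr"),
    (501, "the sulṭān and the amīr honoured him"),
    (501, "the sulṭān built a madrasa"),
]
freq = concept_frequencies(toy_bios)
print(freq[301])  # {'caliph': 100.0, 'sultan': 0.0, 'amir': 50.0}
print(freq[501])  # {'caliph': 0.0, 'sultan': 100.0, 'amir': 50.0}
```

    Normalizing by the number of biographies per period keeps periods with many biographies from dominating the comparison.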

    De-civilianization. As was noted above, the share of the civilian sector noticeably decreases after 400/1010 CE. The diversity of crafts and trades within the civilian sector (Figure 17) reaches its highest point around 300/913 CE, when 85 different trades and crafts are represented.22 After 300/913 CE the diversity goes down, getting to the 60s range by the end of the period.

    Looking closer into trades and crafts, it can be pointed out that several sectors are clearly distinguishable:23 textiles (1,495), foods (799), metalwork (331), “chemistry” (349),24 clothes (306), finances (278), paper/books (253), brokerage (231), jewelry (218), and sundry services (170).

    Figure 17. Diversity of Trades and Crafts: Numbers of unique nisbas referring to trades and crafts by 20 lunar year periods.

    All sectors peak sometime between 300/913 CE and 500/1107 CE, but after that they show steady decline: even in those rare cases when absolute numbers remain quite significant, their percentages unmistakably go down. Practically all individual nisbas show the same trend. Merchants (sing. tājir, 294; Figure 18) constitute the only group that shows a different trend, and their numbers actually grow by the end of the period. This is, however, only because this is a blanket category that encompasses all the above-listed “industries,” without emphasizing any specific one in particular. Figure 19 shows the cumulative trend of involvement of religious scholars in crafts and trades. The curve based on absolute numbers (left) shows that numbers of scholars who were either directly involved in specific crafts and trades or came from families that made their fortune in those areas remained rather high until 600/1204 CE; relative numbers (right) show that the steady downward trend in this sector begins as early as 440/1049 CE, about three decades before the cumulative biographical curve (470/1078 CE) starts recovering.

    Figure 19. The Growth and Decline of Crafts and Trades.

    By the end of the period the emphasis in identities shifts, and while “secular occupations” are still not uncommon among the learned,25 they are definitely no longer the main focus of biographers, who instead pay more attention to positions and family connections (see section on Professionalization below).

    The geographical distribution of these professions is most puzzling. Essentially, all “industries” display the same pattern: the larger the region, the larger the presence of individuals involved in specific “industries.” Iraq always comes first, followed by Iran (representation by sectors varies slightly, but northeastern Iran usually has the highest numbers), then Syria and Egypt. Such a geographical distribution of “industries” suggests that occupational nisbas were used as necessary specifiers to distinguish among individuals in large communities.26 This issue might be resolved by adding local biographical collections to the corpus and experimenting with data grouping until some distinctive patterns can be discerned. Data from non-literary sources will be crucial for advancing this inquiry, which requires undivided attention.

    Whether this decline of the civilian sector is a result of the actual withdrawal of the learned from trades and crafts or of the loss of awareness of this part of their identity, the general effect on the development of the religious sector would still be the same: the loss of connections with the broader population. It is not that religious scholars stopped maintaining connections with the populace at large, but they gradually turned into a self-reproducing class whose members were primarily concerned with their own group interests.

    Figure 18. The Growth of Merchants.

    Professionalization & institutionalization of the learned class are another two processes that take place during the period covered in Taʾrīkh al-islām. These processes have been discussed at length in academic literature,27 although in most cases the emphasis is on institutionalization.28

    Here “professionalization” is understood as the growth of complexity of religious learning that leads to its branching into specific disciplines, mastering which eventually requires full-time commitment. Professionalization implies the development of a community of specialists who maintain qualifying standards and ensure demarcation from the non-qualified; ideally, mechanisms of monetary and status compensation for professional services should develop during this process.

    Figure 20. Growth of Religious Specializations: Numbers of unique nisbas referring to religious specializations by 20 lunar year periods.

    If we agree on recognizing the process of branching of the religious learning into specific disciplines as an indicator of professionalization, we may look at the growth of religious specializations as indicated through “descriptive names.” Figure 20 shows that the process of branching reaches its highest point during 300-350 AH/913-962 CE, after which the number of specializations remains on the same level and fluctuates only slightly.

    Although completely devoid of both buzzwords, Melchert’s study is perhaps the most valuable insight into the process of professionalization.29 In his book on the formation of the Sunnī legal schools (madhhab), Melchert offered three major criteria: the recognition of the chief scholar (raʾīs), commentaries (taʿliqa) on the summaries of legal teachings (mukhtaṣar), as a proof of one’s qualification, and a more or less regulated process of transmission of legal knowledge, through which the achievement of required qualification is ensured. Chronologically, Melchert placed this process for the Shāfiʿīs, Ḥanbalīs and Ḥanafīs in Baghdad of the late 9th to early 10th centuries.30 Keeping in mind this coincidence of Melchert’s close reading of legal ṭabaqāt and my distant reading of Taʾrīkh al-islām, we may—at least tentatively—consider 300/913 CE to be a turning point in the process of professionalization.

    Figure 21. References to Waqf Institutions in Biographies.

    Data from the Taʾrīkh al-islām shows that professionalization of religious knowledge (around 300/913 CE) is not directly related to scholars’ abandoning their gainful occupations in the civilian sectors, as this process will start only around 430/1039 CE. However, professionalization failed to bring about one very important thing, namely more paid positions for the learned. This must have forced men of learning into a difficult position where they had to maintain a delicate, but uncomfortable balance between keeping up with higher standards of religious learning and earning a living. The financial difficulties that professionalization imposed on the life of a scholar may have become quite a discouraging factor for the young who were considering career paths. Keeping in mind that the decline of the main curve begins c. 270/884 CE—i.e. roughly around the time when the number of religious specializations reaches its highest point—it is tempting to consider that professionalization has something to do with this decline. After all, a full-time commitment to study religious sciences leaves one no time to earn a living through gainful occupations in the civilian sector. Charging money for teaching religious subjects was considered illicit, and there are hardly any indications that the number of positions for religious specialists grew to compensate for this unfortunate development. To succeed in such conditions, one had to be either extremely resolute or come from a wealthy family to afford the career of a scholar. Since both resolve and wealth are in limited supply in any society, this could explain the decline in numbers of biographies.

    The introduction and spread of waqf institutions is considered a turning point in the institutionalization of the learned. The salaried positions of these institutions offered a solution to the complication of professionalization. Frequencies of references to waqf institutions in biographies (Figure 21) show that they—most importantly the madrasas—become a noteworthy detail of biographies soon after 400/1010 CE, about 100 lunar years after the turning point in professionalization, and a very important one after 500/1107 CE.31

    However, by offering salaried positions, the waqf institutions also reconfigured the structure of the learned class, which in the long run had a very negative effect. In his study of medieval Damascus,32 Chamberlain convincingly argued that salaried positions (manāṣib) became one of the major objects of contention among the learned who were now concerned about winning and holding as many of these positions as possible. One of their strategies was to ensure that the positions stayed within a family—household—which led to the formation of the dynasties of religious scholars and, in the long run, the transformation of the religious class into a rather closed social stratum, to which the word “clergy” became more and more applicable as time went on.

    Figure 22. References to Relatives. The graph on the left shows the major categories of relatives, while the one on the right shows the same data combined into one graph.

    As the data from the Taʾrīkh al-islām indicate (Figure 22), the role of family connections unmistakably increases after 400/1010 CE. The tribal nature of early Islamic society explains the high frequency of references to close relatives in the early periods. However, references to parents are most frequent—largely to fathers33—which is understandable, considering the importance of lineage through the male line within tribal society. But again, the curve of references goes down steadily between 120/739 CE and 380/991 CE, mirroring the curve of tribal identities that also goes down while the number of biographies keeps on growing. After 380/991 CE references to family members practically skyrocket, and even increase in pace slightly around 500/1107 CE. Unlike in the early period, references to most members of the immediate family become very common: parents (the word “parent,” wālid[a], becomes particularly common), siblings (brothers and sisters—akhū, ukht), children (sons and daughters—ibnu-hu, bintu-hu, etc.), and, to a lesser extent, spouses (husbands and wives—zawj[a]). The same trend can be seen in the references to uncles, aunts, grandparents and grandchildren. These shifts—not just the growth of frequencies, but also the growth of varieties of familial references—may be interpreted as a shift of scholarly attention from the lineage to the household.

    If we accept these rates of frequencies as an indicator of the formation of households, then it appears that scholarly households begin growing earlier than waqf institutions. The growth of scholarly families thus may have been caused by professionalization and then boosted by institutionalization.
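
    Counting references to relatives depends on telling family terms apart from name elements: as noted in the footnote on abū, only forms with pronominal suffixes are considered, so that a kunya such as “Abū Bakr” is not counted as a reference to a father. The following is a minimal sketch of that principle and not the algorithm actually used for Figure 22. It assumes transliterated text in which the suffix is attached with a hyphen; the stem list, the suffix list, the category labels and the sample sentence are all hypothetical.

```python
import re
import unicodedata
from collections import Counter

# Assumed stems (nominative forms only) mapped to broad categories of relatives.
STEM_TO_CATEGORY = {
    unicodedata.normalize("NFC", stem): category
    for stem, category in {
        "abū": "parent",
        "akhū": "sibling",
        "ibnu": "child",
        "bintu": "child",
    }.items()
}
SUFFIXES = ("hu", "hā")  # assumed pronominal suffixes: "his", "her"

# A stem counts only when a pronominal suffix follows it, e.g. abū-hu;
# the lookarounds prevent matches inside longer hyphenated tokens.
PATTERN = re.compile(
    r"(?<![\w-])(" + "|".join(STEM_TO_CATEGORY) + r")-(?:"
    + "|".join(SUFFIXES) + r")(?![\w-])"
)

def count_family_references(text):
    """Count suffixed family terms per category in one transliterated biography."""
    text = unicodedata.normalize("NFC", text)
    return Counter(STEM_TO_CATEGORY[m.group(1)] for m in PATTERN.finditer(text))

# Hypothetical sample: two kunyas (Abū ...) and three genuine family references.
sample = ("Abū Bakr: ḥaddatha ʿan-hu ibnu-hu Abū ʿAlī; "
          "kāna abū-hu tājiran wa-kāna akhū-hu faqīhan.")
family_counts = count_family_references(sample)
print(family_counts)
```

    A real implementation would have to work on Arabic script and cover the other grammatical cases of each term, which this sketch deliberately leaves out.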

    Concluding Remark

    The presented model is exploratory. It is rather simple, but it is transparent. Explicitly described models can be discussed, compared, modified, and applied to new sources. With models we can stop futile discussions about the meaning and reliability of certain data and start exploring Islamic history experimentally. By developing and testing multiple complex models we can eventually arrive at a better understanding of both our sources and the processes that they describe. With models we can compare multiple sources, even evaluate entire genres. Right now, when scholars of Islam are entering the domain of digital humanities, there is a dire need for transparency of our methods—and modeling appears to be the best option—especially if we venture to study the entire digital corpus of classical Arabic sources, which at the moment may have already exceeded 800 million words.

    Cited Works

    Berkey (1992):
    Jonathan P. Berkey. The transmission of knowledge in Medieval Cairo: a social history of Islamic education. Princeton University Press, Princeton, N.J., 1992.

    Bulliet (1979):
    Richard W. Bulliet. Conversion to Islam in the medieval period: an essay in quantitative history. Harvard University Press, Cambridge, 1979.

    Chamberlain (1994):
    Michael Chamberlain. Knowledge and social practice in medieval Damascus, 1190-1350. Cambridge University Press, Cambridge ; New York, 1994.

    Cohen (1970):
    Hayyim J. Cohen. The economic background and the secular occupations of muslim jurisprudents and traditionists in the classical period of islam: (until the middle of the eleventh century). Journal of the Economic and Social History of the Orient, 13: 16-61, 1970.

    Ephrat (2000):
    Daphna Ephrat. A learned society in a period of transition: the Sunni ʿUlamaʾ of eleventh century Baghdad. State University of New York Press, Albany, 2000.

    Gilbert (1980):
    Joan E. Gilbert. Institutionalization of muslim scholarship and professionalization of the ʿUlamāʾ in medieval damascus. Studia Islamica, (52): 105-134, January 1980. ISSN 0585-5292.

    Hodgson (1974):
    Marshall G. S. Hodgson. The venture of Islam: conscience and history in a world civilization. Vol. 2, The expansion of Islam in the middle periods. University of Chicago Press, Chicago, 1974.

    Humphreys (1989):
    R. Stephen Humphreys. Politics and architectural patronage in ayyubid damascus. In Clifford Edmund Bosworth, editor, The Islamic world from classical to modern times: essays in honor of Bernard Lewis, pages 151-174. Darwin Press, Princeton, N.J., 1989.

    Humphreys (1994):
    R. Stephen Humphreys. Women as patrons of religious architecture in ayyubid damascus. Muqarnas, 11: 35-54, January 1994. ISSN 0732-2992.

    Jockers (2013):
    Matthew L. Jockers. Macroanalysis: Digital Methods and Literary History. University of Illinois Press, 1st edition, April 2013.

    Makdisi (1981):
    George Makdisi. The rise of the colleges: institutions of learning in Islam and the West. Edinburgh University Press, Edinburgh, 1981.

    Melchert (1997):
    Christopher Melchert. The formation of the Sunni schools of law, 9th-10th centuries C.E. Brill, Leiden ; New York, 1997.

    Moretti (2007):
    Franco Moretti. Graphs, Maps, Trees: Abstract Models for Literary History. Verso, 2007.

    Moretti (2013):
    Franco Moretti. Distant Reading. Verso, 1 edition, June 2013.

    Morris (2013):
    Ian Morris. The measure of civilization: how social development decides the fate of nations. Princeton University Press, Princeton, 2013.

    Petry (1981):
    Carl F. Petry. The civilian elite of Cairo in the later Middle Ages. Princeton University Press, Princeton, N.J., 1981.

    Romanov (2013):
    Maxim G. Romanov. Computational Reading of Arabic Biographical collections with Special reference to Preaching (661-1300 CE). Ph.D., University of Michigan, Ann Arbor, MI, 2013.

    Shatzmiller (1994):
    Maya Shatzmiller. Labour in the medieval Islamic world. E.J. Brill, Leiden [The Netherlands], 1994.

    Wasserstein (2013):
    David J. Wasserstein. Where have all the converts gone? difficulties in the study of conversion to islam in al-andalus. Al-Qanṭara, 33 (2): 325-342, February 2013. ISSN 1988-2955, 0211-3589.

    Footnotes

    1. For more details, see Chapter 1 in Romanov (2013). []
    2. While most humanists remain skeptical in regard to working with big data, the number of studies that show that close reading alone is not enough keeps on growing. They emphasize that case studies based on close reading do not allow for extrapolations; that humanists are prone to putting too much effort into studying objects that are unique and for this reason are least likely to represent larger trends. Most vivid examples can be found in the field of literary history, see, e.g., Moretti (2007), Moretti (2013) and Jockers (2013). []
    3. See, Moretti (2007), p. 4. []
    4. For example, Morris (2013) uses the size of the largest urban center as an indicator of the social development of a region to which it belongs. Bulliet (1979) uses onomastic data as the indicator of conversion. []
    5. For valuable examples of modeling “big data,” see: Moretti (2007), Morris (2013); also see http://orbis.stanford.edu/ for the geographical model of the Roman world, developed by Walter Scheidel and Elijah Meeks. In the field of Islamic studies: Bulliet (1979). []
    6. Bulliet’s model of conversion is a great example of this. The very fact that this study is still criticized after more than three decades from its publication shows that a solid model cannot be discarded through a critique of where it fails, if otherwise it still remains plausible and coherent. For the most recent critique, see: Wasserstein (2013). []
    7. Both toponyms proper and toponymic nisbas linked with relevant toponyms. Toponymic data is crucial for our understanding of the social geography of the classical Islamic world. For my modeling of the geography of the Islamic world based on the data from Taʾrīkh al-islām see, Romanov (2013), p. 35-37, 41-42, 87-113. []
    8. Romanov (2013), p. 28-51. []
    9. For a detailed discussion, see Romanov (2013), p. 43-46. []
    10. See, Cohen (1970). []
    11. Petry (1981). For the “Glossary,” see pp. 390-402. []
    12. Shatzmiller (1994); For extensive lists of names/occupations, see pp. 101-168, 410-424. []
    13. Some wazīrs rivaled their “employers” in influence. The most prominent examples are the Barmakid family that served the ʿAbbāsid caliphs and Niẓām al-mulk who served Mālikshāh, the Great Saljuq sulṭān. []
    14. There are 322 physicians in the ʿUyūn al-anbāʾ fī ṭabaqāt al-aṭibbāʾ of Ibn Abī Uṣaybiʿa (d. 668/1270 CE) and quite a few physicians are Jews and Christians, judging by their names. al-Dhahabī’s count of physicians is about 200, which can be considered a very thorough coverage, since Ibn Abī Uṣaybiʿa’s book is devoted exclusively to the physicians (and, as often happens, tends to overstretch the definition of the group), while al-Dhahabī’s book is a general history. []
    15. The first major peak of the nisba al-Makhzūmī is around 150/768 CE and geographically it peaks largely in the Central Arabian Cluster (65 al-Makhzūmīs). []
    16. Hodgson (1974), p. 64. []
    17. Unfortunately, at the moment my algorithms are not tuned well enough to trace all individuals who belonged to the military sector. The nisba al-amīr should serve well as an indicator: it is the most frequent “descriptive name” within the military sector and it is the easiest to trace computationally. []
    18. Most prominently, women from their households, see, Humphreys (1994). []
    19. See, for example: TI, 28, 311-312; TI, 29, 68-76; TI, 37, 57-58; TI, 37, 185-186; TI, 38, 157-158; TI, 39, 370-387; TI, 41, 161-164; TI, 42, 407; TI, 44, 220; TI, 45, 119; TI, 45, 164; TI, 45, 311-313; TI, 45, 359; TI, 45, 402-406; TI, 46, 87-88; TI, 46, 289; TI, 46, 431-432; TI, 47, 165; TI, 47, 308; TI, 49, 192; TI, 50, 264; TI, 51, 196-197; TI, 51, 369-370; TI, 52, 368; TI, 52, 409-411. On the military patronage, see also: Humphreys (1989). []
    20. Somehow, the “Fulān al-dīn” names still have a strong steel aftertaste. The most common first components of the “Fulān al-dawla” pattern are Sayf al-dawla, “Sword of the Dynasty;” Nāṣir…, “Helper…;” Naṣr, “Victory;” Muʿizz, “Strengthener;” ʿIzz, “Strength;” ʿAḍud, “Support;” Tāj, “Crown;” Bahāʾ, “Splendor;” Ḥusām, “Cutting Edge.” The most common first components of the “Fulān al-dīn” pattern are: Sayf al-dīn, “Sword of Religion;” ʿIzz…, “Strength…;” Jamāl, “Beauty;” Badr, “Full Moon;” Shams, “Sun;” Ṣalāḥ, “Goodness;” Ḥusām, “Cutting Edge;” Quṭb, “Pole;” ʿAlam, “Banner.” []
    21. There is also a late peak that corresponds to the restoration of the independence of the ʿAbbāsid caliphate during the second half of the 6th/12th century, but it is short-lived. []
    22. I should remind the reader that only nisbas that are used to describe at least 10 individuals are considered in this analysis. []
    23. Largely following Shatzmiller’s classification, see: Shatzmiller (1994); these sectors often overlap. []
    24. Trades that involve dealing with any complex compounds: al-ʿAṭṭār, “druggist, perfumer;” al-Ṣaydalānī, “apothecary, druggist;” al-Ṣābūnī, “soap maker/seller” etc. []
    25. The decline does not appear as staggering as, for example, Cohen’s (Cohen (1970)) study argued. []
    26. Very similar to what Bulliet argued regarding toponymic nisbas: “For example Karkh, a popular quarter of Baghdad, appears in the nisba al-Karkhī when representation from Iraq is high. When the proportion is smaller, the name of the major city itself is a common nisba. In the example given, a later resident of Karkh would appear as al-Baghdadī. Finally, when the proportion is very low, the nisba will frequently be derived from the entire province, that is, al-Baghdadī becomes al-ʿIrāqī.” (Bulliet (1979), p. 12). []
    27. The most important studies are: Makdisi (1981), Berkey (1992), Chamberlain (1994). To a large extent Berkey’s and Chamberlain’s studies are responses to Makdisi’s “over-institutionalization.” []
    28. It seems that Gilbert is the only one to use this term in her study of the learned of Medieval Damascus: see, Gilbert (1980). However, in her study this term appears to blend into institutionalization and both become practically indistinguishable. Other scholars mention professionalization almost exclusively with the reference to Gilbert’s work. See, for example: Chamberlain (1994), p. 70; Ephrat (2000), p. 104, 179. []
    29. Melchert (1997). []
    30. The failure of the Mālikīs Melchert explains by their being too closely linked to the caliphal patronage and when the caliphs were eclipsed, so were the Mālikīs. See: Melchert (1997), p. 176. []
    31. The decline of the frequency of the word madrasa should not be interpreted as a decline of this institution, but rather as a change in the form of reference in general: most madrasas are referred to by their “al-Fulāniyya” names (see Figure 14). []
    32. Chamberlain (1994). []
    33. The most common references are the forms of abū, “father.” Since this word is also the essential part of kunya, an extremely common patronymic element of the Arab/Muslim name, only its forms with pronominal suffixes—such as abū-hu, “his father”—are considered. The same principle is applied to other ambiguous family terms. []


    My dissertation—“Computational Analysis of Arabic Biographical Collections with Special Reference to Preaching in the Sunnī World (661-1300 CE)”—is now available online through the digital library @ the University of Michigan. Even with very extensive Appendices, several thousand graphs and maps still did not make it into the dissertation. Hopefully, if I can find enough time, I will make an online appendix with the visualizations of all generated data that consists mainly of chronological graphs of “descriptive names” and chronological maps that show how their geographies were changing over time (all based on “The History of Islam” of al-Dhahabī (d. 1348)).

    Abstract

    A project in the digital humanities, the dissertation explores methods of computational text analysis. Relying on text-mining techniques to extract meaningful data from unstructured text, the study offers an effective and flexible method for the analysis of Arabic biographical collections, the most valuable source for the social history of the pre-modern Islamic world. It uses the largest collection, “The History of Islam” of al-Dhahabī (d. 1348), as a case-study of applying the new method and shows how almost 30,000 biographies can be studied as a whole. A step toward finding a viable solution for studying the entire digital corpus of classical Islamic texts (400 mln. words), Chapter I offers a detailed explanation of “computational reading” that was built upon existing digital approaches from a variety of disciplines. Chapter II models big data extracted from the main source to further our understanding of the social geography of the Islamic world and its major social transformations, simultaneously providing an important background for the next chapter. Chapter III applies the devised method to the study of Islamic preaching from chronological, geographical and social perspectives that have been overlooked in the academic treatment of this subject. Largely an exploratory overview, it traces long-term changes in preaching practices as well as statuses of preachers within the Islamic elites. This chapter demonstrates how exactly computational reading can contribute to the studies of specific phenomena and practices. The final section overviews broad prospects of the further application of “computational reading” to a variety of genres of pre-modern Arabic literature. The dissertation heavily relies on the visual display of information in the form of graphs, charts, maps, and tables that are used in the main body and supplied in Appendices.



    A Screenshot of al-Thurayyā. Click on the screenshot to open the Gazetteer in full screen.

    With Teams Pelagios and Pleiades—in alphabetical order: Elton Barker, Tom Elliot, Leif Isaksen, Rainer Simon—visiting Tufts University within the framework of the Perseids Named Entity Hackathon (organized and led by Bridget Almas) at the Perseus Project, we had a chance to test how their systems work with Arabic texts.1 Pelagios offers a convenient workflow for “geographical” reading of texts, which consists of two main steps: first, one tags places that occur in the text, then one “geo-resolves” tagged places into geographical locations that get displayed on an interactive geographical map (for more details, see Pelagios Website). The first step is smooth and easy and works well for texts in any language as long as the text is provided in Unicode. The second step depends on the availability of relevant gazetteers, to which Pelagios is or can be connected. Thus, Pelagios does a great job when it comes to the “geo-resolution” of toponyms included in Pleiades, which now has almost 35,000 places from the Ancient world. Since there is no gazetteer for the classical Islamic world, “geo-resolution” of classical Arabic sources is problematic at the moment. A gazetteer for the Islamic world is badly needed in general.

    As is the case with the creation of any database, creating a gazetteer is an extremely time-consuming task. The key seems to be in generating a snowball effect: creating enough database entries that would encourage a community of potentially interested individuals to start contributing to an already substantial databank by offering new data, references, corrections and additions. Pleiades has successfully used this model. Having incorporated content from such extensive editions as “Digital Atlas of Roman and Medieval Civilizations” (DARMC) and “Barrington Atlas of the Greek and Roman World” (BAGRW), Pleiades offered a significant foundation for potential users to contribute to. It seems only logical to follow in the footsteps of such a successful project as Pleiades, and to use their infrastructure for developing an Islamic gazetteer, which will feature in Pleiades as al-Thurayya: a Supplement for the Islamic World. (In this light, the name al-Thurayya, Arabic for Pleiades, seems quite appropriate; Tom Elliot, one of the managing editors of Pleiades, will be providing support for the integration of al-Thurayya into Pleiades.)

    In the case of the classical Islamic world, there are, unfortunately, very few publications that offer geographical data of a magnitude comparable to that of DARMC and BAGRW. In fact, there is only one edition that can provide a solid backbone of geographical data for the initial stage of the creation of an Islamic gazetteer: Georgette Cornu’s Atlas du monde arabo-islamique à l’époque classique: IXe-Xe siècles (Brill, 1985; maps by Olivier Chareire). Largely based on M.J. de Goeje’s Bibliotheca Geographorum Arabicorum, this Atlas represents early geographical and travelogue literature in Arabic and, to some extent, in Persian (9 geographical treatises from BGA, plus 18 other works).

    The Atlas consists of 20 maps, which cover the extent of the Islamic world in the 9th-10th centuries, and an extensive gazetteer that briefly characterizes every place, providing a succinct verbal description of its geographical location, its place in the geographical hierarchy, and coded references to primary and secondary sources. Maps vary in scale, but, in general, they are very detailed, dense in places and provide trade routes.2


    Geographical Coverage of Cornu’s Atlas

    Cornu’s Atlas represents most early Islamic geographical sources in general, but none of them in particular—peculiarities of each geographer are preserved in the gazetteer, but not reflected on maps. Although somewhat a “Frankenstein” of the early Islamic geography, Cornu’s Atlas is an incredible piece of scholarly work that does offer the best starting point for studying Islamic geography as well as various topics in Islamic history with digital methods.

    Unlike DARMC and BAGRW, Cornu’s Atlas was published only once3 in the 1980s and has never made it into a digital form (at least to my knowledge). Nor does the gazetteer offer coordinates for places. So, creating a digital gazetteer is a bit of a methodological challenge. The most effective way is to “georeference” Cornu’s maps in a GIS program (for example, QGIS) and then to collect necessary geographical features from these georeferenced maps. “Georeferencing” can be described as a process of deforming the image of a map in such a way that its coordinate grid corresponds to the coordinates within a GIS software. In other words, if one georeferences specific points—for example, intersections of parallels of latitude and meridians of longitude—a GIS program will deform the image of a map in such a way that all geographical features—cities, villages, and trade routes—will correspond to their geographical locations. In most cases…
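
    The idea behind georeferencing can be sketched in a few lines of code. The example below is a simplified illustration, not what QGIS does internally: it fits a plain affine transformation (by least squares) from pixel coordinates of a scanned map to longitude and latitude, using grid intersections as control points. The pixel positions and the function names are hypothetical, and an affine fit ignores the map projection, so it should be read only as a first approximation.

```python
def solve3(matrix, vector):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [value] for row, value in zip(matrix, vector)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("need at least three non-collinear control points")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def fit_affine(control_points):
    """Fit lon = a*x + b*y + c and lat = d*x + e*y + f by least squares.

    control_points: list of ((pixel_x, pixel_y), (lon, lat)) pairs.
    """
    rows = [(x, y, 1.0) for (x, y), _ in control_points]
    # Normal equations: (A^T A) p = A^T b, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coefficients = []
    for k in (0, 1):  # 0 = longitude, 1 = latitude
        atb = [sum(r[i] * geo[k] for r, (_, geo) in zip(rows, control_points))
               for i in range(3)]
        coefficients.append(solve3(ata, atb))
    return coefficients

def pixel_to_lonlat(coefficients, x, y):
    """Apply the fitted transformation to one pixel position."""
    (a, b, c), (d, e, f) = coefficients
    return a * x + b * y + c, d * x + e * y + f

# Hypothetical control points: four grid intersections on a scanned map.
GRID = [
    ((100, 100), (40.0, 38.0)),
    ((600, 100), (45.0, 38.0)),
    ((100, 500), (40.0, 34.0)),
    ((600, 500), (45.0, 34.0)),
]
coefficients = fit_affine(GRID)
lon, lat = pixel_to_lonlat(coefficients, 415, 265)  # a hypothetical map symbol
print(round(lon, 2), round(lat, 2))
```

    With more control points than unknowns the fit averages out small errors in the scan; GIS programs additionally offer higher-order transformations for maps that are distorted in less regular ways.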


    Georeferenced Cornu’s Atlas

    As a method, georeferencing is precise, but its results depend on the quality of original maps, and some particular factors often complicate things. Ideally, for georeferencing one needs to know the projection of the map—something which all of Cornu’s maps lack (as is the case with most historical academic maps). Fortunately, Cornu’s maps have rather detailed coordinate grids, in most cases covering every or every other degree of latitude and longitude.4 By georeferencing coordinate grids one can still produce quite reliable overlays. An example below shows a section of one of Cornu’s maps overlaid on top of a Google physical map: medieval al-Mawsil corresponds to modern Mosul, and medieval Tall A‘far to modern Tal Afar, while some other features—in this case, the Tigris river—are slightly off.


    A section of a georeferenced Cornu’s map overlaid on top of Google physical map

    Converted into a digital dataset, contents of Cornu’s Atlas will become the backbone of geographical data that can be improved, expanded, corrected. An example of a searchable digital map based on one of the maps from Cornu’s Atlas can be found below: the map on the left shows dynamic clustering of toponyms; the map in the middle shows places; the map on the right shows trade routes (click on the image to view dynamic searchable map; layers can be switched on/off in the upper right corner of the map). NB: “Place Filter” supports search using Arabic and simplified transliteration (omitting hamzas and ‘ayns, and disregarding macrons of long vowels and dots of emphatic consonants). Make sure to switch on the Places layer. There may be typos in the transliteration (and in the Arabic, since Arabic names are automatically converted from transliteration); I would appreciate it if you emailed corrections/suggestions.
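
    The simplified transliteration described above can be approximated as follows. This is a minimal sketch of one way to do it, not the code behind the “Place Filter”: it assumes that macrons and dots are stored as Unicode diacritics (so that they can be stripped after decomposition) and that hamza and ‘ayn are written with one of the characters listed in HAMZA_AYN. The function names are hypothetical.

```python
import unicodedata

# Characters commonly used for hamza and ʿayn, plus apostrophe-like stand-ins.
HAMZA_AYN = "ʾʿ'‘’`"

def simplify(name):
    """Lowercase a transliterated name and drop macrons, dots, hamzas and ʿayns."""
    decomposed = unicodedata.normalize("NFD", name)  # ā -> a + combining macron
    kept = [
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in HAMZA_AYN
    ]
    return "".join(kept).lower()

def matches(query, toponym):
    """True if the simplified query occurs inside the simplified toponym."""
    return simplify(query) in simplify(toponym)

for toponym in ("al-Thurayyā", "Tall A‘far", "al-Baṣra", "Naysābūr"):
    print(toponym, "->", simplify(toponym))
```

    With such a normalization a user can type “basra” or “naysabur” and still find al-Baṣra or Naysābūr; the price is that names that differ only in diacritics collapse into one search key.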


    Toponymic data from the map of Greater Syria (Province du Šām).
    Special thanks to Rainer Simon @ Pelagios and Adam Tavares @ Perseus for their help with building this interactive map.

    Footnotes

    1. For more details, see Marie-Claire Beaulieu’s post on Perseids Website. []
    2. It is not clear what the lines of the trade routes are based on. Unlike maps/cartograms of trade and postal routes created by Aloys Sprenger (Die Post- und Reiserouten des Orients, Leipzig 1864) and Guy Le Strange (The Lands of the Eastern Caliphate, Cambridge 1905), who connected locations with straight lines, Cornu’s maps offer realistic routes. []
    3. The gazetteer was published in three gradually updated versions. []
    4. In this regard, maps from Brill’s An Historical Atlas of Islam (1981, 2002) are not suitable for this task, since they lack information on projection, and do not provide values for the coordinate grid, which significantly affects the precision of georeferencing. See this example: A georeferenced map of Iran in the 4th-5th / 10th-11th Centuries. NB: Routes are straight lines between two points; georeferenced in QGIS. []


    Back to al-Dhahabī’s Ta’rīkh al-islām. The present dynamic cartogram shows how the prominence of major urban centers was changing over time. The focus is again on “descriptive names” (nisba) and the “size” of each urban center on the cartogram reflects the number of individuals with “descriptive names” that refer to that urban center. A “prominent center” in the current dataset is a place with which at least 10 individuals from Ta’rīkh al-islām are associated1 (the overall number of individuals in the current dataset is slightly over 29,000 for the period of 661–1300 CE). Each frame features the names of the top 15 urban centers (the largest among them gradually change their hue from green to red).
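
    The selection rule for each frame (at least 10 individuals per center, top 15 centers labeled) can be sketched as follows. The function name is hypothetical; the counts are whole-period totals quoted elsewhere in this post, used here only for illustration instead of per-frame values, and “Hypothetical village” is an invented entry that shows the effect of the threshold.

```python
from collections import Counter

MIN_INDIVIDUALS = 10  # threshold for a "prominent center", as stated in the text
TOP_N = 15            # number of centers named in each frame

def prominent_centers(counts, min_individuals=MIN_INDIVIDUALS, top_n=TOP_N):
    """Return the top_n (place, count) pairs among places above the threshold."""
    eligible = Counter(
        {place: n for place, n in counts.items() if n >= min_individuals}
    )
    return eligible.most_common(top_n)

# Whole-period totals quoted in this post, plus one invented low-count entry.
counts = {
    "al-Baṣra": 1595,
    "al-Kūfa": 1432,
    "Naysābūr": 1038,
    "al-Madīna": 691,
    "Wāsiṭ": 401,
    "Makka": 269,
    "al-Anbār": 83,
    "Hypothetical village": 7,  # below the threshold, never shown
}

top_three = prominent_centers(counts, top_n=3)
all_prominent = prominent_centers(counts)
print(top_three)
```

    For a dynamic cartogram the same function would be applied to the counts of each time slice separately, so that the list of named centers changes from frame to frame.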

    The cradle of Islam, central Arabia is the most prominent region in the early period. Major urban centers of this region are Mecca/Makka (269) and Medina/al-Madīna (691), but their cultural prominence soon shifts to the main garrison cities of lower Iraq: Kufa/al-Kūfa and Basra/al-Baṣra. The decline of central Arabia starts around 100/719 CE and by 250/865 CE this region is diminished to a marginal province. (The south Arabian cluster displays a similar trend.)

    Major Early Bloomers: Medina/al-Madīna (691); Kufa/al-Kūfa (1,432), and Basra/al-Baṣra (1,595).

    Iraq very quickly becomes the central region and maintains this status for most of the period covered in Ta’rīkh al-islām. During the early period its prominent urban centers are Basra/al-Baṣra (1,595) and Kufa/al-Kūfa (1,432), but the prominence of these garrison towns is soon dwarfed by Baghdad, the new capital city, and they practically disappear from the social map of the Islamic world by around 300/913 CE. Baghdad remains the dominant urban center not only for Iraq, but for the entire Islamic world until the beginning of the 13th century CE. Other major urban centers of this region are Wāsiṭ (401) and al-Anbār (83).

    The rapid growth of Iraq comes to a halt around 200/816 CE—at this period the Caliphate is being torn apart by the civil war between al-Amīn and al-Maʾmūn, the sons of the great Hārūn al-Rashīd (r. 786-809 CE), who decided to divide the Empire between them. The province falls into a clearly visible decline. In the course of the 9th century the power slips from the ʿAbbāsid caliphs: first into the hands of the military commanders of their slave armies, then into those of the Būyids (932-1055 CE) and the Saljūqs (1038-1194 CE).

    480/1088 CE marks the beginning of a century-long recovery for Iraq: the ʿAbbāsid caliphs gradually manage to shake off the “protectorship” of the military (at this point, the Saljūq sulṭāns) and temporarily regain their independence. Caliphs, sulṭāns, and viziers (wazīr) vie for influence with each other, seeking the support of religious scholars and relying on various mechanisms of promoting different legal schools: respectively, the Ḥanbalīs, the Ḥanafīs, and the Shāfiʿīs. The data from Ta’rīkh al-islām show that it is during this period that these groups start growing quite noticeably.

    The number of Baghdādīs drops quite noticeably before the Mongol sack of the capital city in 656/1258 CE. Numbers of deaths reported for the 20-lunar-year periods after 600/1204 CE: 244 for 600-620 AH/1204-1224 CE; 256 for 621-640 AH/1225-1243 CE; 98 for 641-660 AH/1244-1262 CE; 27 for 661-680 AH/1263-1282 CE; 51 for 681-700 AH/1283-1301 CE.
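    A tally of this kind can be sketched as follows. The sketch makes two assumptions that are not stated in the source: periods are aligned as 601-620, 621-640, and so on, and hijri years are converted with the common linear approximation CE ≈ AH × 0.970224 + 621.5774, rounded up. That rounding happens to reproduce pairs quoted here such as 621/1225 and 640/1243, but it is not necessarily how the dates in this post were derived. The function names and the death years are invented for illustration.

```python
import math
from collections import Counter

def ah_to_ce(ah_year):
    """Approximate CE year for a hijri year (linear approximation, rounded up)."""
    return math.ceil(ah_year * 0.970224 + 621.5774)

def period_of(ah_year, width=20):
    """Return (start, end) of the lunar-year period that contains ah_year."""
    start = ((ah_year - 1) // width) * width + 1
    return start, start + width - 1

def count_by_period(death_years, width=20):
    """Count death years per period of `width` lunar years."""
    return Counter(period_of(y, width) for y in death_years)

# Toy death years (AH), invented for illustration only.
toy_deaths = [605, 610, 630, 645, 699]
for (start, end), n in sorted(count_by_period(toy_deaths).items()):
    print(f"{n} for {start}-{end} AH/{ah_to_ce(start)}-{ah_to_ce(end)} CE")
```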

    By the end of the period covered in Ta’rīkh al-islām, Iraqi élites drastically decrease in numbers, practically disappearing from the social map of the Islamic world. Although the Mongol invasion is often considered the main cause, the data from Ta’rīkh al-islām show that the ranks of Iraqi élites start thinning well before the coming of the Mongols. Despite these vicissitudes, the number of notable men in Iraq remains quite significant for most of our period, and the prominence of Iraq is rivaled only by Iran, with all its clusters combined.

    Major “middle bloomers,” Iranian provinces gain prominence between 100/719 CE and 200/816 CE. The curve of northeastern Iran (Khurāsān) reaches its highest point quite quickly around 200/816 CE and remains there, fluctuating slightly, for over three centuries, and goes into a rapid decline after 520/1127 CE. It takes longer for northwestern Iran to reach its peak (around 350/962 CE), after which it slowly goes down. Unlike northeastern Iran, it is still visible on the maps of the Islamic world by the end of our period. The curve of southwestern Iran reaches its highest point around 280/894 CE, then goes into a temporary decline during the 4th/10th century, recovers by 400/1010 CE and begins to go down slowly, increasing its pace of decline around 520/1127 CE. The major urban centers are: Nishapur/Naysābūr (1,038), Merv/Marw [al-shāhijān] (385), Herat/Harāt (392), Balkh (171) and Ṭūs (136) in northeastern Iran (Khurāsān); Rey/al-Rayy (280), Hamadhān (254) and Qazwīn (118) in northwestern Iran; and Isfahan/Iṣbahān (1,124) and Shīrāz (100) in southwestern Iran.

    Major Middle Bloomers: Baghdād (3,086); Isfahan/Iṣbahān (1,100); Nishapur/Naysābūr (1,038); Cordova/Qurṭuba (634); Andalusia/al-Andalus (582).

    The curves of Iranian clusters correspond to what scholars of Islam often refer to as the “Iranian intermezzo,”2 a period of Iranian independent dynasties (roughly 750-1150 CE): the Ṭāhirids (821-873), Ṣaffārids (867-903) and Sāmānids (875-999) in the east and the Būyids (932-1055) in the north and west. All Iranian clusters practically come to naught by the end of the period covered in Ta’rīkh al-islām.

    The two-peaked curve of the last “middle bloomer,” al-Andalus, seems to correspond to the zenith of the Umayyad caliphate in Spain (756-1031 CE) around 380/991 CE, followed by its disintegration and the recovery under the Almoravids/al-Murābiṭūn (1056-1147 CE) and the Almohads/al-Muwaḥḥidūn (1130-1269 CE)—beginning around 470/1078 CE and peaking around 590/1195 CE; after that Andalusia is erased from the map of the Islamic world by the Christian Reconquista. The major Andalusian urban centers are Cordova/Qurṭuba (633), Seville/Ishbīliya (248), Valencia/Balansiyya (141) and Toledo/Ṭulayṭila (89).

    Major Late Bloomers: Damascus/Dimashq (1,573); Egypt/Miṣr (1,501); Alexandria/al-Iskandarīya (212).

    Regional clusters that can be characterized as “late bloomers” often have earlier peaks of prominence: around 100/719 CE for Syria, when the first great Islamic dynasty, the Umayyads (661-750 CE), rules from there; around 200/816 CE for the Jazīra and Jordan; and around 300/913 CE for Egypt. These peaks are followed by an equally noticeable decline until around 500/1107 CE. However, their main peaks of prominence fall at the end of the period, by which time the “late bloomers” form what can be considered one continuous crescent-shaped macro-region stretching from Egypt/Miṣr in the south, through Jordan/al-Urdunn, Syria/al-Shām, the Jazīra/Upper Mesopotamia, the northern part of Iraq, and the very south of the Caucasian cluster in the north, and even touching northwestern Iran (Zanjān). The prominence of these regions rises noticeably after 500/1107 CE, right at the onset of the rule of dynasties that unify the region: the Zangids (1127-1222 CE), the Ayyūbids (1169-1250 CE), and the Mamlūks (1250-1517 CE). The major urban centers are: Mosul/al-Mawṣil (313) and Ḥarrān (224) in the Jazīra; Damascus/Dimashq (1,769), Homs/Ḥimṣ (268), Aleppo/Ḥalab (231) and Hamah/Ḥamā (103) in Syria; Jerusalem/al-Quds (315) in Jordan; and Alexandria/al-Iskandariyya (211) in Egypt/Miṣr.3 By the end of the main period covered in the Ta’rīkh al-islām, Syria becomes the new center of the Islamic world, with Egypt next in line.

    The Eastern Urban Crescent of the 7th/13th Century. A similar shift toward the Mediterranean shore happens with the western urban centers a century earlier (most clearly visible in Andalusia). This return to the Mediterranean can be interpreted as a sign of the formation of a new Mediterranean commonwealth, with the Italian “Maritime Republics” (Genoa, Pisa, Venice, Amalfi and others) actively trading in the region.

    Cited Works

    al-Dhahabī (1990):
    al-Dhahabī. Ta’rīkh al-islām wa-wafayāt al-mashāhīr wa-al-a‘lām. Dār al-Kitāb al-‘Arabī, Bayrūt, 2nd edition, 1990.

    Minorsky (1953):
    Vladimir Minorsky. Studies in Caucasian History. CUP Archive, January 1953.

    al-Samʿānī (1998):
    al-Sam‘ānī. al-Ansāb. 5 vols. Bayrūt: Dār al-fikr, 1998.

    Romanov (2013):
    Maxim G. Romanov. Computational Reading of Arabic Biographical collections with Special reference to Preaching (661-1300 CE). Ph.D., University of Michigan, Ann Arbor, MI, 2013.

    Current Dataset

    Source: Ta’rīkh al-islām of al-Dhahabī (d. 748/1347)
    Period:  41-700 AH / 661-1300 CE (Volumes 4-52)
    Biographies: ~29,000
    Unique Nisbas: ~700
    Total number of Nisbas: over 70,000

    Footnotes

    1. The nature of nisbas is not unproblematic and anyone who has worked with biographical collections is likely to object saying that, for example, not every individual identified as “al-Madanī” was actually a Medinan; besides, there definitely are Medinans who are not identified as such with this specific toponymic nisba, not to mention that the “descriptive name” al-Madanī (and its variation al-Madīnī) may refer to urban centers other than Medina. (See, for example, al-Samʿānī (1998), 5:235–239.) While such objections are not invalid, at this point of our knowledge and understanding of the overabundant biographical data from Arabic sources we simply do not know to what extent the presence of false positives (i.e., Madanīs who have nothing to do with Medina) and false negatives (i.e., Medinans who are not identified as Madanīs) actually affects the overall picture. Working with big data requires some clearly identified methodological assumptions regarding the types of data used in modeling. My computational analysis of data from the Ta’rīkh al-islām yields about 700 unique nisbas (with over 300 toponymic ones) that identify at least 10 different individuals, while the overall number of these nisbas runs into over 70,000 instances, considering that individuals are often described with more than one nisba. While 70,000 data points can hardly be called “big data” by any scientific standards, this dataset is too big to make exact identification of each and every nisba possible. Thus, under these circumstances, treating nisbas at face value is simply the most logical way to begin large-scale analysis of biographical data from Arabic sources; as our knowledge about the “behavior” of nisbas in biographical collections improves (and this can be achieved only through large-scale exploratory analysis), these methodological assumptions can and will be adjusted. For a detailed discussion of methodological assumptions see Romanov (2013), 28–40.
    2. The term was introduced by Vladimir Minorsky (Minorsky (1953), 110-116).
    3. Cairo/al-Qāhira is not yet identifiable through onomastic data; most individuals from Egypt have the nisba al-Miṣrī (1,501), which associates them with the entire province. Although this nisba may also refer to Cairo, at the moment it does not appear possible to differentiate efficiently.


    A Screenshot of al-Thurayyā. Click on the screenshot to open the Gazetteer in full screen.

    This is our first usable demo of al-Thurayyā Gazetteer. Currently it includes over 2,000 toponyms and almost as many route sections georeferenced from Georgette Cornu’s Atlas du monde arabo-islamique à l'époque classique: IXe-Xe siècles (Leiden: Brill, 1983). The gazetteer is searchable (upper left corner), although English equivalents are not yet included; in other words, look for Dimashq/دمشق, not Damascus.

    You can browse the Gazetteer by clicking on any toponym marker. The popup will show the toponym both in Arabic script and transliterated. We are using a slightly modified transliteration system that facilitates conversion between fully transliterated, transliterated, and Arabic forms of toponyms. It should be easily understandable. There may be typos because of the way the data has been generated, so please let us know if something should be corrected. The popup also offers a selection of possible sources on the toponym in question. You can check Arabic Sources: currently, al-Samʿānī’s Kitāb al-ansāb and Yāqūt’s Muʿjam al-buldān. Currently, the Gazetteer will only check for exact matches, which means that in some cases there will not be any entry at all, while in other cases there may be more than one and they may refer to other places with the same name. Improving the precision of this lookup is on our to-do list. You can also check if there is information on the toponym in question in Brill's Encyclopaedia of Islam, Pleiades, and Wikipedia.
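    The consequences of exact matching can be illustrated with a small Python sketch. It is not the Gazetteer's code: the function `lookup_exact`, the shape of the index, and the placeholder entries are all assumptions made for the example.

```python
def lookup_exact(index, toponym):
    """Return every entry whose headword equals `toponym` exactly."""
    return index.get(toponym, [])

# Hypothetical index: headword -> list of entries (placeholders, not real data).
toy_index = {
    "Dimashq": ["entry 1"],
    "NameX": ["entry 1", "entry 2"],  # same name, possibly different places
}

print(lookup_exact(toy_index, "Dimashq"))   # a single match
print(lookup_exact(toy_index, "NameX"))     # several matches to disambiguate
print(lookup_exact(toy_index, "Damascus"))  # no match: English forms are absent
```

    An exact lookup cannot tell homonymous places apart and finds nothing for a variant spelling, which is why both situations mentioned above can occur.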

    Credits & Acknowledgments

    Many thanks to Adam Tavares (programmer @ Perseus Project, Tufts) and, particularly, Cameron Jackson (senior, double-majoring in Arabic and Computer Science, Tufts) for the technical development; to Vickie Sullivan (Chair, Classics Department), Gregory Crane and the entire Perseus team on both sides of the Atlantic for support and inspiration.


  • 02/06/15--16:00: BetaCode for Arabic
  • Arabic betaCode

    Although both Windows and Mac OS now support Arabic, it is still quite difficult to type and edit Arabic texts. It is particularly frustrating to edit and manipulate fully vocalized texts, since most fonts either render “short vowels” (ḥarakāt) invisible, or do not render them properly. Because of the “stacking,” i.e. “short vowels” being placed on top of letters and on top of each other, it becomes impossible to edit texts and one is often forced to go into delete-and-retype mode (and there is still no guarantee, because of visual issues, that all the letters and “short vowels” will actually be in the right order). betaCode can make it easy to type fully-vocalized Arabic texts on any machine through the use of simple character combinations and automatic rendering into various transliteration schemes and the Arabic script (scroll below for examples).

    betaCode is first converted into a one-to-one transliteration scheme, which combines conventions from various academic transliteration schemes. Such a scheme is necessary, since none of the existing academic schemes (American/Library of Congress, British, French, German, etc.) allows Arabic text to be represented unambiguously for computational purposes. Arabic betaCode transliteration can then be converted into any transliteration convention. At the moment the following schemes are implemented:

    • Library of Congress Romanization of Arabic
    • Simplified transliteration (LOC without diacritics)
    • Arabic script (the rules of hamzaŧ orthography are implemented, but may require some additional testing)

    NB: The idea of betaCode is borrowed from the Classicists who developed a method of representing, using only ASCII characters, characters and formatting found in ancient Greek texts. The current betaCode is inspired by, and is therefore quite similar to, the ArabTeX scheme. Linguists working with Arabic commonly use Buckwalter transliteration, which is very similar to the current betaCode, but less readable.

    betaCode and One-To-One Transliteration

    betacode translit Arabic letter
    _a ā alif
    b b bāʾ
    t t tāʾ
    _t ṯ thāʾ
    ^g, j ǧ jīm
    *h, .h ḥ ḥāʾ
    _h ḫ khāʾ
    d d dāl
    _d ḏ dhāl
    r r rāʾ
    z z zayn
    s s sīn
    ^s š shīn
    *s, .s ṣ ṣād
    *d, .d ḍ ḍād
    *t, .t ṭ ṭāʾ
    *z, .z ẓ ẓāʾ
    ` ʿ ʿayn
    *g, .g ġ ghayn
    f f fāʾ
    *k, .k, q ḳ qāf
    k k kāf
    l l lām
    m m mīm
    n n nūn
    h h hāʾ
    w w wāw
    _u ū wāw
    y y yāʾ
    _i ī yāʾ

    Non-alphabetic letters

    betacode translit Arabic
    ' ʾ hamzaŧ
    /a á alif maqṣūraŧ
    :t ŧ tāʾ marbūṭaŧ

    Vowels

    betacode translit Arabic
    ~a ã dagger alif
    u u ḍammaŧ
    i i kasraŧ
    a a fatḥaŧ
    .n ȵ n of tanwīn
    .a å silent alif
    .w ů silent wāw
    ?u ủ final ḍammaŧ *
    ?i ỉ final kasraŧ *
    ?a ả final fatḥaŧ *

    * “finals” are those final vowels that are usually dropped in transliteration and pronunciation (i.e., al-kitāb, instead of al-kitābủ, al-kitābỉ, al-kitābả), vs those that are not (huwa, hiya, ḏãlika, tilka).

    Basic principles:

    Every Arabic letter is betaCoded with its one-letter equivalent, preceded (if necessary) by a technical character that is similar to a diacritical mark in the transliterated version. The most common symbols are as follows:

    General

    • _ (underscore), if a letter can be transliterated with macron/breve below or above (ā, ṯ, ḫ, ḏ, ū, ī)
    • . (period), or * (asterisk), if a letter can be transliterated with dot below or above (ḥ, ṣ, ḍ, ṭ, ẓ, ġ, ḳ)
    • ^ (caret), if a letter can be transliterated with caron (ǧ, š)

    Specifics

    • attached prepositions/conjunctions and pronominal suffixes must be separated with “-” (mostly relevant for text alignment, treebanking, and general readability):
      • bi-Llah?i
      • fa-_dahaba
    • add “?” before “optional” final vowels that are usually dropped in transliteration and pronunciation (mostly relevant for transliteration):
      • bi-Llah?i, but not:
      • fa-_dahaba
    • tāʾ marbūṭaŧ: add “+” after tāʾ marbūṭaŧ, if it ends the first word of an iḍāfaŧ (mostly relevant for transliteration):
      • `_amma:t+u Ba.gd_ada, but:
      • al-`_amma:tu f_i Ba.gd_ada
    • transliterating tanwīn:
      • .n
        • ?u.n
        • ?i.n
        • ?a.n
    • silent wāw and alif:
      • .w (`Amr?u.n.w, for عَمْرٌو)
      • .a (wa-fa`al_u.a, for وَفَعَلُوا)
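    The mapping from betaCode to the one-to-one transliteration can be sketched in Python as a table lookup. This is a minimal illustration and not the actual _generateBetaCode.py: the names `PAIRS`, `SINGLES`, and `betacode_to_translit` are invented here, the output characters are taken from the tables and converted examples in this post, and the “+” marker for tāʾ marbūṭaŧ is not handled. The sketch assumes fully Arabic betaCode input; in running English text it would also convert every j, q, and apostrophe.

```python
import re
import unicodedata

# Two-character sequences: technical character + letter.
PAIRS = {
    "_a": "ā", "_u": "ū", "_i": "ī", "_t": "ṯ", "_h": "ḫ", "_d": "ḏ",
    "^g": "ǧ", "^s": "š",
    ".h": "ḥ", ".s": "ṣ", ".d": "ḍ", ".t": "ṭ", ".z": "ẓ", ".g": "ġ", ".k": "ḳ",
    "/a": "á", ":t": "ŧ", "~a": "ã",
    ".n": "ȵ", ".a": "å", ".w": "ů",
    "?u": "ủ", "?i": "ỉ", "?a": "ả",
}
# "*" is accepted as an alternative to "." for the dotted consonants.
for letter in "hsdtzgk":
    PAIRS["*" + letter] = PAIRS["." + letter]

# Single characters that stand for a letter on their own.
SINGLES = {"j": "ǧ", "q": "ḳ", "`": "ʿ", "'": "ʾ"}

TOKEN = re.compile(r"[_^*.~/:?][A-Za-z]|[jqJQ`']")

def _convert(match):
    token = match.group(0)
    key = token.lower()
    out = PAIRS.get(key) or SINGLES.get(key)
    if out is None:
        return token  # unknown combination: leave the text untouched
    # A capital letter in betaCode yields a capital in the transliteration.
    return out.upper() if token[-1].isupper() else out

def betacode_to_translit(text):
    """Convert betaCode to the one-to-one transliteration (sketch)."""
    return unicodedata.normalize("NFC", TOKEN.sub(_convert, text))

print(betacode_to_translit("q_ala 'ab_u Mas`_ud?i.n"))
print(betacode_to_translit("Dima^s.k al-^S_am"))
```

    Because every betaCode sequence maps to exactly one character, the conversion needs no context. Context-dependent steps, such as hamzaŧ orthography in Arabic script, would have to work from this unambiguous intermediate form.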

    Running the converter

    • (Python 3.xx must be installed on the machine)
    • clone git repository
    • save texts that must be transliterated (i.e., the text is in English, but has some Arabic terms that must be transliterated) into ./to_translit/ (follow the format given in the example file).
    • save texts that must be fully transliterated and/or converted into Arabic script (i.e., the entire text is in Arabic) into ./to_arabic/ (follow the format given in the example file).
    • run the script _generateBetaCode.py (in Mac terminal: python3 _generateBetaCode.py; on Windows: double-clicking on the script should work).
    • converted texts (in all available modes of conversion) will be appended to the file.
    • if you need to make any changes, edit your initial betaCode text and run the script again; the converted results will be replaced with updated versions.

    Examples

    betaCode Example

    NB: These are examples of converting betaCode to full transliteration and Arabic script. The very last paragraph showcases conversion of hamzaŧ in different positions.

    q_ala 'ab_u Mas`_ud?i.n :: 'an_a qad sami`tu h~a_d_a min ras_ul?i All~ah?i ( .sl`m )

    .hadda_ta-n_a `Amr?u.w bn?u R_afi`?i.n , .hadda_ta-n_a `Abd?u All~ah?i bn?u al-Mub_arak?i , `an Mu.hammad?i bn?i 'Is.h_aq?a , `an Mu.hammad?i bn?i ^Ga`far?i.n , `an `Ubayd?i All~ah?i bn?i `Abd?i All~ah?i bn?i `Umar?a , `an 'Ab_i-hi , `an?i al-Nabiyy?i ( .sl`m ) na.hwa-hu

    'a_hbara-n_a Qutayba:t?u q_ala , .hadda_ta-n_a Sufy_an?u , `an Ya.hy/a bn?i Sa`_id?i.n , `an 'Ab_i Bakr?i bn?i Mu.hammad?i.n , `an `Umar?a bn?i `Abd?i al-`Az_iz?i , `an 'Ab_i Bakr?i bn?i `Abd?i al-Ra.hm~an?i bn?i al-.H_ari_t?i bn?i Hi^s_am?i.n , `an 'Ab_i Hurayra:t?a mi_tla-hu

    Ta.hw_il?u al-hamza:t?i ( kalim_at?u.n mufrada:t?u.n )

    'amr?u.n 'uns?u.n 'ins?u.n '_im_an?u.n '_aya:t?u.n '_amana mas'ala:t?u.n sa'ala ra's?u.n qur'_an?u.n ta'_amara _di'b?u.n as'ila:t?u.n q_ari'i-hi su'l?u.n mas'_ul?u.n tak_afu'u-hu su'ila q_ari'i-hi _di'_ab?u.n ra'_is?u.n bu'isa ru'_uf?u.n ra'_uf?u.n su'_al?u.n mu'arri_h?u.n abn_a'a-hu abn_a'u-hu abn_a'i-hi ^say'?a.n _ha.t_i'a:t?u.n .daw'u-hu .d_u'u-hu .daw'a-hu .daw'i-hi mur_u'a:t?u.n 'abn_a'i-hi bar_i'u-hu s_u'ila f_il?u.n f_ann?u.n f_unn?u.n s_a'ala fu'_ad?u.n ^surak_a'u-hu ri'_asa:t?u.n tahni'a:t?u.n daf_a'a:t?u.n .taff_a'a:t?u.n ta'r_i_h?u.n fa'r?u.n ^say'?u.n ^say'?i.n ^say'?a.n .daw'?u.n .daw'?i.n .daw'?a.n juz'?u.n juz'?i.n juz'?a.n mabda'?u.n mabda'?i.n mabda'?a.n naba'a q_ari'?u.n tak_afu'?u.n tak_afu'?i.n tak_afu'?a.n abn_a'u abn_a'i abn_a'a jar_i'?u.n maqr_u'?u.n .daw'?u.n ^say'?u.n juz'?u.n `ulam_a'u al-`ulam_a'i al-`ulam_a'a `Amr?u.n.w wa-fa`al_u.a

    betaCode converted into one-to-one translit

    ḳāla ʾabū Masʿūdỉȵ :: ʾanā ḳad samiʿtu hãḏā min rasūlỉ Allãhỉ ( ṣlʿm )

    ḥaddaṯa-nā ʿAmrủů bnủ Rāfiʿỉȵ , ḥaddaṯa-nā ʿAbdủ Allãhỉ bnủ al-Mubārakỉ , ʿan Muḥammadỉ bnỉ ʾIsḥāḳả , ʿan Muḥammadỉ bnỉ Ǧaʿfarỉȵ , ʿan ʿUbaydỉ Allãhỉ bnỉ ʿAbdỉ Allãhỉ bnỉ ʿUmarả , ʿan ʾAbī-hi , ʿanỉ al-Nabiyyỉ ( ṣlʿm ) naḥwa-hu

    ʾaḫbara-nā Ḳutaybaŧủ ḳāla , ḥaddaṯa-nā Sufyānủ , ʿan Yaḥyá bnỉ Saʿīdỉȵ , ʿan ʾAbī Bakrỉ bnỉ Muḥammadỉȵ , ʿan ʿUmarả bnỉ ʿAbdỉ al-ʿAzīzỉ , ʿan ʾAbī Bakrỉ bnỉ ʿAbdỉ al-Raḥmãnỉ bnỉ al-Ḥāriṯỉ bnỉ Hišāmỉȵ , ʿan ʾAbī Hurayraŧả miṯla-hu

    Taḥwīlủ al-hamzaŧỉ ( kalimātủȵ mufradaŧủȵ )

    ʾamrủȵ ʾunsủȵ ʾinsủȵ ʾīmānủȵ ʾāyaŧủȵ ʾāmana masʾalaŧủȵ saʾala raʾsủȵ ḳurʾānủȵ taʾāmara ḏiʾbủȵ asʾilaŧủȵ ḳāriʾi-hi suʾlủȵ masʾūlủȵ takāfuʾu-hu suʾila ḳāriʾi-hi ḏiʾābủȵ raʾīsủȵ buʾisa ruʾūfủȵ raʾūfủȵ suʾālủȵ muʾarriḫủȵ abnāʾa-hu abnāʾu-hu abnāʾi-hi šayʾảȵ ḫaṭīʾaŧủȵ ḍawʾu-hu ḍūʾu-hu ḍawʾa-hu ḍawʾi-hi murūʾaŧủȵ ʾabnāʾi-hi barīʾu-hu sūʾila fīlủȵ fānnủȵ fūnnủȵ sāʾala fuʾādủȵ šurakāʾu-hu riʾāsaŧủȵ tahniʾaŧủȵ dafāʾaŧủȵ ṭaffāʾaŧủȵ taʾrīḫủȵ faʾrủȵ šayʾủȵ šayʾỉȵ šayʾảȵ ḍawʾủȵ ḍawʾỉȵ ḍawʾảȵ ǧuzʾủȵ ǧuzʾỉȵ ǧuzʾảȵ mabdaʾủȵ mabdaʾỉȵ mabdaʾảȵ nabaʾa ḳāriʾủȵ takāfuʾủȵ takāfuʾỉȵ takāfuʾảȵ abnāʾu abnāʾi abnāʾa ǧarīʾủȵ maḳrūʾủȵ ḍawʾủȵ šayʾủȵ ǧuzʾủȵ ʿulamāʾu al-ʿulamāʾi al-ʿulamāʾa ʿAmrủȵů wa-faʿalūå

    betaCode converted into Arabic script

    قَالَ أَبُو مَسْعُودٍ :: أَنَا قَدْ سَمِعْتُ هٰذَا مِنْ رَسُولِ الـلّٰـهِ ( صْلْعْمْ )

    حَدَّثَنَا عَمْرُو بْنُ رَافِعٍ ، حَدَّثَنَا عَبْدُ الـلّٰـهِ بْنُ الْمُبَارَكِ ، عَنْ مُحَمَّدِ بْنِ إِسْحَاقَ ، عَنْ مُحَمَّدِ بْنِ جَعْفَرٍ ، عَنْ عُبَيْدِ الـلّٰـهِ بْنِ عَبْدِ الـلّٰـهِ بْنِ عُمَرَ ، عَنْ أَبِيهِ ، عَنِ النَّبِيِّ ( صْلْعْمْ ) نَحْوَهُ

    أَخْبَرَنَا قُتَيْبَةُ قَالَ ، حَدَّثَنَا سُفْيَانُ ، عَنْ يَحْيٰى بْنِ سَعِيدٍ ، عَنْ أَبِي بَكْرِ بْنِ مُحَمَّدٍ ، عَنْ عُمَرَ بْنِ عَبْدِ الْعَزِيزِ ، عَنْ أَبِي بَكْرِ بْنِ عَبْدِ الرَّحْمٰنِ بْنِ الْحَارِثِ بْنِ هِشَامٍ ، عَنْ أَبِي هُرَيْرَةَ مِثْلَهُ

    تَحْوِيلُ الْهَمْزَةِ ( كَلِمَاتٌ مُفْرَدَةٌ )

    أَمْرٌ أُنْسٌ إِنْسٌ إِيمَانٌ آيَةٌ آمَنَ مَسْأَلَةٌ سَأَلَ رَأْسٌ قُرْآنٌ تَآمَرَ ذِئْبٌ أَسْئِلَةٌ قَارِئِهِ سُؤْلٌ مَسْؤُولٌ تَكَافُؤُهُ سُئِلَ قَارِئِهِ ذِئَابٌ رَئِيسٌ بُئِسَ رُؤُوفٌ رَؤُوفٌ سُؤَالٌ مُؤَرِّخٌ أَبْنَاءَهُ أَبْناؤُهُ أَبْنائِهِ شَيْئًا خَطِيئَةٌ ضَوْءُهُ ضُوؤُهُ ضَوْءَهُ ضَوْئِهِ مُرُوءَةٌ أَبْنائِهِ بَرِيؤُهُ سُوئِلَ فِيلٌ فَانٌّ فُونٌّ سَاءَلَ فُؤَادٌ شُرَكاؤُهُ رِئَاسَةٌ تَهْنِئَةٌ دَفَاءَةٌ طَفّاءَةٌ تَأْرِيخٌ فَأْرٌ شَيْءٌ شَيْءٍ شَيْئًا ضَوْءٌ ضَوْءٍ ضَوْءًا جُزْءٌ جُزْءٍ جُزْءًا مَبْدَأٌ مَبْدَأٍ مَبْدَأً نَبَأَ قَارِئٌ تَكَافُؤٌ تَكَافُؤٍ تَكَافُؤًا أَبْناءُ أَبْناءِ أَبْناءَ جَريءٌ مَقْروءٌ ضَوْءٌ شَيْءٌ جُزْءٌ عُلَماءُ الْعُلَماءِ الْعُلَماءَ عَمْرٌو وَفَعَلُوا

    betaCode into Translit

    betaCode in English text

    NB: This is an example of an English text with terms, names and toponyms given in betaCode and automatically converted into different transliteration flavors (excerpts are from Brill’s Encyclopaedia of Islam).

    Dima^s.k, Dima^s.k al-^S_am or simply al-^S_am , (Lat. Damascus, Fr. Damas) is the largest city of Syria. It is situated ... very much at the same latitude as Ba.gd_ad and F_as, at an altitude of nearly 700 metres, on the edge of the desert at the foot of ^Gabal .K_asiy_un.

    al-_Dahab_i, ^Sams al-D_in Ab_u `Abd All~ah Mu.hammad b. `U_tm_an b. .K_aym_a.z b. `Abd All~ah al-Turkum_an_i al-F_ari.k_i al-Dima^s.k_i al-^S_afi`_i, an Arab historian and theologian, was born at Damascus or at Mayy_afari.k_in on 1 or 3 Rab_i` II (according to al-Kutub_i, in Rab_i` I) 673/5 or 7 October 1274, and died at Damascus, according to al-Subk_i and al-Suy_u.t_i, in the night of Sunday-Monday on 3 _D_u al-.Ka`da:t 748/4 February 1348, or, according to A.hmad b. `Iy_as, in 753/1352-3. He was buried at the B_ab al-.Sa.g_ir.

    betaCode converted into one-to-one translit

    Dimašḳ, Dimašḳ al-Šām or simply al-Šām , (Lat. Damascus, Fr. Damas) is the largest city of Syria. It is situated ... very much at the same latitude as Baġdād and Fās, at an altitude of nearly 700 metres, on the edge of the desert at the foot of Ǧabal Ḳāsiyūn.

    al-Ḏahabī, Šams al-Dīn Abū ʿAbd Allãh Muḥammad b. ʿUṯmān b. Ḳāymāẓ b. ʿAbd Allãh al-Turkumānī al-Fāriḳī al-Dimašḳī al-Šāfiʿī, an Arab historian and theologian, was born at Damascus or at Mayyāfariḳīn on 1 or 3 Rabīʿ II (according to al-Kutubī, in Rabīʿ I) 673/5 or 7 October 1274, and died at Damascus, according to al-Subkī and al-Suyūṭī, in the night of Sunday-Monday on 3 Ḏū al-Ḳaʿdaŧ 748/4 February 1348, or, according to Aḥmad b. ʿIyās, in 753/1352-3. He was buried at the Bāb al-Ṣaġīr.

    betaCode converted into the Library of Congress scheme

    Dimashq, Dimashq al-Shām or simply al-Shām , (Lat. Damascus, Fr. Damas) is the largest city of Syria. It is situated ... very much at the same latitude as Baghdād and Fās, at an altitude of nearly 700 metres, on the edge of the desert at the foot of Jabal Qāsiyūn.

    al-Dhahabī, Shams al-Dīn Abū ʿAbd Allāh Muḥammad b. ʿUthmān b. Qāymāẓ b. ʿAbd Allāh al-Turkumānī al-Fāriqī al-Dimashqī al-Shāfiʿī, an Arab historian and theologian, was born at Damascus or at Mayyāfariqīn on 1 or 3 Rabīʿ II (according to al-Kutubī, in Rabīʿ I) 673/5 or 7 October 1274, and died at Damascus, according to al-Subkī and al-Suyūṭī, in the night of Sunday-Monday on 3 Dhū al-Qaʿda 748/4 February 1348, or, according to Aḥmad b. ʿIyās, in 753/1352-3. He was buried at the Bāb al-Ṣaghīr.

    betaCode converted into a searchable string (diacritics removed)

    Dimashq, Dimashq al-Sham or simply al-Sham , (Lat. Damascus, Fr. Damas) is the largest city of Syria. It is situated ... very much at the same latitude as Baghdad and Fas, at an altitude of nearly 700 metres, on the edge of the desert at the foot of Jabal Qasiyun.

    al-Dhahabi, Shams al-Din Abu Abd Allah Muhammad b. Uthman b. Qaymaz b. Abd Allah al-Turkumani al-Fariqi al-Dimashqi al-Shafii, an Arab historian and theologian, was born at Damascus or at Mayyafariqin on 1 or 3 Rabi II (according to al-Kutubi, in Rabi I) 673/5 or 7 October 1274, and died at Damascus, according to al-Subki and al-Suyuti, in the night of Sunday-Monday on 3 Dhu al-Qada 748/4 February 1348, or, according to Ahmad b. Iyas, in 753/1352-3. He was buried at the Bab al-Saghir.
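    The last two conversions can be approximated in Python as follows. This is a sketch inferred from the converted examples above, not the converter's own code: the table `TO_LOC` and the functions `to_loc` and `simplify` are invented names, only the characters that occur in the English-text example are covered, and tāʾ marbūṭaŧ is always dropped (the iḍāfaŧ case marked with “+” is ignored).

```python
import unicodedata

# One-to-one character -> Library of Congress style (inferred, not exhaustive).
TO_LOC = {
    "š": "sh", "ǧ": "j", "ḳ": "q", "ġ": "gh",
    "ḏ": "dh", "ṯ": "th", "ḫ": "kh",
    "ã": "ā",  # dagger alif is written as a plain long a
    "ŧ": "",   # tāʾ marbūṭaŧ is dropped
}

def to_loc(text):
    """Rewrite one-to-one transliteration in the LOC style (sketch)."""
    text = unicodedata.normalize("NFC", text)
    for src, dst in TO_LOC.items():
        src = unicodedata.normalize("NFC", src)
        text = text.replace(src, dst)
        # Capitalised source letters give capitalised digraphs (Š -> Sh).
        text = text.replace(src.upper(), dst.capitalize())
    return unicodedata.normalize("NFC", text)

def simplify(text):
    """Strip diacritics plus the ayn and hamza signs to get a searchable string."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        c for c in decomposed
        if not unicodedata.combining(c) and c not in "\u02bf\u02be"
    )

loc = to_loc("Dimašḳ al-Šām")
print(loc)
print(simplify(loc))
```

    Splitting the work into two steps keeps the LOC form available for display, while the simplified form is used only for searching.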


  • 03/11/15--17:00: Qawl 1.0

    Qawl is a free suite of tools for researchers, teachers and students in the field of Arabic studies. The program offers the following features:

    • a large library of Arabic texts (more than 1900)
    • a search algorithm that finds any word matching a query
    • automatic identification of parallel passages (sources, quotations, etc.) of any pasted text by comparing it with the entire library
    • handy help for translation and analysis of Arabic words and sentences through various webtools (Aratools, Google Translate, Bing Translator)

    Developers: Sébastien Moureau, Chargé de recherches au F.R.S.-FNRS, Université catholique de Louvain (Belgium)

    URL: www.uclouvain.be/qawl/


  • 03/13/15--17:00: Arabic Almanac

    Perhaps the most valuable project out there, the Arabic Almanac is a collection of scanned dictionaries that are searchable by roots. The Almanac uses the Mawrid Reader, an extendable HTML/JavaScript app for displaying and searching books on both desktop and mobile devices. The Almanac can be downloaded to your hard disk or smartphone’s SD card for fast offline usage.

    The Mawrid Reader makes the Arabic Almanac infinitely extendable and over the past few years the number of included dictionaries was significantly increased. As of March 2012, the project includes 31 dictionaries: Arabic-English (10, including such indispensable titles as Hans Wehr, Lane, Steingass, Hava, etc.), Arabic-Urdu (7), Arabic-Arabic (9), Arabic-Indonesian/Malaysian (4), and Arabic-French (Kazimirski).

    Detailed description of the project

    Developers: ejtaal.net

    URL: ejtaal.net/aa/


  • 03/14/15--17:00: al-Maktabaŧ al-Waqfiyyaŧ

    The largest collection of scanned books in Arabic (scanned images compiled into PDFs) with over 7,200 titles (~12,000 volumes), including editions of most major classical Islamic sources. The website is regularly updated and is searchable by titles, author names and some other parameters; it can be browsed by categories.

    Most texts are stored on archive.org and occasionally linked to Shamela. The collection also includes books in languages other than Arabic.

    Developers: waqfeya.com

    URL: waqfeya.com



    Access to Mideast and Islamic Resources (AMIR)

    AMIR is a blog (with its own ISSN 2160-3049) maintained by a few scholars and librarians who regularly post annotated thematic lists of online resources relevant to the study of the Middle East.

    Some particularly relevant blogposts:

    Contributors: Birte Kristiansen, Universiteit Leiden, Library, the Netherlands; Sean Swanick, McGill University Library, Canada; Peter Magierski, Columbia University Libraries, USA; Andreas Neumann; Charles Jones, The Pennsylvania State University Library, USA.

    URL: amirmideast.blogspot.com



    A DH Exercise: Mapping the Greco-Roman World

    “Envy is not a very good thing. Yet envy is precisely what an early Islamicist feels when he reads Roger Bagnall and Bruce Frier’s The Demography of Roman Egypt.” 1 These words have stuck in my head since the very moment I read them, and over the past two years of working among and with the classicists my classics envy has been growing: on top of the 300 original census declarations that were at the disposal of the above-mentioned scholars, there are way too many things to envy, especially when it comes to all things digital.

    The Pleiades Gazetteer is a particularly interesting case: with almost 35,000 places, it offers several well-populated categories of geographical objects. The categories include settlements, forts, temples, villas, stations, [amphi]theaters, churches, bridges, baths, cemeteries, plazas, arches. What makes it even more interesting is that most of these objects have chronological markers, i.e. they belong to one or more of the following periods: archaic (750–550BC), classical (550–330BC), hellenistic-republican (330–30BC), roman (30BC–300CE), late-antique (300–640CE).

    This data offers an opportunity for an interesting digital exercise with historical data. I assigned it to my students as a part of the introduction to R (within my “Introduction to Text Mining for the Students of Humanities”, Tufts University, Spring 2015). The task was to explore the Pleiades data set, find out what is what and what can be done with it. The goal was to discover that 1) geographical objects are categorized, and that 2) they also have chronological markers, which can be used 3) to map the geography of the Greco-Roman world over time.

    The map of forts turned out to be particularly interesting.

    Below is the code and some of the resulting visualizations.

    # R
    library(ggplot2)
    library(maps)
    library(mapdata)
    library(rgeos)
    library(maptools)
    library(mapproj)
    library(PBSmapping)
    library(data.table)
    
    xlim=c(-12,55); ylim=c(20,60)
    
    worldmap=map_data("world")
    setnames(worldmap,c("X","Y","PID","POS","region","subregion"))
    worldmap=clipPolys(worldmap,xlim=xlim,ylim=ylim,keepExtra=TRUE)
    
    dataFolder=""# ideally, full path to the folder
    csvName=paste0(dataFolder,"pleiades-locations-20150316.csv")
    # url: http://atlantides.org/downloads/pleiades/dumps/
    # ---: download the latest csv, unzip
    locsRaw=read.csv(csvName,stringsAsFactors=F,header=T,sep=',')
    
    periods=rbind(c("archaic","750-550BC"),c("classical","550-330BC"),c("hellenistic-republican","330-30BC"),c("roman","30BC-300CE"),c("late-antique","300-640CE"))
    
    features=rbind(c("","locations"),c("settlement","settlements"),c("fort","forts"),c("temple","temples"),c("villa","villas"),c("station","stations"),c("theatre","theatres"),c("amphitheatre","amphitheatres"),c("church","churches"),c("bridge","bridges"),c("bath","baths"),c("cemetery","cemeteries"),c("plaza","plazas"),c("arch","archs"))
    
    land="grey"; water="grey80"; bgColor="grey80"
    locPleiades=geom_point(data=locsRaw,color="grey70",alpha=.75,size=1,aes(y=reprLat,x=reprLong))

    # loop over feature types, then over periods; one map per combination
    for(i in 1:nrow(features)){
      locs=locsRaw[with(locsRaw,grepl(features[i,1],featureTypes)),]
      for(ii in 1:nrow(periods)){
        locPer=locs[with(locs,grepl(periods[ii,1],timePeriodsKeys)),]
        locPer=geom_point(data=locPer,color="red",alpha=.75,size=1,aes(y=reprLat,x=reprLong))
        
        dataLabel="Data: Pleiades Project"
        fName=paste0(dataFolder,"Pleiades_",features[i,2],sprintf("%02d",ii),".png")
        header=paste0(features[i,2]," in the ",periods[ii,1]," period (",periods[ii,2],")")
        
        p=ggplot()+
          coord_map(xlim=xlim,ylim=ylim)+
          geom_polygon(data=worldmap,aes(X,Y,group=PID),size=0.1,colour=land,fill=water,alpha=1)+
          annotate("text",x=-11,y=21,hjust=0,label=dataLabel,size=3,color="grey40")+
          annotate("text",x=54,y=59,hjust=1,label=header,size=5,color="grey40")+ 
          locPleiades+ locPer+ labs(y="",x="")+theme_grey()
        
        ggsave(file=fName,plot=p,dpi=600,width=7,height=6)
      }
    }

    Using ImageMagick to animate maps

    The fastest and easiest way to animate the results is to use ImageMagick, a free command-line utility. The following command will take all .png files whose names begin with Pleiades_Settle and convert them into an animated GIF file Pleiades_Settlements.gif, which will play continuously (-loop 0), with each frame downsized (-resize 1200x900) and paused for .75 of a second (-delay 75).

    convert -resize 1200x900 -delay 75 -loop 0 Pleiades_Settle*.png Pleiades_Settlements.gif
    

    Chronological Cartograms

    All Locations

    Settlements

    Forts

    All categories

    Amphitheaters, arches, baths, bridges, cemeteries, churches, forts, locations, plazas, settlements, stations, temples, theaters, villas.

    Footnotes


    1. al-Qādī, Wadād. “Population Census and Land Surveys under the Umayyads (41-132/661-750).” Der Islam 83, no. 2 (2006), p. 341 


  • 11/07/15--16:00: Introducing mARkdown
  • TEI XML has long become the standard for tagging humanistic texts for research purposes. It is the standard in most digital libraries (including the Perseus Digital Library). Having texts in a TEI XML format that conforms to the standards of a long-standing library allows one to take advantage of libraries’ infrastructure and analytical tools that have been developed since the appearance of TEI XML. Converting texts into XML, however, is a rather long and complicated process.

    Texts in Arabic make things even more complicated. Right-to-left (RTL) and left-to-right (LTR) text in one file is one of the major challenges. Since the cursor changes the direction of its movement when crossing the boundary between RTL and LTR text, it is difficult to place the cursor properly, and one often ends up changing the wrong part of the text. The direction of paired characters is visually confusing, and it is often next to impossible to say whether a given angle bracket—perhaps the most important XML character—is an opening character or a closing one. Moreover, the shapes of Arabic letters in a text file change dynamically as one types or edits Arabic text, and many text editors do not handle this properly (particularly on Mac). In addition to these technical challenges, there are too many Arabic texts to convert—and most of them are multivolume titles—and too few people who have both the training and the willingness to do that.

    At the beginning of my digital research I considered TEI XML as a working format, but I had to give up on this option, since converting a 50-volume book (~3.4 million words) would have taken forever. After reviewing existing approaches, I came up with a rather simple tagging system that allowed me to create a structured, machine-readable text, without sacrificing years of my life. In many ways, this system was inspired by markdown—“a text-to-HTML conversion tool ... that allows [one] to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).”

    The main goal of mARkdown is to provide a simple system for tagging structural elements in Arabic texts that would facilitate algorithmic analysis in the same way as more complex TEI XML does. In principle, mARkdown does not require any special editor, but my current workflow relies on EditPad Pro, which supports right-to-left languages, Unicode, and large files. However, it is the support of custom highlighting and navigation schemes that makes this text editor particularly convenient for mARkdown.

    Since I have been using my mARkdown for my own research purposes, it has not yet been developed into an easily reusable system. This is my first attempt to provide a detailed description and explain how it can be used. I expect that mARkdown will undergo some minor changes in the upcoming months. The most recent description can be accessed from the main menu above.

    mARkdown in EditPad Pro activated with the “magic value” in test_textFile


    A Screenshot of al-Thurayyā. Click on the screenshot to open the Gazetteer in full screen.

    With Teams Pelagios and Pleiades—in alphabetical order: Elton Barker, Tom Elliot, Leif Isaksen, Rainer Simon—visiting Tufts University within the framework of the Perseids Named Entity Hackathon (organized and led by Bridget Almas) at the Perseus Project, we had a chance to test how their systems work with Arabic texts.1 Pelagios offers a convenient workflow for “geographical” reading of texts, which consists of two main steps: first, one tags places that occur in the text, then one “geo-resolves” tagged places into geographical locations that get displayed on an interactive geographical map (for more details, see the Pelagios website). The first step is smooth and easy and works well for texts in any language, as long as they are provided in Unicode. The second step depends on the availability of relevant gazetteers, to which Pelagios is or can be connected. Thus, Pelagios does a great job when it comes to the “geo-resolution” of toponyms included in Pleiades, which now has almost 35,000 places from the Ancient world. Since there is no gazetteer for the classical Islamic world, “geo-resolution” of classical Arabic sources is problematic at the moment. A gazetteer for the Islamic world is badly needed in general.

    As is the case with the creation of any database, creating a gazetteer is an extremely time-consuming task. The key seems to be in generating a snowball effect: creating enough database entries to encourage a community of potentially interested individuals to start contributing to an already substantial databank by offering new data, references, corrections and additions. Pleiades has successfully used this model. Having incorporated content from such extensive editions as “Digital Atlas of Roman and Medieval Civilizations” (DARMC) and “Barrington Atlas of the Greek and Roman World” (BAGRW), Pleiades offered a significant foundation for potential users to contribute to. It seems only logical to follow in the footsteps of such a successful project as Pleiades, and to use their infrastructure for developing an Islamic gazetteer, which will feature in Pleiades as al-Thurayya: a Supplement for the Islamic World. (In this light, the name al-Thurayya, Arabic for Pleiades, seems quite appropriate; Tom Elliot, one of the managing editors of Pleiades, will be providing support for the integration of al-Thurayya into Pleiades.)

    In the case of the classical Islamic world, there are, unfortunately, very few publications that offer geographical data of a magnitude comparable to that of DARMC and BAGRW. In fact, there is only one edition that can provide a solid backbone of geographical data for the initial stage of the creation of an Islamic gazetteer: Georgette Cornu’s Atlas du monde arabo-islamique à l’époque classique: IXe-Xe siècles (Brill, 1985; maps by Olivier Chareire). Largely based on M.J. de Goeje’s Bibliotheca Geographorum Arabicorum, this Atlas represents early geographical and travelogue literature in Arabic and, to some extent, in Persian (9 geographical treatises from BGA, plus 18 other works).

    The Atlas consists of 20 maps, which cover the extent of the Islamic world in the 9th-10th centuries, and an extensive gazetteer that briefly characterizes every place, providing a succinct verbal description of its geographical location, its place in the geographical hierarchy, and coded references to primary and secondary sources. The maps vary in scale but are generally very detailed, densely populated with places, and show trade routes.2


    *Geographical Coverage of Cornu’s Atlas*

    Cornu’s Atlas represents most of early Islamic geographical sources in general, but none of them in particular—peculiarities of each geographer are preserved in the gazetteer, but not reflected on maps. Although somewhat a “Frankenstein” of the early Islamic geography, Cornu’s Atlas is an incredible piece of scholarly work that does offer the best starting point for studying Islamic geography as well as various topics in Islamic history with digital methods.

    Unlike DARMC and BAGRW, Cornu’s Atlas was published only once3 in the 1980s and has never made it into digital form (at least to my knowledge). Nor does the gazetteer offer coordinates for places. So, creating a digital gazetteer is a bit of a methodological challenge. The most effective way is to “georeference” Cornu’s maps in a GIS program (for example, QGIS) and then to collect the necessary geographical features from these georeferenced maps. “Georeferencing” can be described as a process of deforming the image of a map in such a way that its coordinate grid corresponds to the coordinates within GIS software. In other words, if one georeferences specific points—for example, intersections of parallels of latitude and meridians of longitude—a GIS program will deform the image of a map in such a way that all geographical features—cities, villages, and trade routes—will correspond to their geographical locations. In most cases…


    *Georeferenced Cornu’s Atlas*

    As a method, georeferencing is precise, but its results depend on the quality of original maps, and some particular factors often complicate things. Ideally, for georeferencing one needs to know the projection of the map—something which all Cornu’s maps lack (as is the case with most historical academic maps). Fortunately, Cornu’s maps have rather detailed coordinate grids, in most cases covering every or every other degree of latitude and longitude.4 By georeferencing coordinate grids one can still produce quite reliable overlays. An example below shows a section of one of Cornu’s maps overlaid on top of Google physical map: medieval al-Mawsil corresponds to modern Mosul, and medieval Tall A‘far—to modern Tal Afar, while some other features—in this case, the Tigris river—are slightly off.


    *A section of a georeferenced Cornu’s map overlaid on top of Google physical map*
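    The idea of georeferencing from a coordinate grid can be illustrated with a minimal sketch. It assumes a strong simplification that a GIS program does not need to make: the map is treated as north-up with evenly spaced meridians and parallels, so each axis is fitted separately with a straight line. The pixel positions and degree values below are hypothetical, and the function names are made up for illustration.

```python
def fit_axis(pixels, degrees):
    """Least-squares fit of degrees = scale * pixel + offset along one axis."""
    n = len(pixels)
    mean_p = sum(pixels) / n
    mean_d = sum(degrees) / n
    var = sum((p - mean_p) ** 2 for p in pixels)
    cov = sum((p - mean_p) * (d - mean_d) for p, d in zip(pixels, degrees))
    scale = cov / var
    return scale, mean_d - scale * mean_p


def make_georeferencer(control_points):
    """Build a pixel -> (lon, lat) function from grid intersections.

    control_points: list of ((x_px, y_px), (lon, lat)) pairs.
    """
    xs = [px[0] for px, _ in control_points]
    ys = [px[1] for px, _ in control_points]
    lons = [geo[0] for _, geo in control_points]
    lats = [geo[1] for _, geo in control_points]
    scale_x, offset_x = fit_axis(xs, lons)
    scale_y, offset_y = fit_axis(ys, lats)

    def to_lonlat(x_px, y_px):
        return (scale_x * x_px + offset_x, scale_y * y_px + offset_y)

    return to_lonlat


# Hypothetical grid intersections read off a scanned map (pixel y grows downward).
grid = [
    ((100, 800), (40.0, 34.0)),
    ((500, 800), (44.0, 34.0)),
    ((100, 400), (40.0, 38.0)),
    ((500, 400), (44.0, 38.0)),
]
to_lonlat = make_georeferencer(grid)
lon, lat = to_lonlat(300, 600)  # a place marked in the middle of the grid cell
```

    With more control points than unknowns, the fit averages out small errors in reading the grid, which is one reason a dense coordinate grid helps even when the projection is unknown.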

    Converted into a digital dataset, the contents of Cornu’s Atlas will become the backbone of geographical data that can be improved, expanded, corrected. An example of a searchable digital map based on one of the maps from Cornu’s Atlas can be found below: the map on the left shows dynamic clustering of toponyms; the map in the middle shows places; the map on the right shows trade routes (click on the image to view the dynamic searchable map; layers can be switched on/off in the upper right corner of the map). NB: “Place Filter” supports search using Arabic and simplified transliteration (omitting hamzas and ‘ayns, and disregarding macrons of long vowels and dots of emphatic consonants). Make sure to switch on the Places layer. There may be typos in transliteration (and in Arabic, since Arabic names are automatically converted from transliteration); I would appreciate it if you emailed corrections/suggestions.


    *[View in full screen](http://maximromanov.github.io/projects/althurayya_01/)*
    *Toponymic data from the map of Greater Syria (Province du Šām).
    Special thanks to Rainer Simon @ Pelagios and Adam Tavares @ Perseus for their help with building this interactive map.*

    Footnotes

    1. For more details, see Marie-Claire Beaulieu’s post on Perseids Website.
    2. It is not clear what the lines of the trade routes are based on. Unlike maps/cartograms of trade and postal routes created by Aloys Sprenger (Die Post- und Reiserouten des Orients, Leipzig 1864) and Guy Le Strange (The Lands of the Eastern Caliphate, Cambridge 1905), who connected locations with straight lines, Cornu’s maps offer realistic routes.
    3. The gazetteer was published in three gradually updated versions.
    4. In this regard, maps from Brill’s An Historical Atlas of Islam (1981, 2002) are not suitable for this task, since they lack information on projection, and do not provide values for the coordinate grid, which significantly affects the precision of georeferencing. See this example: A georeferenced map of Iran in the 4th-5th / 10th-11th Centuries. NB: Routes are straight lines between two points; georeferenced in QGIS.


    While looking for a way to identify all biographical collections and chronicles (and, by extension, all other texts that offer data for time-series analysis) in a collection of over 10,000 texts, it occurred to me that all these texts share the same feature—they are teeming with dates. So, what if we try to identify such texts computationally?! Not only will this help us find all relevant titles in the sea of text—without overlooking or missing anything!—we can, arguably, get an insight into the chronological coverage of each of those titles, the chronological focus of individual historians, the chronological coverage of the entire collection of historical texts, and identify texts that focus on particular periods. The blogpost begins with an overview of several digital collections and then explains the methodology of the experiment. Appendices allow one to explore the chronological coverage of about 1,000 individual texts as well as the coverage of particular periods (here, hijri centuries—i.e., which texts focus on particular periods).


    Introduction

    Digital collections of classical Arabic texts have mushroomed over the past decade and a half. The three major libraries—al-Ǧāmiʿ al-kabīr (HDD), Shamela.ws, ShiaOnlineLibrary.com—include over 10,000 titles. There are probably another dozen collections that offer texts in the hundreds and thousands (for example, Alwaraq.net, Waqfeya.com, NoorLib.ir, GhBook.ir, Lib.Eshia.ir, Library.Tebyan.net, HathiTrust.org, Archive.org).

    [Venn diagram of the three collections. ShiaOnlineLibrary.com: 1,810 titles; Shamela.ws: 5,999 titles; al-Ǧāmiʿ al-kabīr: 2,364 titles; unique: 7,895 titles (~1.1 billion words).]
    Overlap among collections. There is significant overlap among available digital collections. Thus, while their cumulative volume may run into tens of thousands, the count of unique titles—excluding the exact copies and texts based on different editions—is significantly lower. Additionally, it is very difficult to identify duplicates among the collections. The Venn diagram above shows the overlap—over 2,000 titles—among the three major collections (the count is still a work in progress). NB: the diagram was generated with Ben Frederickson’s code.

    The number of these collections appears to be growing and their content expanding. This new research environment offers scholars an opportunity to check whether a particular text is included in a certain collection, to browse and read it—often in a page-by-page manner—and to search for particular bits of information. These collections work well for looking for something that we know or expect to find—a book, a person, an event, a term. What we cannot do is look into how books are related, how they overlap and complement each other; how each individual fits among his contemporaries as well as his predecessors and successors; how different historical events are intertwined; how terms, notions and concepts are related to each other and evolve across time and space. Yet, having full texts of our sources at our disposal, we can definitely go beyond simplistic linear searches. By asking a series of interconnected questions—and relying on digital methods of text analysis—we can move toward a new understanding of the entire Arabic written tradition (starting, of course, with what is digitally available in one form or another).

    The question of chronology is one such foundational question. What I propose in this experiment is to explore the content of three such collections in order to understand better the chronological coverage of each collection, each author, and each book. In order to get insights into these issues we can turn to different kinds of data. To get a perspective on the scope of each collection we shall start by looking into descriptions of books and their authors, and more specifically into when the authors died.

    Metadata

    While metadata in most collections is not complete, it can still be quite useful. Major digital collections—al-Ǧāmiʿ al-kabīr (HDD), Shamela.ws, and ShiaOnlineLibrary.com—display the same clear trend: strong emphasis on the 3rd–6th centuries AH (815–1203 CE), with an extra peak in the 8th century AH (1300–1397 CE), a steady decline during the 9th–12th centuries AH (1397–1785 CE), a slow recovery during the 13th century AH (1785–1882 CE), and a sharp rise in the 14th century AH (1882–1979 CE).

    Note on graphs. Data points of each graphed line show frequencies for periods of time that end at that point. For example, on the graph below that shows distribution of data by 100 lunar years (titles in al-Ǧāmiʿ al-kabīr), the value for 300/912 CE is 280, which means that there are 280 titles written by authors who died during 200–300 AH / 815–912 CE. A “step-before” type of graph displays such data most appropriately, but it is not suitable for comparative graphs, since there is too much overlap among the lines which makes the entire graph unreadable. Data on the most recent authors (after 1400/1979 CE) is excluded from the graphs, since it tends to overshadow earlier periods.
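    The convention described in this note, where each data point counts the titles of the period that ends at that point, can be sketched as follows. This is only an illustration: the death years are made up, the function names are hypothetical, and the AH-to-CE conversion is a rough linear approximation whose constants are an assumption chosen to agree with the year pairs quoted in the text (for example, 300 AH / 912 CE).

```python
import math
from collections import Counter


def ah_to_ce(ah_year):
    """Rough linear AH -> CE conversion (approximation; constants are an assumption)."""
    return math.floor(ah_year * 0.970224 + 621.5774)


def bin_by_period_end(death_years, width=100):
    """Count death years per period, labelling each period by its END year.

    A year y falls into the period (end - width, end], so 300 belongs to the
    bin labelled 300 and 301 to the bin labelled 400.
    """
    counts = Counter()
    for year in death_years:
        end = -(-year // width) * width  # integer ceiling to the next multiple of width
        counts[end] += 1
    return dict(sorted(counts.items()))


# Hypothetical death years (AH) of authors in a collection.
deaths = [204, 256, 300, 310, 748]
by_century = bin_by_period_end(deaths)
labels = {end: f"{end}/{ah_to_ce(end)} CE" for end in by_century}
```

    Here `by_century` maps 300 to 3 because three of the sample authors died during 200–300 AH, and `labels[300]` reads "300/912 CE", matching the way points are labelled on the graphs.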

    al-Ǧāmiʿ al-kabīr (HDD) has the most complete chronological metadata on its authors.
    Shamela.ws (online). Almost half of its records lack chronological metadata.
    ShiaOnlineLibrary.com (online). The collection has rather complete chronological metadata. Almost 1/3 of all titles are books by modern Šīʿīte scholars (excluded from the graph so that they do not overshadow earlier periods).
    Alwaraq.com (online) has the most incomplete metadata, but it still suggests the same trend.

    The developers of these collections were most interested in the early Islamic period (roughly the first half of the first Islamic millennium). According to the data of such sources as the Hadiyyaŧ al-ʿārifīn by Ismāʿīl Bāšā al-Baġdādī (d. 1338/1919 CE), a bibliographical collection that builds upon the famous Kašf al-ẓunūn of Ḥāǧī Ḫalīfaŧ (d. 1067/1656 CE), and Ḫizānaŧ al-turāṯ, a Saudi catalog of manuscripts (al-Riyāḍ: Šarikaŧ al-ʿArīs lil-Kumbiyūtir, 2007), the number of contributors to the Islamic written treasury is continuously growing at least up until the beginning of the 13th century AH.

    The “growth” of authors, according to the data from the Hadiyyaŧ al-ʿārifīn and the Ḫizānaŧ al-turāṯ.

    Ḫizānaŧ al-turāṯ is a Saudi catalog of manuscripts that was first published on a CD (al-Riyāḍ: Šarikaŧ al-ʿArīs lil-Kumbiyūtir, 2007); currently its full text is included into Shamela.ws. The catalog includes over 160,000 records, but unfortunately suffers from a number of problems, such as inconsistency of typing conventions, duplicate records, selective coverage of different manuscript collections (for example, only about 1,000 Arabic manuscripts from St.Petersburg, Russia are covered, while St.Petersburg academic institutions house at least 11,000 Arabic manuscripts).

    Even though existing digital collections often awe us by their volume, the comparative graphs below show that they cover only a fraction of the Arabic written tradition—even by comparison with an early 20th-century bibliography, which itself is hardly complete in its coverage. Additionally, the graphs clearly highlight the fact that the chronological coverage of these collections is skewed heavily in favor of the earlier period of Islamic history.

    Chronological distribution of book titles in the Hadiyyaŧ al-ʿārifīn, Shamela.ws, al-Ǧāmiʿ al-kabīr (HDD), and ShiaOnlineLibrary.com.

    A note on the Hadiyyaŧ al-ʿārifīn. The decline of both graphs after 1200/1785 CE indicates unavailability of bibliographical information to the author more than anything else. The geographical coverage of the collection starts shrinking roughly at the same period. It should be noted that most chronological datasets exhibit a similar trend. For example, the trend can be observed in al-Ḏahabī’s own Ḏayl to his Taʾrīḫ al-islām, where the number of biographies drops dramatically; one can equally see the same trend in Brill’s Index Islamicus and Harvard Open Metadata (on 12 million books). The only difference is that the lag gets shorter as we get closer to our time—for premodern Arabic sources this lag is 100 to 150 years; in modern datasets—10 to 20 years.

    Another way to evaluate chronological coverage is to explore the actual texts. Ideally, the number of discrete units of information—such as, for example, biographies and events—by periods should show the distribution of chronological emphasis of a particular source. Furthermore, the summary of such data from all [available] titles written by a specific author should indicate this author’s interest in specific periods. (The interpretation of such “interest” is a different subject altogether. For example, the fact that the Hadiyyaŧ al-ʿārifīn has more information on the 11th and the 12th centuries AH (1591–1785 CE) may indicate either Ismāʿīl Bāšā al-Baġdādī’s interest in this particular period, or the availability of information for this period, or the genuine growth in numbers of people contributing to the Islamic written treasury.)

    Date Statements

    Almost none of the texts, however, are tagged in a manner that would allow such a detailed evaluation. Yet, it is possible to analyze date statements in each text and offer an evaluation of its chronological coverage based on the frequencies of references to different periods. The consistency of date statements in Arabic texts—essentially, a word for “year” (ʿām or sanaŧ) followed by either digits or spelled-out numbers—makes it possible to represent this pattern with a regular expression, a special text string for describing a search pattern (see Figure below). This regular expression can be worked into a script, with which one can check available texts. It should be noted, of course, that this approach is tuned to analyze hiǧrī dates, since other dating systems are used only infrequently.

    Words sanaŧ and ʿām in the histories of Islam. Overall, the word sanaŧ is used most frequently in date statements: of about 1,362,000 date statements from across 10,000 texts only 2.9% of statements start with the word ʿām (~40,000), while 97.1% begin with the word sanaŧ (~1,322,000). A closer look also reveals that the word ʿām is favored in texts written in the 20th century; with regard to premodern texts, it can be said that authors from the western part of the Islamic world—al-Andalus and al-Maġrib—tend to use it more frequently than their eastern counterparts.

    Note: Adding fī, “in,” into the mix changes the picture: of about 1,670,000 statements, 79.2% start with sanaŧ (~1,322,000), 18.5% with fī (~308,000), and 2.4% with ʿām (~40,000). The problem is that even a quick look at the ngrams of fī-statements—the words that immediately follow each fī-statement—shows that more than half of these statements are quantitative phrases of different kinds (for example, fī arbaʿ mujalladāt). For this reason, fī-statements are excluded from the analysis.

    [Top] A regular expression for capturing year statements in premodern Arabic sources. You can copy it and test it on some text. [Bottom] The image demonstrates this regular expression highlighting year statements (bright green) in the Taʾrīḫ al-islām of al-Ḏahabī (d. 748/1347 CE). Program used: EditPad Pro.
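    To give a sense of how such a pattern can be worked into a script, here is a deliberately simplified sketch. It is not the regular expression shown in the figure: the vocabulary of spelled-out numbers is small and incomplete, orthographic variation (hamzas, vocalization, attached prepositions) is ignored, and the function name `find_years` is made up for illustration. It only assumes what the text states, namely that a date statement is a word for “year” followed by digits or spelled-out numbers, and it treats spelled-out numbers as additive.

```python
import re

# Incomplete sample vocabulary of spelled-out numbers (an assumption for this sketch).
WORD_VALUES = {
    "إحدى": 1, "اثنتين": 2, "ثلاث": 3, "أربع": 4, "خمس": 5,
    "ست": 6, "سبع": 7, "ثمان": 8, "تسع": 9, "عشرة": 10,
    "عشرين": 20, "ثلاثين": 30, "أربعين": 40, "خمسين": 50,
    "ستين": 60, "سبعين": 70, "ثمانين": 80, "تسعين": 90,
    "مائة": 100, "مائتين": 200, "ثلاثمائة": 300, "أربعمائة": 400,
    "خمسمائة": 500, "ستمائة": 600, "سبعمائة": 700,
    "ثمانمائة": 800, "تسعمائة": 900, "ألف": 1000,
}

# Longest words first, so that a short word never shadows a longer one.
NUM_WORD = "|".join(sorted(WORD_VALUES, key=len, reverse=True))

# "year" word, then either digits or a run of number words joined by wa- ("and").
DATE_RE = re.compile(
    r"\b(?:سنة|عام)\s+(?:(\d+)|((?:و?(?:" + NUM_WORD + r")\b\s*)+))"
)


def find_years(text):
    """Return the years of all date statements found in text, as integers."""
    years = []
    for match in DATE_RE.finditer(text):
        if match.group(1):
            years.append(int(match.group(1)))  # int() also reads Arabic-Indic digits
        else:
            total = 0
            for token in match.group(2).split():
                # Strip the conjunction wa- unless the token itself is a number word.
                word = token if token in WORD_VALUES else token[1:]
                total += WORD_VALUES[word]
            years.append(total)
    return years


spelled = find_years("توفي سنة خمس وستين وخمسمائة")
digits = find_years("ولد عام 1200")
indic = find_years("مات سنة ٧٤٨")
```

    The additive reading of spelled-out numbers (5 + 60 + 500) is what makes a flat vocabulary sufficient here; a production pattern would need many more word forms than this sample list.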

    Such an approach is not without its problems, of course, but it may serve well as an exploratory technique. The results of the experiment are intriguing in a number of ways, even though not entirely consistent. The most important outcome is the discovery that the collection of 10,000 texts contains only about 785 texts with more than 100 date statements per text (and since the included collections overlap, the number of unique titles is even smaller). Needless to say, working with 785 texts is significantly easier than working with 10,000 titles. Additionally, frequencies of date statements for each text offer an opportunity to focus one’s efforts on texts that contain the most data suitable for time-series analysis.

    Chronological coverage. The graphs show the chronological coverage for the same text generated with two different approaches: while the orange dotted line represents the ideal situation—data collected through the manual tagging of the entire source, the blue solid line represents the only realistic situation—data extracted computationally. While the absolute results differ, the relative distribution is very similar and emphasizes the same periods. On the problem of the 1st century AH (622–718 CE) see below.

    The graph above shows two different representations of the chronological coverage of the Hadiyyaŧ al-ʿārifīn by Ismāʿīl Bāšā al-Baġdādī (d. 1338/1919 CE), a bibliographical collection that builds upon the famous Kašf al-ẓunūn of Ḥāǧī Ḫalīfaŧ (d. 1067/1656 CE). The blue line shows the frequencies of date statements by periods (binned into 50-year periods)—strongly suggesting more emphasis on the 11th and 12th centuries AH (1591–1785 CE). The orange dotted line shows the distribution of biobibliographical records on about 8,800 authors—this actual distribution of discrete information units in the source emphasizes the same period of the 11th and 12th centuries. The similarity in the patterns of distribution shows that reliance on computationally extracted date statements is a viable alternative.
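    One simple way to put a number on “the relative distribution is very similar” is to convert both sets of counts to shares and measure how much of the two distributions coincides. The sketch below is only one possible measure, not the comparison used for the graph; the counts are hypothetical and the function names are made up.

```python
def to_shares(counts):
    """Convert absolute counts per period into shares that sum to 1."""
    total = sum(counts.values())
    return {period: n / total for period, n in counts.items()}


def distribution_overlap(counts_a, counts_b):
    """Share of mass two distributions have in common (1.0 = identical shape)."""
    a, b = to_shares(counts_a), to_shares(counts_b)
    periods = set(a) | set(b)
    return sum(min(a.get(p, 0.0), b.get(p, 0.0)) for p in periods)


# Hypothetical counts per 50-year period (keys are period end years, AH).
extracted = {1050: 200, 1100: 100, 1150: 300}  # date statements found by script
tagged = {1050: 25, 1100: 25, 1150: 50}        # manually tagged records
similarity = distribution_overlap(extracted, tagged)
```

    Because both inputs are normalized first, the measure ignores the difference in absolute numbers between extracted statements and tagged records and compares only the shape of the two curves.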

    The 1st Century Problem

    Unfortunately, many texts suffer from what can be characterized as “the 1st century problem”: authors often drop hundreds from date statements (authors from the second millennium also tend to drop thousands), which leads to a very high number of date statements referring—at face value—to the 1st century AH (622–718 CE). As a result, the 1st century often gets inflated, overshadowing other periods. The graph below illustrates this issue.

    Since authors often drop hundreds from their date statements, the 1st century AH gets overinflated. As the title suggests, al-Saḫāwī’s (d. 902/1496 CE) al-Ḍawʾ al-lāmiʿ li-ahl al-ḳarn al-tāsiʿ focuses on the 9th century AH (1397–1494 CE), but—as the graph above shows—the number of date statements referring to the 8th (1300–1397 CE) and 9th (1397–1494 CE) centuries is significantly smaller than the number referring to the 1st century (notice the gap in between!). It is clear that al-Saḫāwī is dropping hundreds from his date statements. The problem is that some of those statements may refer to the 8th century, while some others to the 9th, so moving them all to the 9th century is hardly a solution.

    The problem may be resolved through the sequential analysis of date statements in texts. Authors are not likely to drop hundreds from their statements without letting their readers know what century they are talking about. In other words, an incomplete date statement must be preceded by a complete one. Thus, one can check whether there are other date statements nearby—and if there are, the incomplete date can be fitted into the period of the preceding statement.

    The algorithm as implemented grabs a 100-word chunk before a 1st-century date statement and checks if there are other date statements in that chunk. The procedure is repeated up to five times, that is, checking up to 500 words—an equivalent of 1 to 3 printed pages—before the date statement in question, until either the text limit is reached or a date statement is found. If a date statement is found, its century gets applied to the starting date statement that we treated as incomplete. In other words, if we start with “the year 65”, and we find “the year 530” preceding it, we change the first date into “the year 565” (1169 CE). If the preceding date is also from the 1st century, the starting date remains unchanged; the date also remains unchanged if no other date statements have been found. Additionally, the algorithm runs in two different ways—in the first case, it does not build on updated date statements (Line B); while in the second, it does, extrapolating from corrected date statements (Line C). The graph below shows the results.

    The graph shows new results for al-Saḫāwī’s (d. 902/1496 CE) al-Ḍawʾ al-lāmiʿ li-ahl al-ḳarn al-tāsiʿ: A (solid blue line) shows unmodified date statements (as in the previous graph); B (dotted orange line) shows the results of the first run of the algorithm—over 2,800 statements were updated, but there are still many dates for the 1st century; C (dashed green line) shows the results of the second run of the algorithm, which builds on the updated dates—almost 12,000 date statements were redistributed, now clearly showing that the book is about the 9th century.
    Note: a6675 is the identifier of a particular version of the text—title #6675 from al-Maktabaŧ al-Šāmilaŧ; the same title from a different collection will have a different identifier.
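    The look-back procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the script used for the graphs: date statements are assumed to be already extracted as (word offset, year) pairs, years below 100 are treated as candidates with dropped hundreds, and the five 100-word chunks are collapsed into a single 500-word window in which the nearest preceding date statement is used. The function name and the sample data are hypothetical.

```python
def resolve_centuries(dates, chunk=100, max_chunks=5, extrapolate=False):
    """Assign a century to 1st-century date statements from the preceding one.

    dates: list of (word_offset, year) pairs sorted by word_offset.
    extrapolate=False corresponds to the first run (Line B): only dates that
    were complete in the text can supply a century.
    extrapolate=True corresponds to the second run (Line C): already
    corrected dates can supply a century as well.
    """
    window = chunk * max_chunks  # at most 500 words back by default
    resolved = []
    for i, (pos, year) in enumerate(dates):
        new_year = year
        if year < 100 and i > 0:
            prev_pos = dates[i - 1][0]
            prev_year = resolved[i - 1][1] if extrapolate else dates[i - 1][1]
            # Apply the century only if the previous date is close enough
            # and is not itself a (still unresolved) 1st-century date.
            if pos - prev_pos <= window and prev_year >= 100:
                new_year = (prev_year // 100) * 100 + year
        resolved.append((pos, new_year))
    return resolved


# Hypothetical date statements: (word offset in the text, year as written).
sample = [(10, 530), (120, 65), (300, 70), (2000, 12)]
line_b = [y for _, y in resolve_centuries(sample)]
line_c = [y for _, y in resolve_centuries(sample, extrapolate=True)]
```

    In the sample, “65” becomes 565 in both runs because it follows “530” within the window; “70” is corrected to 570 only in the extrapolating run, since its nearest predecessor was itself incomplete; and “12” stays unchanged because no date statement precedes it within 500 words.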

    The question is, of course, how reliable such projections are. In order to check this we need to compare algorithmically produced results with manually disambiguated data. The graphs below show such comparisons for four different sources: A (orange dotted) shows the initial results of computational date statements collection; B (green dashed)—modified dates without extrapolation; C (red dashed)—modified results with extrapolation; and, finally, D (blue solid)—shows manually disambiguated 1st-century date statements.

    al-Wafayāt al-aʿyān of Ibn Ḫallikān (d. 681/1282 CE)

    Results for Ibn Ḫallikān’s al-Wafayāt al-aʿyān are very good—algorithmically modified dates are very close to the manually disambiguated ones. Results of Algorithm B—modified results without extrapolation—are slightly closer to the benchmark (line D) than the results of Algorithm C. Yet, both are somewhat “overfitting” 1st-century dates. Good news: algorithmic lines B and C lead to the same conclusion as the benchmark Line D—Ibn Ḫallikān covers the period of 450–650 AH / 1058–1252 CE most thoroughly.

    al-Kāmil fī-l-taʾrīḫ of Ibn Aṯīr (d. 630/1232 CE)

    Results for Ibn Aṯīr’s al-Kāmil fī-l-taʾrīḫ are less precise: both algorithms overfitted 1st-century dates, inflating other centuries, if compared to manually disambiguated data (D). The peaks of distribution—the shape of the curve—are much closer to the benchmark than the preprocessed results (A), but computational analysis suggests that Ibn Aṯīr focuses more on the later period, while (according to manually disambiguated data) his attention is spread more evenly.

    Ṭabaḳāt al-šāfiʿiyyaŧ of Ibn Ḳāḍī Šuhbaŧ (d. 851/1447 CE)

    Results for the Ṭabaḳāt al-šāfiʿiyyaŧ of Ibn Ḳāḍī Šuhbaŧ are not ideal, but still much better than the initial results. Extending the check range from 500 words to 1,000 gets the graph—line C in particular—much closer to the benchmark (click on the image to see the graph based on the extended range of 1,000 words). The problem, however, is that for other sources a 1,000-word range does not generate better results.

    Some general observations

    We are clearly not getting a 100% match with the benchmark, but that is not to be expected anyway—none of the exploratory computational methods work that way. Our model does not take into account the stylistic differences among authors. While the bulk of date statements do fall into the proposed pattern, there are occasionally slight variations that are peculiar to particular authors. Some of these peculiarities may be helpful. For example, Ibn Ḫallikān often uses phrases li-l-hiǧraŧ or min al-hiǧraŧ with the true 1st-century date statements (which is still 75-80%)—and such markers can be worked into the algorithm; other authors—about half a dozen that I checked thoroughly—use such additional phrases only occasionally. Other peculiarities are too complicated and cannot be resolved with simple algorithms. For example, Ibn Ḳāḍī Šuhbaŧ occasionally “spells” out ones in his date statements to ensure that his readers get it right (sanaŧ sabʿ bi-taḳdīm al-sīn wa-ʿišrīn …, “the year seven, with sīn in the beginning…”), which, again, breaks the general pattern for date statements. The most complicated issue, however, is that even for a scholar it may occasionally be difficult to figure out what century a certain date refers to (for example, when a biographee was born close to the middle of one century and died close to the middle of the next one). Natural languages will always pose such difficulties, yet the results produced with the offered approach are quite suitable for the goal: even when we do not get the exact results, we are still getting close enough to the benchmark for a useful distant reading of a large corpus.

    The precision of results also varies because of differences in book structure. We get more precise projections for books organized alphabetically, since in this case authors cannot afford to use too many incomplete dates (see graphs for the Hadiyyaŧ al-ʿārifīn and Wafayāt al-aʿyān above), and less precise ones for books organized chronologically. It would make sense to develop different subroutines for processing texts based on their organization. Having robust metadata on each text would help trigger analytical routines adjusted to various peculiarities, although the structure of a book can be inferred computationally (on this see below). Additionally, a more precise logic can be implemented if our texts are properly divided into logical units. Thus, in a book organized alphabetically, the analysis of dates would be limited to a single logical unit, while in a book organized chronologically the precision of analysis can be improved by looking into date statements in the neighboring units. At this point, results are provocatively suggestive, but in most cases some familiarity with a specific book will help make sense of its graphs.

    Complementary coverage of “continuations”

    Date statements may also offer other useful insights into Arabic historical sources. Comparing the chronological coverage of different texts may illustrate how texts relate to each other. The graphs below show a few examples of how certain texts overlap chronologically with their “continuations” (ḏayl, takmilaŧ, ṣilaŧ) and are complemented by them.

    Complementary coverage of “continuations”. [Top left] al-Ḏahabī’s Taḏkiraŧ al-ḥuffaẓ and its three ḏayls. [Top right] Ibn Abī Yaʿlá’s Ṭabaḳāt al-ḥanābilaŧ continued by Ibn Raǧab’s Ḏayl ʿalá Ṭabaḳāt al-ḥanābilaŧ. [Bottom left] Ḥaǧǧī Ḫalīfaŧ’s Kašf al-ẓunūn continued by Ismāʿīl Bāšā al-Baġdādī’s Iḍāḥ al-maknūn fī ḏayl ʿalá Kašf al-ẓunūn. [Bottom right] al-Ḫaṭīb’s Taʾrīḫ Baġdād continued by Ibn Naǧǧār’s Ḏayl (excerpted by Ibn al-Dimyāṭī in his al-Mustafād min Ḏayl Taʾrīḫ Baġdād).
    Complementary coverage of “continuations.”
    Taʾrīḫ mawlid al-ʿulamāʾ wa-wafayati-him of Ibn ʿAbd Allãh al-Rabaʿī (d. 397/1006 CE) is another interesting example, since we have its “continuation”, Ḏayl taʾrīḫ mawlid al-ʿulamāʾ wa-wafayati-him of ʿAbd al-ʿAzīz al-Kattānī (d. 466/1073 CE), and “the continuation of the continuation”, Ḏayl ḏayl taʾrīḫ mawlid al-ʿulamāʾ wa-wafayati-him of Hibaŧ Allãh al-Akfānī (d. 524/1130 CE). The graph vividly demonstrates how these collections complement each other chronologically.

    Date statements and the structure of books

    Patterns of date statements distribution across texts—in other words, if we graph dates in the order they occur in a text—can also tell us a lot about the structural organization of books. As the illustrations below show, alphabetical and chronological structures have distinct visual patterns. Such patterns can be helpful in assessing new corpora and identifying texts relevant for specific research purposes. Different routines can be developed for the identification and analysis of texts of other forms and genres.

    Note on graphs below: Each line represents a date statement, where the length of the line corresponds to the year that a date statement refers to. The left side of each graph is the beginning of the book; the right one—its end. Regression analysis—here visualized with the red line for linear regression, and the blue one for LOWESS regression—can be used for identifying the patterns of distribution without graphing. (1st-century dates were removed to make patterns more clear.)

    Distribution of dates across historical texts: Dates in the Taʾrīḫ Dimašḳ (top) are randomly distributed across the entire length of the text, which corresponds to its alphabetical organization; the same pattern can be seen in the al-Wāfī bi-l-wafayāt (bottom), which is also organized alphabetically.
    Distribution of dates across historical texts: Dates in the Taʾrīḫ al-islām, which covers the period of Islamic history up to 700/1300 CE, display a clear rising pattern, which reflects its chronological organization.
    Distribution of dates across historical texts: Dates in the Hadiyyaŧ al-ʿārifīn display a zig-zag pattern, which reflects its alphabetical organization, where biobibliographical records within each letter are organized chronologically (This last thing was quite a discovery—even though I have spent quite a lot of time working with this text, I did not realize that biographies within each letter are organized chronologically until I saw this graph).
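
    The idea of identifying these patterns without graphing can be illustrated with a minimal sketch. It assumes that date statements are already extracted as a list of years in their order of occurrence, and it uses the Pearson correlation between position and year as a stand-in for the linear regression mentioned above. The function names and the 0.7 threshold are hypothetical choices for illustration, not values taken from the study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def guess_structure(years, threshold=0.7):
    """Guess the organization of a book from its dates in order of occurrence.

    `threshold` is an illustrative cut-off (an assumption, not a tested value).
    """
    positions = range(len(years))
    r = pearson_r(positions, years)
    return "chronological" if r >= threshold else "not clearly chronological"

# Toy data: years rising through the book vs. the same years scattered.
rising = [12, 40, 95, 130, 210, 260, 340, 410, 520, 610]
scattered = [410, 95, 610, 12, 340, 260, 40, 520, 130, 210]
print(guess_structure(rising))
print(guess_structure(scattered))
```

    A single correlation value will not separate the zig-zag pattern from a random one; that would probably require looking at the dates section by section.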

    Concluding remarks

    One thing that must be voiced is that if we had a corpus properly prepared by scholars and for scholars, one that included robust metadata and texts tagged into logical units, the results of such an experiment would have been significantly more precise and reliable, not to mention that such a corpus would also allow us to run a number of other exploratory experiments. To put it differently, we scholars who study the premodern Islamic world, and who actively use collections developed in Arab countries and Iran for non-academic purposes (and let’s be honest, most of us do), must invest time and effort into the development of a digital library that would allow all of us to engage in methodologically novel research. Such a library would also allow us to build on each other’s research more consistently, which would help to forge a new collaborative culture that will be beneficial to the entire field.

    Appendix I: Exploring coverage of historical sources

    You can explore the chronological coverage of historical texts using Chronoplot (it may take a moment to load). Current data includes about 3,000 texts (including versions of the same text from different libraries). Keep in mind the following:

    1. Each text has a unique identifier: letter + number, where the former refers to a collection, and the latter to the number of a text in that collection.
    2. Each text has three variations of date statement distribution. (Consider comparing variations for the text with the same identifier.) Texts of the same title from different collections occasionally give different distributions (especially when electronic texts are based on different printed editions).
      • A: unmodified dates (“1st century problem”);
      • B: updated dates (“single pass”);
      • C: updated dates (“double pass”).
    3. Selector (right) can be used to select titles for graphing their chronological coverage. Choosing multiple titles allows you to compare their coverage.
    4. Filter (right top) can be used to find specific titles: type a part of an author’s name or a book’s title, and the list will be filtered to show only items that contain your keywords.
    5. Linetype (right bottom) is a drop-down menu that offers several ways of graphing the results. The most appropriate linetype for displaying chronological coverage is “step-before,” since it shows the frequencies of date statements per 50-year period most clearly. However, this works well only for single texts. For comparative purposes “monotone” seems to be a better option.
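
    The frequencies per 50-year period that these graphs display can be computed along the following lines. This is a minimal sketch, not Chronoplot's actual code: the function name and the choice of period boundaries (years 1-50, 51-100, and so on) are assumptions made for illustration.

```python
from collections import Counter

def bin_by_period(years, width=50):
    """Count date statements per period of `width` years.

    Each period is labelled by its first year; with width=50 the years
    1-50 fall into period 1, 51-100 into period 51, and so on
    (these boundaries are an assumption).
    """
    counts = Counter(((year - 1) // width) * width + 1 for year in years)
    return dict(sorted(counts.items()))

# Toy list of extracted years.
years = [12, 48, 50, 51, 99, 100, 101, 230, 249]
print(bin_by_period(years))
```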

    Appendix II: Exploring coverage of historical periods

    The table below lists sources by frequencies of date statements. Like Chronoplot, this table also has three variations of each text (A, B, C). Since variations A, B, and C differ only in how dates are distributed across periods, the initial table shows only variation A. Selecting a particular century will show only texts (with variations) that have dates for those periods.

    Metadata on texts is not always complete. The missing information may be available online—where applicable, links to the online manifestations of texts are provided.

    By centuries:




    This blogpost overviews existing collections of scanned editions of Arabic texts that can be found online. Each collection is described in the same manner in order to provide ground for comparison.

    al-Maktabaŧ al-Waqfiyyaŧ

    The largest collection of scanned books in Arabic (scanned images compiled into PDFs) with over 7,200 titles (~12,000 volumes), including editions of most major classical Islamic sources. The website is regularly updated and is searchable by titles, author names and some other parameters; it can be browsed by categories.

    Most texts are stored on archive.org and occasionally linked to Shamela. The collection also includes books in languages other than Arabic.

    Developers: waqfeya.com

    URL: waqfeya.com

    Other links

    http://digital.library.mcgill.ca/islamic_lithographs/ (Islamic lithographs @ McGill)



    Click on the image to download the Reader.

    Bringing DH methods into a language classroom

    Learning classical Arabic is a long process. Most of us took great pleasure in advanced reading classes with our professors, but, often struggling with an overwhelming volume of new vocabulary, we also—at least occasionally—had a feeling that a traditional method is not necessarily the most effective one. While advanced students usually overcome this difficulty by their sheer passion for the subject, the introduction of excessive vocabulary creates a serious obstacle to less advanced yet capable students.

    Pervasive availability of electronic texts and computational methods of text analysis allows us to rethink how we teach difficult languages. We can identify the most frequent features within a corpus and focus our attention on them. For example, the 100 most frequent lexical items constitute about 56% of the entire vocabulary of over 34,000 Prophetic sayings (ḥadīṯ) from the Six [Sunnī] Collections (al-kutub al-sittaŧ, approximately 2.8 million words). Relying on such data, one can generate a frequency-based reader that will introduce students to the shortest texts with the most frequent vocabulary and grammatical structures. With a paced increase in difficulty of texts and incremental expansion of vocabulary, students are capable of digesting much larger volumes of text both in class and at home, and such an extended exposure enables students to internalize the authentic language more efficiently. For example, in the course of one semester, we managed to cover about 400 ḥadīṯs, while at the same time reviewing the grammar of classical Arabic and having regular discussions of thematic readings that helped students to understand the cultural importance of the Ḥadīṯ across almost 14 centuries of Islamic history.1
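
    A figure of this kind can be obtained with a few lines of code. The sketch below assumes that the percentage refers to the share of all running words (tokens) covered by the most frequent word forms, and that the corpus is already tokenized; the function name and the toy corpus are hypothetical.

```python
from collections import Counter

def top_n_coverage(tokens, n=100):
    """Share of all token occurrences accounted for by the n most frequent word forms."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(freq for _, freq in counts.most_common(n))
    return covered / len(tokens)

# Toy corpus of seven tokens; the two most frequent forms cover five of them.
toy_corpus = "ḳāla ḥaddaṯanā ḳāla ʿan ḳāla ʿan rasūl".split()
print(round(top_n_coverage(toy_corpus, n=2), 2))
```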

    While developed primarily with classical Arabic in mind, the approach is actually universal and can be used for any language. It works best with serialized texts, that is, a large corpus of relatively short texts of the same type (in the case of Arabic that would be ḥadīṯ collections, chronicles, biographical collections, poetic anthologies, contemporary newspapers, etc.). Considering that in terms of vocabulary various forms and genres may differ from each other quite significantly (Figure 1 shows that such difference may go up to 80%!), this method can be used to introduce students to the language of particular genres in the most efficient manner. Courses based on such readers can be a valuable addition to any language program and will be particularly welcomed by graduate students who often face the need to develop their reading skills as quickly and efficiently as possible.

    Figure 1. The matrix shows lexical overlap across the frequency lists (top 3,000 items) that represent large thematic specimens of the Arabic language. The specimens are arranged chronologically, starting with the earliest (top-right corner, 9th century) and ending with the latest (20th century). The most dramatic lexical difference is between al-Kutub al-Sittaŧ, the Six [Sunnī] Collections of ḥadīṯs, and al-Šarḳ al-awsaṭ, the modern newspaper: the frequency lists of these two sources (again, top 3,000 items) share only 20% of word forms (tokens). Even among the “classical” works the lexical distance is quite significant, with the percentage of shared vocabulary fluctuating mainly between 38% and 58% (for the interquartile range).

    Texts compared: al-Kutub al-Sittaŧ (2.8 mln. words), the 6 Sunnī collections of Ḥadīṯ (~9th century CE); Tafsīr al-Ṭabarī (or Ǧāmiʿ al-bayān, 3 mln. words), a commentary on the Qurʾān by al-Ṭabarī (d. 310/922 CE); Kitāb al-Aġānī (1.5 mln. words), a poetic anthology of Abū-l-Faraǧ al-Iṣbahānī (d. 356/967 CE); al-Futūḥāt al-Makkiyyaŧ (1.7 mln. words), an extensive Ṣūfī text of Ibn al-ʿArabī (d. 638/1240 CE); Fatāwá Ibn Taymiyyaŧ (2.9 mln. words), a collection of legal decisions and epistles of Ibn Taymiyyaŧ (d. 728/1327 CE); Taʾrīḫ al-Islām (3.2 mln. words), a biographical collection and chronicle of al-Ḏahabī (d. 748/1347 CE); Maǧallaŧ al-Risālaŧ (16 mln. words), an early 20th-century Egyptian literary journal; Tafsīr al-Mīzān (2.3 mln. words), a modern Šīʿī commentary on the Qurʾān by al-Sayyid al-Ṭabāṭabāʾī (d. 1981 CE); and al-Šarḳ al-Awsaṭ (2.5 mln. words), a modern Arabic newspaper (collected by Tariq Yousef from http://aawsat.com/).
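
    One cell of such a matrix could be computed roughly as follows. This is only a sketch under stated assumptions: both corpora are already tokenized into word forms, overlap is measured as the share of the top-n forms of one corpus that also occur among the top-n forms of the other, and the function names and toy data are hypothetical.

```python
from collections import Counter

def top_forms(tokens, n=3000):
    """Return the set of the n most frequent word forms."""
    return {form for form, _ in Counter(tokens).most_common(n)}

def overlap_percent(tokens_a, tokens_b, n=3000):
    """Percentage of the top-n forms of corpus A found among the top-n forms of corpus B."""
    top_a = top_forms(tokens_a, n)
    top_b = top_forms(tokens_b, n)
    if not top_a:
        return 0.0
    return 100 * len(top_a & top_b) / len(top_a)

# Toy corpora; with n=2 the lists are {ḳāla, ʿan} and {ḳāla, fī}.
corpus_a = ["ḳāla"] * 3 + ["ʿan"] * 2 + ["rasūl"]
corpus_b = ["ḳāla"] * 3 + ["fī"] * 2 + ["ʿan"]
print(overlap_percent(corpus_a, corpus_b, n=2))
```

    When both frequency lists contain the full n items, the measure is symmetric, so only half of the matrix needs to be computed.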

    Description of the method

    The overall procedure is rather simple and runs as described below.

    Step I. Ḥadīṯ collections were downloaded from http://sunnah.com/. Then, initial texts were reformatted and normalized.2 (There are multiple ways in which specimens of other genres can be obtained and then processed for a similar reader.)

    Step II. All vocabulary from the corpus was collected and converted into a frequency list. This list was then converted into a ranking list, where the most frequent item receives rank 1, the second rank 2, the third rank 3, and so on; items with the same frequency are assigned the same rank. It should be noted that vocabulary items have not been parsed with a morphological analyzer, so different forms of the same word are treated separately (i.e., ḳāla, ḳīla, ḳālat, fa-ḳāla, etc. have their own frequencies and are ranked separately). The main reason for not using the results of automatic morphological analysis is largely technical, since existing morphological analyzers are meant to work with modern standard Arabic and do not perform well on classical Arabic.3 At the same time, using frequencies of word forms (tokens) rather than dictionary forms (lexemes) has its advantages, since more frequent forms will appear more frequently in the reading materials (such as, for example, very frequent ḳāla [sing. masc.] vs. rather rare ḳālā [dual masc.]).4
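
    The conversion of a frequency list into a ranking list can be sketched as follows. The text does not say whether ranks skip numbers after a tie; the sketch assumes “dense” ranking (1, 2, 2, 3), and the function name and toy tokens are hypothetical.

```python
from collections import Counter

def rank_forms(tokens):
    """Map each word form to its rank: 1 for the most frequent form.

    Forms with the same frequency share a rank; the next distinct
    frequency gets the next rank (dense ranking, an assumption).
    """
    counts = Counter(tokens)
    distinct_freqs = sorted(set(counts.values()), reverse=True)
    rank_of_freq = {freq: rank for rank, freq in enumerate(distinct_freqs, start=1)}
    return {form: rank_of_freq[freq] for form, freq in counts.items()}

# Word forms are counted as they appear, without morphological analysis.
tokens = "ḳāla ḳāla ḳāla ʿan ʿan fa-ḳāla fa-ḳāla ḳīla".split()
ranks = rank_forms(tokens)
print(ranks)
```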

    Step III. The arithmetic mean of ranking values was calculated for each ḥadīṯ. The resulting values then served as difficulty indices: texts with the most frequent vocabulary have the lowest means, and vice versa. These indices were then used as sorting values that allowed rearranging all 34,000 ḥadīṯs by the difficulty of their vocabulary. The advantage of the mean here is that even a single low-frequency lexical item raises the difficulty index of a text and pushes it down the list. This approach had a couple of unforeseen positive effects. First, as the length of a text increases, so does the probability of rarer lexical items; as a result, the “easiest” texts are also the shortest ones. This convenient outcome allows students to begin with the shortest texts and move gradually to the longer ones. The second effect is that the most frequent vocabulary also tends to appear in the most frequent grammatical and syntactic structures.
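
    A minimal sketch of the difficulty index may make the procedure more concrete. It assumes that each text is a non-empty list of word forms and that ties in frequency share a rank in the “dense” manner; all function names and the toy corpus are hypothetical.

```python
from collections import Counter
from statistics import mean

def rank_forms(texts):
    """Rank word forms across the whole corpus (1 = most frequent; ties share a rank)."""
    counts = Counter(token for text in texts for token in text)
    distinct_freqs = sorted(set(counts.values()), reverse=True)
    rank_of_freq = {freq: rank for rank, freq in enumerate(distinct_freqs, start=1)}
    return {form: rank_of_freq[freq] for form, freq in counts.items()}

def difficulty_index(text, ranks):
    """Mean rank of the tokens in one text; lower means easier."""
    return mean(ranks[token] for token in text)

def sort_by_difficulty(texts):
    """Return the texts ordered from the easiest to the most difficult."""
    ranks = rank_forms(texts)
    return sorted(texts, key=lambda text: difficulty_index(text, ranks))

# Toy corpus: three tokenized texts.
corpus = [
    "ḳāla rasūl allāh".split(),
    "ḳāla ḳāla".split(),
    "ḳāla rasūl allāh nādir".split(),
]
for text in sort_by_difficulty(corpus):
    print(" ".join(text))
```

    In the toy corpus the single rare form nādir is enough to push the third text to the end of the list, which is the effect described above.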

    Step IV. The rearranged collection of ranked ḥadīṯs was not quite usable, since this method also groups together items that are almost the same. Here manual input was required to exclude ḥadīṯs that are too similar.
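
    The exclusion itself was done manually, but candidates for it could be pre-flagged automatically. The sketch below is not the procedure used for the reader: it is one possible aid that compares neighboring texts in the sorted list by the Jaccard similarity of their word forms. The function names and the 0.75 threshold are hypothetical.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity of the sets of word forms in two texts."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

def flag_near_duplicates(texts, threshold=0.75):
    """Return index pairs (i, i + 1) of adjacent texts that look almost the same.

    `threshold` is an illustrative value (an assumption); flagged pairs
    still need to be reviewed by a human.
    """
    return [
        (i, i + 1)
        for i in range(len(texts) - 1)
        if jaccard(texts[i], texts[i + 1]) >= threshold
    ]

# Toy list of tokenized texts, already sorted by difficulty.
sorted_texts = [
    "ḳāla rasūl allāh ṣallá".split(),
    "ḳāla rasūl allāh ṣallá wa-sallam".split(),
    "ḥaddaṯanā fulān ʿan fulān".split(),
]
print(flag_near_duplicates(sorted_texts))
```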

    Step V. At last, the selection of ḥadīṯs was converted into format and typeset into the reader in front of you. As you will see, quite a few ḥadīṯs in the beginning of the reader feature only isnāds, “the chains of transmitters”, and do not have matns, the actual texts of ḥadīṯs. I used these matn-less ḥadīṯs to introduce students to the concept of transmission of knowledge in Islamic culture, which most were not familiar with; next time around I will modify the reader to avoid having very similar texts next to each other, which can be done by the retagging of the selection of ḥadīṯs and regenerating the entire reader anew.

    In the classroom

    In my teaching, I used this reader in combination with ‘micropublications’, which provided each student with thorough practice of the foundational skills necessary for mastering the language: for each ḥadīṯ students provided full vocalization, morphological stemming, and a translation aligned with its Arabic original. Such ‘micropublications’ help monitor students’ progress and, later, can be used to automatically grade such assignments, thus freeing up time for in-class discussions. Last but not least, by producing these micropublications, students make a valuable contribution as they generate training data that can be used for various teaching and research purposes.

    Footnotes

    1. “Classical Arabic through the Words of the Prophet” (Tufts University, Winter/Spring 2015), with the following two additional readings: W. M. Thackston, An Introduction to Koranic and Classical Arabic: An Elementary Grammar of the Language (Bethesda, Md.: Ibex Publishers, 2000), Jonathan Brown, Hadith: Muhammad’s Legacy in the Medieval and Modern World (Oxford: Oneworld, 2009).

    2. On normalization, see: Nizar Y. Habash, Introduction to Arabic Natural Language Processing ([San Rafael, Calif.]: Morgan & Claypool Publishers, 2010), 21–23.

    3. For example, Buckwalter Morphological Analyzer, which has been tested with this corpus (using Perseus morphological services), returned no results for about 25% of tokens, single results for another 25%, and more than one for the remaining 50%. Needless to say, such results are hardly usable for our purposes.

    4. An ability to recognize rare forms is important, of course, but it can be practiced through grammatical and morphological exercises (examples can be found at the end of the reader).



    My dissertation—“Computational Analysis of Arabic Biographical Collections with Special Reference to Preaching in the Sunnī World (661-1300 CE)”—is now available online through the digital library @ the University of Michigan. Even with very extensive Appendices, several thousand graphs and maps still did not make it into the dissertation. Hopefully, if I can find enough time, I will make an online appendix with the visualizations of all generated data that consists mainly of chronological graphs of “descriptive names” and chronological maps that show how their geographies were changing over time (all based on “The History of Islam” of al-Dhahabī (d. 1348)).

    Abstract

    A project in the digital humanities, the dissertation explores methods of computational text analysis. Relying on text-mining techniques to extract meaningful data from unstructured text, the study offers an effective and flexible method for the analysis of Arabic biographical collections, the most valuable source for the social history of the pre-modern Islamic world. It uses the largest collection, “The History of Islam” of al-Dhahabī (d. 1348), as a case-study of applying the new method and shows how almost 30,000 biographies can be studied as a whole. A step toward finding a viable solution for studying the entire digital corpus of classical Islamic texts (400 mln. words), Chapter I offers a detailed explanation of “computational reading” that was built upon existing digital approaches from a variety of disciplines. Chapter II models big data extracted from the main source to further our understanding of the social geography of the Islamic world and its major social transformations, simultaneously providing an important background for the next chapter. Chapter III applies the devised method to the study of Islamic preaching from chronological, geographical and social perspectives that have been overlooked in the academic treatment of this subject. Largely an exploratory overview, it traces long-term changes in preaching practices as well as statuses of preachers within the Islamic elites. This chapter demonstrates how exactly computational reading can contribute to the studies of specific phenomena and practices. The final section overviews broad prospects of the further application of “computational reading” to a variety of genres of pre-modern Arabic literature. The dissertation heavily relies on the visual display of information in the form of graphs, charts, maps, and tables that are used in the main body and supplied in Appendices.



    By: Maxim Romanov, Matthew Thomas Miller,
    Sarah Bowen Savant, and Benjamin Kiessling



    The OpenITI team, building on the foundational open-source OCR work of Leipzig University’s (LU) Alexander von Humboldt Chair for Digital Humanities, has achieved Optical Character Recognition (OCR) accuracy rates for classical Arabic-script texts in the high nineties. These numbers are based on our tests of seven different Arabic-script texts of varying quality and typefaces, totaling over 7,000 lines (~400 pages, 87,000 words). These accuracy rates not only represent a distinct improvement over the actual accuracy rates of the various proprietary OCR options for classical Arabic-script texts, but, equally important, they are produced using open-source OCR software called Kraken (developed by Benjamin Kiessling, LU), thus enabling us to make this Arabic-script OCR technology freely available to the broader Islamic, Persian, and Arabic Studies communities in the near future. Unlike more traditional OCR approaches, Kraken relies on a neural network, which mimics the way we learn, to recognize letters in the images of entire lines of text without trying first to segment lines into words and then words into letters. This segmentation step, a mainstream OCR approach that persistently fails on connected scripts, is thus completely removed from the process, making Kraken uniquely powerful for dealing with a diverse variety of ligatures in connected Arabic script. In the process we also generated over 7,000 lines of “gold standard” (double-checked) data that can be used by others for Arabic-script OCR training and testing purposes.
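
    The post does not spell out how accuracy was calculated. One common definition, assumed here purely for illustration, is character-level accuracy: one minus the edit distance between the OCR output and the gold-standard lines, divided by the number of gold-standard characters. The function names below are hypothetical and are not part of Kraken.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def character_accuracy(ocr_lines, gold_lines):
    """Character accuracy of OCR output against gold-standard lines, paired by position."""
    total = sum(len(gold) for gold in gold_lines)
    if total == 0:
        return 0.0
    errors = sum(edit_distance(ocr, gold) for ocr, gold in zip(ocr_lines, gold_lines))
    return 1 - errors / total

# Toy example: one wrong character in eight.
print(character_accuracy(["abcd", "efgh"], ["abcd", "efgx"]))
```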

    Our working paper can be found on Academia.edu.



    Kraken ibn Ocropus. Based on a depiction of an octopus from a manuscript of Kitāb al-ḥašāʾiš fī hāyūlā al-ʿilāj al-ṭibbī (Leiden, UB : Or. 289); special thanks to Emily Selove for help with finding an octopus in the depths of the Islamic MS tradition.