[P4945] A New Corpus for the Islamicate World and Methods for Its Exploration

Created by Sarah Bowen Savant
Monday, 11/20/17 3:30pm

SUMMARY:

The written heritage of the "Islamicate" cultures that stretch from modern Bengal to Spain is as vast as it is understudied and underrepresented in the Digital Humanities. The sheer volume and diversity of the surviving works produced in Persian and Arabic by denizens of these lands in the premodern period make this body of texts ideal for computational forms of analysis. Efforts to apply these new digital forms of macro-textual analysis and digital scholarship, however, have been stymied by the lack of a reliable corpus. To address this desideratum, an international group of scholars has created the first version of a machine-actionable scholarly corpus of premodern Islamicate texts. This corpus currently includes 740 million words across 4,300 unique texts in Arabic and 9.3 million words of Persian, and it is already openly available. The panel will present the corpus to the field of Arabic and Persian studies, explaining how it can be used for various scholarly purposes and sharing the team's long-term vision of how to build the digital infrastructure for the computational study of Islamicate textual traditions. The participants will also present their own case studies of texts from the corpus, showcasing a series of digital methods of algorithmic text analysis, including text-reuse detection, stylometry, and topic modeling.

SPONSOR:

Middle East Medievalists (MEM)

DISCIPLINES:

Hist

ABSTRACTS:

MEMBERS:

Paul M. Cobb

(University of Pennsylvania)
Panel Participating Role(s): Discussant;

Sarah Bowen Savant

(Aga Khan University)
Panel Participating Role(s): Organizer; Presenter;

Maxim Romanov

(Leipzig University)
Maxim Romanov is a Universitätsassistent für Digital Humanities at the Institute for History, University of Vienna. His dissertation (Near Eastern Studies, U of Michigan, 2013) explored how modern computational techniques of text analysis can be applied...
Panel Participating Role(s): Presenter;

Nancy Khalek

(Brown University)
Nancy Khalek is Associate Professor of Religious Studies and specializes in Late Antiquity and medieval Islam.
Panel Participating Role(s): Chair;

Matthew Thomas Miller

(Roshan Institute for Persian Studies, U of Maryland, College Park)
Panel Participating Role(s): Organizer; Presenter;

Elijah Cooke

(University of Maryland)
Panel Participating Role(s): Presenter;