Unicode and Arabic Script: Historical Legacies and Future Challenges

By J.R. Osborn
Submitted to Session P4952 (Arabic-script typography: history, technology, & aesthetics), 2017 Annual Meeting
All Middle East; Information Technology/Computing
This paper analyzes the handling of Arabic script within the Unicode Standard. First published in 1991, Unicode is a foundational standard of the global Internet. It assigns a unique code point to every letter, character, or symbol of every commonly shared writing or notational system. Before Unicode spread, a diversity of competing encoding schemes proliferated. The result was typographic and textual confusion: an email written in Arabic script, or in another non-Latin script, could appear as unreadable gibberish when opened on a computer that used a different encoding scheme. By standardizing all the world’s scripts within a single, consistent encoding scheme, Unicode greatly facilitated the exchange of multilingual texts and multi-script typography.
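The encoding mismatch described above can be reproduced in a few lines of Python. The sketch below is illustrative only. It assumes a sender using Windows-1256 (`cp1256`), a common legacy Arabic code page, and a receiver that reads the same bytes as Latin-1. This pairing is a hypothetical example; any two mismatched single-byte code pages behave similarly. The same word serialized as UTF-8 survives the round trip intact.

```python
# Illustrative sketch (assumed scenario): a sender encodes with the legacy
# Windows-1256 Arabic code page, a receiver decodes as Latin-1 (ISO-8859-1).
word = "سلام"  # "salām", peace

legacy_bytes = word.encode("cp1256")      # one byte per letter in the Arabic code page
misread = legacy_bytes.decode("latin-1")  # receiver wrongly assumes a Western code page

utf8_bytes = word.encode("utf-8")         # the same text serialized as UTF-8
roundtrip = utf8_bytes.decode("utf-8")    # any UTF-8-aware receiver recovers it

print(len(legacy_bytes), misread)   # four bytes, rendered as four unrelated Latin characters
print(len(utf8_bytes), roundtrip)   # eight bytes, rendered as the original Arabic word
```

Each letter of the basic Arabic block takes two bytes in UTF-8 rather than one in the legacy code page. The extra storage buys a single scheme that every Unicode-aware system interprets the same way.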

Nevertheless, Unicode remediates the history of typography. The legacies of moveable metal type, digitized Latin script, and the American Standard Code for Information Interchange (ASCII) continue to shape international computing. The Unicode Standard assumes that linguistic and written characters possess certain properties, many of which are challenged by the structure of Arabic script. Unlike Latin type, Arabic script has a large number of contextual character shapes; it is necessarily cursive; it layers a wide range of optional diacritics above and below the line of primary text; and it flows from right to left. Incorporating these features into the Unicode Standard challenged the digital dominance of Latinate typography.
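Several of the properties listed above are recorded in the Unicode Character Database and can be inspected with Python's standard `unicodedata` module. The sketch below uses an illustrative sample (the letter bāʾ, the fatḥa diacritic, and Latin "b"; the helper `describe` is hypothetical). It shows that Arabic letters carry the right-to-left bidirectional class `AL`, while diacritics are encoded as separate nonspacing combining marks. Cursive joining behavior is not exposed by `unicodedata`; it is specified separately in the Unicode Character Database file `ArabicShaping.txt`.

```python
import unicodedata

def describe(ch: str) -> tuple:
    """Hypothetical helper: return a few Unicode Character Database properties for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),  # 'AL' = Arabic letter (RTL), 'NSM' = nonspacing mark, 'L' = LTR
        unicodedata.category(ch),       # 'Lo' = other letter, 'Mn' = nonspacing mark, 'Ll' = lowercase letter
        unicodedata.combining(ch),      # canonical combining class: 0 for base characters
    )

# Illustrative sample: Arabic letter beh, the fatha diacritic, and Latin "b".
for ch in ["\u0628", "\u064E", "b"]:
    print(*describe(ch))

# In stored text, a vocalized letter is a base code point followed by its mark.
vocalized = "\u0628\u064E"
print(len(vocalized), [f"U+{ord(c):04X}" for c in vocalized])
```

Encoding the fatḥa as its own code point is what allows diacritics to remain optional: the same base letter can be stored with or without vocalization.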

Drawing upon Arabic script examples, this paper discusses how formal typographic structures are encoded digitally in Unicode. It analyzes the organization of the basic Arabic Unicode block (defined as U+0600 through U+06FF), the Arabic supplemental and extended blocks, and the blocks of Arabic presentation forms. It also highlights Arabic structures and historical usages that are excluded from, or difficult to represent in, Unicode. Analyzing Unicode from the perspective of non-Latin writing systems raises pertinent questions about cultural participation and technical representation in a globalized world. Digital solutions to the “challenges” of Arabic script are increasingly applied beyond the script itself; they expand the typographic possibilities of other scripts, including Latin. Unicode’s handling of Arabic script asks us to reassess historical legacies while suggesting new modes of digital typographic practice.
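The relationship between the basic Arabic block and the presentation-form blocks can be illustrated with compatibility normalization. In the sketch below, which uses Python's standard `unicodedata` module and a hypothetical helper `in_basic_arabic_block`, the four contextual shapes of bāʾ from the Arabic Presentation Forms-B block, along with the lam-alef ligature, are folded by NFKC back to letters in the U+0600 through U+06FF range. This reflects the Standard's general model, in which stored text holds abstract letters and the rendering system selects contextual shapes; the presentation forms are retained largely for compatibility with older encodings.

```python
import unicodedata

def in_basic_arabic_block(ch: str) -> bool:
    """Hypothetical helper: True if ch lies in the basic Arabic block U+0600..U+06FF."""
    return 0x0600 <= ord(ch) <= 0x06FF

# Arabic Presentation Forms-B: beh isolated, final, initial, and medial forms,
# followed by the isolated lam-alef ligature.
presentation_forms = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92", "\uFEFB"]

for form in presentation_forms:
    folded = unicodedata.normalize("NFKC", form)  # compatibility decomposition, then canonical composition
    codes = " ".join(f"U+{ord(c):04X}" for c in folded)
    print(
        f"U+{ord(form):04X} {unicodedata.name(form)} -> {codes}",
        in_basic_arabic_block(form),                  # the presentation form itself lies outside the block
        all(in_basic_arabic_block(c) for c in folded),  # its NFKC result lies inside it
    )
```

The folding is deliberately lossy: which contextual shape was stored is discarded, since a shaping engine is expected to recompute it from the surrounding letters. The ligature, by contrast, expands into two separate letters.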