The description is submitted to:


Session R4830 ("Can You Read This One?" New Advances and Issues in Digital Recognition of Printed/Handwritten Arabic Characters), 2017
In contributing to the round-table, we will share our experience in developing a search system based on optical character recognition (OCR) for printed text in Ottoman script, and discuss what lies ahead in the development of an intelligent character recognition (ICR) system, a transcription engine, and other natural language processing tools for Ottoman script.

We will focus on two fundamental challenges: (1) The Ottoman script is no longer in active use, because the Republic of Turkey adopted the Latin alphabet in 1928. (2) There is a significant lack of resources for technology development. Together, these challenges mean that there are no suitable corpora for Ottoman on which natural language processing tools could be built. Anyone who wants to build such tools therefore has to build the resources as well.

We will also discuss how other historical factors, such as the use of the printing press after 1729, help us with our development efforts. Although the printing press came into more effective use only almost a century later, in the early 19th century, printing in the Ottoman script ended after another century. This brief period of printing limited both the number of printing presses and the common set of typefaces they used (mostly Naskh).

Our contribution will focus on the technological and historical issues relevant to the development of technologies for the Ottoman script.