Looking for the author behind the words: Stylometric Analysis of al-Dhahabi’s (d. 1347) Writings

By Maxim Romanov
Submitted to Session P4945 (A New Corpus for the Islamicate World and Methods for Its Exploration), 2017 Annual Meeting
All Middle East;
7th-13th Centuries;
With about 50 titles, al-Dhahabi (d. 1347) is one of the most prolific Muslim authors. His books are also among the longest in the Arabic written tradition, particularly his 50-volume “History of Islam” (Ta'rikh al-islam). This monster of a book is understood to be a compilation of earlier sources, and our computational analysis of text reuse (the identification of passages shared among texts) shows that between 20% and 40% of the book consists of quotations. The text reuse detection method, however, can identify a quotation only through direct comparison with the actual source of that quotation. A stylometric approach offers a perspective that helps us overcome this limitation. Closely associated with authorship attribution, stylometric analysis, and rolling stylometry in particular, allows one to identify text reuse through shifts in writing style, “the authorial fingerprint”, within the same book. The application of this method does show that “al-Dhahabi’s” style in the early volumes is *completely* different from the style in the latest ones. Additionally, with our corpus of 4,300 Arabic texts, one can design a large-scale experiment to identify all possible sources of al-Dhahabi’s book through similarities in writing style rather than through direct quotations. The presentation will begin with a brief explanation of the stylometric approach and will then offer two experiments. The first experiment will focus on finding al-Dhahabi in al-Dhahabi’s writings through a multifaceted comparison of all his available writings with each other. The second will focus on possible sources of his “History of Islam” identified through the large-scale comparison, and on how the results of stylometric analysis compare with the results of the text reuse detection method.
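To make the idea of rolling stylometry concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions and not the pipeline used in this study. The text is assumed to be already tokenized, the feature list stands in for a set of frequent function words, and cosine distance stands in for whatever stylistic distance measure an actual experiment would use. All function names and parameters (`profile`, `cosine_distance`, `rolling_distances`, `window`, `step`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def profile(tokens, features):
    """Relative frequency of each feature word in a list of tokens."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[f] / total for f in features]


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def rolling_distances(tokens, reference, features, window=1000, step=500):
    """Slide a window over `tokens` and measure how far each window's
    feature profile is from the profile of the `reference` tokens.

    Returns a list of (window_start, distance) pairs. A sustained jump
    in distance suggests a change of style at that point in the text.
    """
    ref = profile(reference, features)
    last_start = max(len(tokens) - window, 0)
    results = []
    for start in range(0, last_start + 1, step):
        chunk = tokens[start:start + window]
        results.append((start, cosine_distance(profile(chunk, features), ref)))
    return results
```

In use, `reference` would be a sample assumed to represent one author's own voice, and `tokens` the book under examination. Windows whose distance stays low resemble the reference style, while stretches of high distance are candidates for material taken over from other sources. Window and step sizes are tuning choices, and the values above are placeholders.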