Topological Data Analysis (TDA) is a new and fast-growing field of data science that provides a set of topological and geometric tools for deriving relevant features from complex high-dimensional data. In this paper we apply two of the best-known methods in topological data analysis, "Persistent Homology" and "Mapper", to classify Persian poems composed by two of the greatest Iranian poets, Ferdowsi and Hafez. The article has two main parts. In the first part we explain the mathematics behind these two methods in a way that is accessible to a general audience; in the second part we describe our models and the results of applying TDA tools to NLP.