Applications of phylogenetic methods to textual traditions have typically treated all changes as equivalent, even though it is widely recognized that certain types of variants were more likely to be introduced than others. Although a maximum parsimony criterion can assign weights to particular changes, it is difficult to state a priori what these weights should be. Probabilistic methods, such as Bayesian phylogenetics, allow users to define categories of changes and to estimate the transition rate for each category as part of the analysis. Classifying the types of changes between readings also makes it possible to inspect the probability of each category along each branch of the resulting trees. However, classifying readings is time-consuming, because each reading must be categorized against every other reading at each variation unit, which presents a significant barrier to entry for this kind of analysis. This paper presents Rdgai, a software package that automates this classification task using multilingual large language models (LLMs). The tool lets users manually classify some changes between readings and then uses these annotations in the prompt for an LLM, which automatically classifies the remaining reading transitions. The classifications are stored in TEI XML, ready for downstream phylogenetic analysis. The paper demonstrates the application with data from an Arabic translation of the Gospels.