Claim formulation lies at the core of argument mining. Distinguishing a claim from a non-claim is hard for both humans and machines, owing to the latent linguistic variance between the two and the inadequacy of extensive definition-based formalization. Furthermore, the growing use of online social media has led to an explosion of unsolicited information on the web, presented as informal text. To address these issues, in this paper we propose DESYR, a framework for informal web-based text that leverages a combination of hierarchical representation learning (dependency-inspired Poincaré embedding), definition-based alignment, and feature projection. We forgo fine-tuning compute-heavy language models in favor of a more domain-centric but lighter approach. Experimental results indicate that DESYR improves upon the state-of-the-art system across four benchmark claim datasets, most of which were constructed from informal texts. We see an increase of 3 claim-F1 points on the LESA-Twitter dataset, an increase of 1 claim-F1 point and 9 macro-F1 points on the Online Comments (OC) dataset, an increase of 24 claim-F1 points and 17 macro-F1 points on the Web Discourse (WD) dataset, and an increase of 8 claim-F1 points and 5 macro-F1 points on the Micro Texts (MT) dataset. We also perform an extensive analysis of the results. We release a 100-D pre-trained version of our Poincaré variant along with the source code.