Retrieval-Augmented Generation-based Relation Extraction

Information Extraction (IE) converts unstructured text into a structured format by means of entity extraction and relation extraction (RE). Within this framework, identifying the relation that holds between a pair of entities plays a crucial role. Although various relation extraction techniques exist, their effectiveness depends heavily on access to labeled data and substantial computational resources. Large Language Models (LLMs) are a promising way to address these challenges; however, they may return hallucinated responses because their answers are shaped by their own training data. To overcome these limitations, this work proposes Retrieval-Augmented Generation-based Relation Extraction (RAG4RE), an approach for improving the performance of relation extraction. We evaluate the effectiveness of RAG4RE with different LLMs on established benchmarks, namely the TACRED, TACREV, Re-TACRED, and SemEval RE datasets. In particular, we use prominent LLMs including Flan T5, Llama2, and Mistral. The results show that RAG4RE outperforms traditional RE approaches based solely on LLMs, most clearly on the TACRED dataset and its variants. Furthermore, our approach performs strongly compared with previous RE methods on both the TACRED and TACREV datasets, underscoring its potential for advancing RE tasks in natural language processing.
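
The abstract does not spell out how the retrieval step is carried out, so the following is only a minimal sketch of the general idea of retrieval-augmented prompting for RE, not the RAG4RE implementation. Several parts are illustrative assumptions: the hypothetical `Example` record, the bag-of-words cosine similarity (a simple stand-in for the embedding-based similarity a real system would more likely use), the prompt template, and the toy training sentences. The label names follow TACRED's `org:founded_by` / `per:city_of_birth` / `no_relation` style. The sketch retrieves the most similar labeled sentence and prepends it to the query. The LLM call itself (e.g. to Flan T5, Llama2, or Mistral) is deliberately omitted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical labeled training instance: sentence, entity pair, gold relation."""
    sentence: str
    head: str
    tail: str
    relation: str


def bag_of_words(text: str) -> Counter:
    """Lower-cased token counts; an assumed stand-in for a sentence embedding."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, train: list, k: int = 1) -> list:
    """Return the k training examples whose sentences are most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(train, key=lambda ex: cosine(q, bag_of_words(ex.sentence)), reverse=True)
    return ranked[:k]


def build_prompt(sentence: str, head: str, tail: str, train: list, labels: list, k: int = 1) -> str:
    """Augment the RE query with retrieved labeled examples (hypothetical template)."""
    lines = []
    for ex in retrieve(sentence, train, k):
        lines.append(f"Example: {ex.sentence}")
        lines.append(f"Relation between '{ex.head}' and '{ex.tail}': {ex.relation}")
    lines.append(f"Sentence: {sentence}")
    lines.append(f"Choose one relation from: {', '.join(labels)}")
    lines.append(f"Relation between '{head}' and '{tail}':")
    return "\n".join(lines)


# Toy usage with made-up training data.
train = [
    Example("Steve Jobs founded Apple in 1976.", "Apple", "Steve Jobs", "org:founded_by"),
    Example("Marie Curie was born in Warsaw.", "Marie Curie", "Warsaw", "per:city_of_birth"),
]
labels = ["org:founded_by", "per:city_of_birth", "no_relation"]
prompt = build_prompt("Bill Gates founded Microsoft in 1975.", "Microsoft", "Bill Gates", train, labels)
print(prompt)  # this string would then be sent to an LLM
```

Because the query shares "founded" and "in" with the first training sentence but only "in" with the second, the founding example is the one retrieved. The design intent is that grounding the prompt in a similar labeled instance gives the LLM evidence from the dataset rather than leaving it to rely only on what it memorized during pre-training.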
