Biomedical knowledge graphs (KGs) are vital for drug discovery and clinical decision support but remain incomplete. Large language models (LLMs) excel at extracting biomedical relations, yet their outputs are not standardized or aligned with ontologies, which limits their integration into KGs. We introduce RELATE, a three-stage pipeline that maps LLM-extracted relations to standardized ontology predicates using ChemProt and the Biolink Model. The pipeline comprises (1) ontology preprocessing with predicate embeddings, (2) similarity-based retrieval enhanced with SapBERT, and (3) LLM-based reranking with explicit negation handling. This approach turns the output of relation extraction from free text into structured, ontology-constrained representations. On the ChemProt benchmark, RELATE achieves 52% exact match and 94% accuracy@10; on 2,400 HEAL Project abstracts, it effectively rejects irrelevant associations (0.4%) and identifies negated assertions. RELATE captures nuanced biomedical relationships while maintaining the quality required for KG augmentation. By combining vector search with contextual LLM reasoning, it provides a scalable, semantically accurate framework for converting unstructured biomedical literature into standardized KGs.
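The three stages described above can be illustrated with a minimal sketch. It is not the RELATE implementation: a bag-of-words vector stands in for the SapBERT embeddings, a simple rule stands in for the LLM reranker, and the predicate descriptions, the negation cue list, the `min_score` threshold, and all function names are assumptions made up for illustration. Only the overall shape (embed predicates, retrieve top-k by similarity, then rerank with rejection and negation flags) follows the abstract.

```python
import math
import re
from collections import Counter

# Toy predicate catalogue. The names follow the Biolink style; the
# descriptions are invented for this sketch, not taken from the ontology.
PREDICATES = {
    "biolink:treats": "drug treats or ameliorates a disease or condition",
    "biolink:causes": "entity causes or induces an effect or disease",
    "biolink:interacts_with": "entity physically interacts with or binds another entity",
    "biolink:affects": "entity affects, inhibits or activates another entity",
}

# Hypothetical cue list; the paper's negation handling is done by an LLM.
NEGATION_CUES = {"not", "no", "never", "fails", "failed", "without"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    """Stand-in for a SapBERT embedding: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Stage 1: ontology preprocessing, embedding every predicate once.
PREDICATE_VECTORS = {name: embed(desc) for name, desc in PREDICATES.items()}


def retrieve(relation, k=3):
    """Stage 2: similarity-based retrieval of the top-k candidate predicates."""
    query = embed(relation)
    scored = [(cosine(query, vec), name) for name, vec in PREDICATE_VECTORS.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


def rerank(relation, candidates, min_score=0.05):
    """Stage 3 stand-in: pick the best candidate, reject weak matches,
    and flag negated assertions. A real system would prompt an LLM here."""
    negated = any(tok in NEGATION_CUES for tok in tokenize(relation))
    kept = [(name, score) for name, score in candidates if score >= min_score]
    if not kept:
        # No candidate is similar enough: treat the relation as irrelevant.
        return {"predicate": None, "negated": negated}
    best_name, _ = max(kept, key=lambda c: c[1])
    return {"predicate": best_name, "negated": negated}


def map_relation(relation, k=3):
    return rerank(relation, retrieve(relation, k))
```

Calling `map_relation` on a free-text relation returns a dictionary with the chosen predicate (or `None` when every candidate falls below the threshold) and a flag marking whether the assertion was negated. Keeping negation as a separate flag rather than a separate predicate mirrors the abstract's description of explicit negation handling, though the actual output format of RELATE is not specified there.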