In practice, questions are often complex and knowledge-intensive, requiring Large Language Models (LLMs) to recognize the multifaceted nature of a question and to reason across multiple information sources. Iterative and adaptive retrieval, in which LLMs decide when and what to retrieve based on their own reasoning, has been shown to be a promising approach for answering such questions. However, the performance of these retrieval frameworks is limited by the accumulation of reasoning errors and by misaligned retrieval results. To overcome these limitations, we propose TreeRare (Syntax Tree-Guided Retrieval and Reasoning), a framework that uses syntax trees to guide information retrieval and reasoning for question answering. Following the principle of compositionality, TreeRare traverses the syntax tree of a question bottom-up; at each node, it generates subcomponent-based queries and retrieves relevant passages to resolve localized uncertainty. A subcomponent question answering module then synthesizes these passages into concise, context-aware evidence. Finally, TreeRare aggregates the evidence across the tree to form a final answer. Experiments on five question answering datasets involving ambiguous or multi-hop reasoning demonstrate that TreeRare achieves substantial improvements over existing state-of-the-art methods.
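The bottom-up traversal described in the abstract can be illustrated with a minimal sketch. This is not the TreeRare implementation: the `Node` structure, the word-overlap `retrieve` function, the hand-built tree, and the three-passage corpus are all hypothetical stand-ins, and the LLM-based query generation, subcomponent question answering, and final answer aggregation are reduced to simple placeholders. The sketch only shows the control flow, in which each subtree is resolved before its parent and evidence is accumulated across the tree.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One constituent of a question's syntax tree (hypothetical structure)."""
    text: str
    children: list["Node"] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Toy retriever (assumption): rank passages by word overlap with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(p)), p) for p in corpus]
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    return [p for score, p in ranked[:k] if score > 0]


def collect_evidence(node: Node, corpus: list[str], evidence: list[str]) -> list[str]:
    """Post-order traversal: children are resolved before their parent."""
    for child in node.children:
        collect_evidence(child, corpus, evidence)
    # Placeholder for query generation: the node's own text is the query.
    for passage in retrieve(node.text, corpus):
        # Placeholder for subcomponent QA: keep each new passage as evidence.
        if passage not in evidence:
            evidence.append(passage)
    return evidence


corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Berlin is the capital of Germany.",
]

# Hand-built tree for illustration; a real system would use a syntactic parser.
tree = Node(
    "When was the tower in the capital of France completed?",
    [Node("the tower", [Node("the capital of France")])],
)

evidence = collect_evidence(tree, corpus, [])
for passage in evidence:
    print(passage)
```

In this toy run the innermost phrase is resolved first, so evidence about the capital is gathered before evidence about the tower. In a full system, a language model would read the accumulated evidence to produce the final answer; that step is omitted here.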