Reinforcement learning has recently shown promise in improving retrieval-augmented generation (RAG). Despite these advances, its effectiveness in multi-hop question answering (QA) remains limited by two fundamental issues: (i) the absence of global planning to structure multi-step reasoning, and (ii) unfaithful execution, which hinders effective query formulation and consistent use of retrieved evidence. We propose GlobalRAG, a reinforcement learning framework designed to enhance global reasoning in multi-hop QA. GlobalRAG decomposes questions into subgoals, coordinates retrieval with reasoning, and refines evidence iteratively. To guide this process, we introduce a Planning Quality Reward and a SubGoal Completion Reward, which encourage coherent planning and reliable subgoal execution. In addition, a progressive weight annealing strategy balances process-oriented and outcome-based objectives. Extensive experiments on both in-domain and out-of-domain benchmarks demonstrate that GlobalRAG significantly outperforms strong baselines while using only 8k training examples (42% of the training data used by strong baselines), achieving average improvements of 14.2% in both EM and F1.
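To make the reward design concrete, the sketch below illustrates one plausible way to combine the process-oriented rewards (planning quality, subgoal completion) with the outcome-based reward under progressive weight annealing. The linear schedule, the equal weighting of the two process rewards, and all function names are assumptions for illustration, not the paper's exact formulation.

```python
# Hypothetical sketch of progressive weight annealing between process-oriented
# and outcome-based rewards. Schedule and weighting are assumed, not from the paper.

def annealed_weight(step: int, total_steps: int,
                    w_start: float = 1.0, w_end: float = 0.0) -> float:
    """Linearly anneal the process-reward weight from w_start to w_end."""
    frac = min(step / max(total_steps, 1), 1.0)
    return w_start + frac * (w_end - w_start)


def total_reward(planning_reward: float, subgoal_reward: float,
                 outcome_reward: float, step: int, total_steps: int) -> float:
    """Combine planning-quality and subgoal-completion rewards (process-oriented)
    with the outcome-based reward under an annealed mixing weight."""
    w = annealed_weight(step, total_steps)
    process_reward = 0.5 * (planning_reward + subgoal_reward)  # assumed equal weighting
    return w * process_reward + (1.0 - w) * outcome_reward


if __name__ == "__main__":
    # Early in training the process rewards dominate; later the outcome reward does.
    for step in (0, 500, 1000):
        print(step, total_reward(0.8, 0.6, 1.0, step, total_steps=1000))
```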