Generative Retrieval (GR) differs from the traditional index-then-retrieve pipeline by storing relevance information in model parameters and directly generating document identifiers. However, GR often struggles to generalize and is costly to scale. We introduce QUESTER (QUEry SpecificaTion gEnerative Retrieval), which reframes GR as query specification generation: in this work, a (small) LLM generates a simple keyword query that is then handled by BM25. The policy is trained with reinforcement learning, specifically Group Relative Policy Optimization (GRPO). Across in-domain and out-of-domain evaluations, we show that our model is more effective than BM25 and competitive with neural IR models, while maintaining good efficiency.
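The pipeline described above can be sketched in miniature: a keyword query (standing in for the output of the LLM policy) is scored against a toy corpus with a standard BM25 (Okapi) formula, and a retrieval-quality signal such as the reciprocal rank of the relevant document serves as the scalar reward that GRPO-style training would optimize. The corpus, query, and reward choice here are illustrative assumptions, not the paper's actual data or reward design.

```python
import math
from collections import Counter

# Toy corpus standing in for a document collection (hypothetical example).
corpus = {
    "d1": "reinforcement learning for query rewriting in retrieval",
    "d2": "classical bm25 ranking with inverted index",
    "d3": "cooking pasta with tomato sauce",
}

def tokenize(text):
    return text.lower().split()

docs = {did: tokenize(t) for did, t in corpus.items()}
N = len(docs)
avgdl = sum(len(toks) for toks in docs.values()) / N
# Document frequency of each term (count each term once per document).
df = Counter(tok for toks in docs.values() for tok in set(toks))

def bm25_score(query_tokens, doc_tokens, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a bag-of-words query."""
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_tokens:
        if term not in tf:
            continue
        idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
        num = tf[term] * (k1 + 1)
        den = tf[term] + k1 * (1 - b + b * len(doc_tokens) / avgdl)
        score += idf * num / den
    return score

# A keyword query a small LLM policy might emit for the user question
# "how do I train a model to rewrite search queries?" (illustrative, not model output).
keyword_query = tokenize("reinforcement learning query rewriting")

ranking = sorted(docs, key=lambda d: bm25_score(keyword_query, docs[d]), reverse=True)

# GRPO-style scalar reward: reciprocal rank of the known relevant document.
relevant = "d1"
reward = 1.0 / (ranking.index(relevant) + 1)
print(ranking[0], reward)  # → d1 1.0
```

Because BM25 ranking is non-differentiable, the reward reaches the policy only through the sampled query text, which is what motivates a policy-gradient method like GRPO rather than end-to-end backpropagation.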