In large-scale industrial recommendation systems, retrieval must produce high-quality candidates from massive corpora under strict latency constraints. Recently, Generative Retrieval (GR) has emerged as a viable alternative to Embedding-Based Retrieval (EBR): GR quantizes items into a finite token space and decodes candidates autoregressively, providing a scalable path that explicitly models target-history interactions via cross-attention. However, three challenges persist: 1) how to balance users' long-term and short-term interests, 2) noise interference when generating hierarchical semantic IDs (SIDs), and 3) the absence of explicit modeling for negative feedback such as exposed items without clicks. To address these challenges, we propose DualGR, a generative retrieval framework that explicitly models dual horizons of user interests with selective activation. Specifically, DualGR uses a Dual-Branch Long/Short-Term Router (DBR) to cover both stable preferences and transient intents by explicitly modeling users' long- and short-term behaviors. Meanwhile, Search-based SID Decoding (S2D) controls context-induced noise and improves computational efficiency by constraining candidate interactions to the current coarse (level-1) bucket during fine-grained (level-2/3) SID prediction. Finally, we propose an Exposure-aware Next-Token Prediction Loss (ENTP-Loss) that treats "exposed-but-unclicked" items as hard negatives at level-1, enabling timely interest fade-out. On the large-scale Kuaishou short-video recommendation system, online A/B testing shows lifts of +0.527% in video views and +0.432% in watch time, validating DualGR as a practical and effective paradigm for industrial generative retrieval.
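The abstract does not give the form of ENTP-Loss, so the following is only a minimal sketch of one plausible reading: the usual next-token cross-entropy on the clicked item's level-1 token, plus an unlikelihood-style penalty on the level-1 tokens of exposed-but-unclicked items. The function name `exposure_aware_level1_loss`, the `neg_weight` parameter, and the penalty form are all assumptions made for illustration, not the authors' formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of level-1 token logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exposure_aware_level1_loss(logits, clicked_token, exposed_unclicked_tokens,
                               neg_weight=0.5):
    """Hypothetical exposure-aware loss at SID level-1 (an assumption, not the paper's formula).

    logits: scores over the level-1 codebook for one decoding step.
    clicked_token: level-1 token of the item the user actually clicked.
    exposed_unclicked_tokens: level-1 tokens of items shown but not clicked.
    neg_weight: assumed weight of the negative-feedback term.
    """
    probs = softmax(logits)

    # Standard next-token prediction term on the positive (clicked) token.
    positive_term = -math.log(probs[clicked_token])

    # Treat exposed-but-unclicked tokens as hard negatives; skip any token that
    # coincides with the clicked one so the two terms do not conflict.
    negatives = set(exposed_unclicked_tokens) - {clicked_token}
    negative_term = sum(
        -math.log(max(1.0 - probs[t], 1e-12))  # clamp to avoid log(0)
        for t in negatives
    )

    return positive_term + neg_weight * negative_term
```

Under this sketch, an empty negative set reduces the loss to plain cross-entropy, and each exposed-but-unclicked level-1 token adds a positive penalty that grows as the model assigns it more probability. This is one way the "interest fade-out" described above could be encouraged.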