In large-scale advertising recommendation systems, retrieval is a critical component: it must efficiently select, from a massive ad inventory, a subset of candidate ads relevant to user behavior for subsequent ranking and recommendation. Embedding-Based Retrieval (EBR) methods built on dual-tower networks are widely used in industry because they balance retrieval efficiency and accuracy. However, the dual-tower model has a significant limitation: user and ad embeddings interact only at the final inner product computation, so its capacity for feature interaction is insufficient. The ranking stage mitigates this issue with DNN-based models that take both user and ad features as input and thus allow early-stage interaction between them, but such models are computationally infeasible for the retrieval stage. To bridge this gap, we propose an efficient GPU-based feature interaction mechanism for the dual-tower network that significantly improves retrieval accuracy while substantially reducing computational cost. Specifically, we introduce a novel compressed inverted list designed for GPU acceleration, enabling efficient feature interaction computation at scale. To the best of our knowledge, this is the first industrial framework to successfully implement a Wide and Deep model in a retrieval system. We apply the model to real-world business scenarios at Tencent Advertising. Experimental results show that our method outperforms existing approaches in offline evaluation, and it has been deployed in Tencent's advertising recommendation system, where it delivers significant online performance gains. These results not only validate the effectiveness of the proposed method but also provide practical guidance for optimizing large-scale ad retrieval systems.
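To make the idea of adding wide feature interaction to dual-tower retrieval concrete, the following is a minimal CPU-only sketch and not the implementation described in this paper. It assumes that the deep score is the usual inner product of user and ad embeddings, that each wide cross feature is stored as a posting list mapping a user feature to (ad id, weight) pairs, and that ad ids are compressed with plain delta encoding. The names `AD_EMBEDDINGS`, `WIDE_INDEX`, and `retrieve`, along with all feature strings and weights, are hypothetical toy values chosen for illustration; the GPU kernels and the actual compression scheme are not reproduced here.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product: the only user-ad interaction in a plain dual-tower model."""
    return sum(a * b for a, b in zip(u, v))


def delta_encode(sorted_ids: List[int]) -> List[int]:
    """Store gaps between consecutive sorted ids (a generic compression, assumed here)."""
    gaps, prev = [], 0
    for i in sorted_ids:
        gaps.append(i - prev)
        prev = i
    return gaps


def delta_decode(gaps: List[int]) -> List[int]:
    """Recover the original sorted ids by prefix-summing the gaps."""
    ids, cur = [], 0
    for g in gaps:
        cur += g
        ids.append(cur)
    return ids


# Hypothetical toy ad embeddings (output of the ad tower).
AD_EMBEDDINGS: Dict[int, List[float]] = {
    0: [0.5, 0.5],
    1: [1.0, 0.0],
    2: [0.0, 1.0],
}

# Hypothetical wide index: user feature -> (delta-encoded ad ids, cross weights).
WIDE_INDEX: Dict[str, Tuple[List[int], List[float]]] = {
    "city=SZ": (delta_encode([0, 2]), [0.25, 1.0]),
    "age=25-34": (delta_encode([1]), [0.5]),
}


def retrieve(user_emb: List[float], user_features: List[str], k: int) -> List[Tuple[int, float]]:
    """Score = deep inner product + sum of wide cross-feature weights; return top-k."""
    scores = {ad: dot(user_emb, emb) for ad, emb in AD_EMBEDDINGS.items()}
    for feat in user_features:
        if feat not in WIDE_INDEX:
            continue  # no cross feature learned for this user feature
        gaps, weights = WIDE_INDEX[feat]
        for ad, w in zip(delta_decode(gaps), weights):
            scores[ad] += w
    # Sort by descending score, breaking ties by ad id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    print(retrieve([1.0, 0.0], [], 2))           # deep score only
    print(retrieve([1.0, 0.0], ["city=SZ"], 2))  # deep + wide
```

In this toy setup, ad 2 has a zero inner product with the user embedding and is not retrieved by the deep score alone, but the wide term for `city=SZ` lifts it into the top two. This is the kind of early user-ad interaction that the inner product by itself cannot express.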