Ranking models have become an important component of modern personalized recommendation systems. However, significant challenges persist in handling high-cardinality, heterogeneous, and sparse feature spaces, particularly regarding model scalability and efficiency. We identify two key bottlenecks: (i) Representation Bottleneck: driven by the high cardinality and dynamic nature of features, model capacity is forced into sparsely activated embedding layers, leading to low-rank representations; this, in turn, triggers phenomena such as "One-Epoch" and "Interaction-Collapse," ultimately hindering model scalability. (ii) Computational Bottleneck: integrating all heterogeneous features into a unified model causes an explosion in the number of feature tokens, rendering traditional attention mechanisms computationally demanding and susceptible to attention dispersion. To dismantle these barriers, we introduce STORE, a unified and scalable token-based ranking framework built upon three core innovations: (1) semantic tokenization tackles feature heterogeneity and sparsity at the root by decomposing high-cardinality sparse features into a compact set of stable semantic tokens; (2) an orthogonal rotation transformation rotates the subspace spanned by low-cardinality static features, enabling more efficient and effective feature interactions; and (3) an efficient attention mechanism filters out low-contributing tokens to improve computational efficiency while preserving model accuracy. Across extensive offline experiments and online A/B tests, our framework consistently improves prediction accuracy (online CTR by 2.71%, AUC by 1.195%) and training efficiency (1.84x throughput).
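To make the second and third innovations concrete, the following is a minimal PyTorch sketch of the general ideas, not the paper's implementation: a learned orthogonal rotation applied to static-feature embeddings, and an attention variant that retains only the top-scoring key tokens. All names here (OrthogonalRotation, filtered_attention, keep_ratio) are hypothetical illustrations.

```python
# Illustrative sketch only; module names, the scoring rule, and keep_ratio
# are assumptions, not the STORE paper's released code.
import math
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

class OrthogonalRotation(nn.Module):
    """Rotates the embedding subspace of low-cardinality static features
    with a weight matrix constrained to remain orthogonal during training."""
    def __init__(self, dim: int):
        super().__init__()
        # Parametrize a bias-free linear layer so its weight stays orthogonal.
        self.rotate = orthogonal(nn.Linear(dim, dim, bias=False))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_static_tokens, dim)
        return self.rotate(x)

def filtered_attention(q, k, v, keep_ratio: float = 0.5):
    """Single-head attention that drops low-contributing key/value tokens.
    q: (B, Tq, D); k, v: (B, Tk, D). One simple way to "filter" tokens:
    score each key by its mean attention logit over queries, keep the top ones."""
    scale = math.sqrt(q.size(-1))
    scores = q @ k.transpose(-2, -1) / scale            # (B, Tq, Tk)
    token_scores = scores.mean(dim=1)                   # (B, Tk)
    k_keep = max(1, int(keep_ratio * k.size(1)))
    top_idx = token_scores.topk(k_keep, dim=-1).indices # (B, k_keep)
    idx = top_idx.unsqueeze(-1).expand(-1, -1, k.size(-1))
    k_sub, v_sub = k.gather(1, idx), v.gather(1, idx)   # (B, k_keep, D)
    attn = torch.softmax(q @ k_sub.transpose(-2, -1) / scale, dim=-1)
    return attn @ v_sub                                 # (B, Tq, D)

# Tiny usage example with random tensors.
B, Tq, Tk, D = 2, 8, 64, 32
static = OrthogonalRotation(D)(torch.randn(B, 4, D))
out = filtered_attention(torch.randn(B, Tq, D), torch.randn(B, Tk, D), torch.randn(B, Tk, D))
```

The filtering step reduces the attention cost from O(Tq * Tk) to O(Tq * k_keep) after a single scoring pass, which is one plausible reading of how pruning low-contributing tokens can improve throughput while keeping the highest-weight interactions.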