This paper revisits fairness-aware interactive recommendation on short-video platforms (e.g., TikTok, KuaiShou) by introducing a novel control knob: the lifecycle of items. Our contributions are threefold. First, we conduct a comprehensive empirical analysis and uncover that item lifecycles on short-video platforms follow a compressed three-phase pattern, namely rapid growth, transient stability, and sharp decay, which deviates significantly from the classical four-stage model (introduction, growth, maturity, decline). Second, we introduce LHRL, a lifecycle-aware hierarchical reinforcement learning framework that dynamically harmonizes fairness and accuracy by exploiting phase-specific exposure dynamics. LHRL consists of two key components: (1) PhaseFormer, a lightweight encoder that combines STL decomposition with attention mechanisms for robust phase detection; and (2) a two-level HRL agent, in which the high-level policy imposes phase-aware fairness constraints while the low-level policy optimizes immediate user engagement. This decoupled optimization effectively reconciles long-term equity with short-term utility. Third, experiments on multiple real-world interactive recommendation datasets demonstrate that LHRL significantly improves both fairness and user engagement. Moreover, integrating lifecycle-aware rewards into existing RL-based models consistently yields performance gains, underscoring the generalizability and practical value of our approach.
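To make the PhaseFormer idea concrete, the following is a minimal sketch of how STL decomposition and attention could be combined for phase detection, assuming a daily per-item exposure series. The module names, hyperparameters, and the mean-pooled classification head are illustrative assumptions, not the paper's exact architecture.

```python
# A hedged sketch of PhaseFormer-style phase detection: STL splits an item's
# exposure series into trend/seasonal/residual components, and a single
# self-attention layer pools the trend into three phase logits
# (growth / stability / decay). Hyperparameters are assumptions.
import numpy as np
import torch
import torch.nn as nn
from statsmodels.tsa.seasonal import STL


class PhaseDetector(nn.Module):
    def __init__(self, d_model: int = 32, n_phases: int = 3):
        super().__init__()
        self.proj = nn.Linear(1, d_model)          # lift scalar trend to d_model
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, n_phases)   # growth / stability / decay

    def forward(self, trend: torch.Tensor) -> torch.Tensor:
        # trend: (batch, seq_len) STL trend component of exposure counts
        h = self.proj(trend.unsqueeze(-1))          # (batch, seq_len, d_model)
        h, _ = self.attn(h, h, h)                   # self-attention over time
        return self.head(h.mean(dim=1))             # pooled phase logits


def stl_trend(exposures, period: int = 7) -> torch.Tensor:
    """Extract the STL trend from a raw exposure series (e.g., daily plays)."""
    res = STL(exposures, period=period).fit()
    return torch.tensor(np.asarray(res.trend), dtype=torch.float32)


# Usage: classify the current lifecycle phase of one item's exposure history.
# exposures = [...]  # daily exposure counts for one item
# logits = PhaseDetector()(stl_trend(exposures).unsqueeze(0))
# phase = logits.argmax(dim=-1)  # assumed order: 0=growth, 1=stability, 2=decay
```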
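The decoupling between the two policy levels can likewise be sketched as a phase-conditioned reward. Below, the high-level policy is reduced to a lookup of a phase-dependent fairness weight, and the low-level reward mixes immediate engagement with an exposure-fairness term; the weight values and the Gini-based fairness measure are assumptions for illustration, not the paper's formulation.

```python
# A minimal sketch of a lifecycle-aware reward coupling the two HRL levels.
# PHASE_WEIGHTS stands in for the high-level policy's phase-aware fairness
# constraint; the values are hypothetical.
import numpy as np

# Assumed weights: items get stronger exposure protection in the short
# growth and decay phases, where exposure is scarce or collapsing.
PHASE_WEIGHTS = {"growth": 0.5, "stability": 0.2, "decay": 0.6}


def gini(exposures: np.ndarray) -> float:
    """Gini coefficient of per-item exposure counts (0 = perfectly even)."""
    x = np.sort(exposures.astype(float))
    n = x.size
    cum = np.cumsum(x)
    return float((n + 1 - 2 * (cum / cum[-1]).sum()) / n)


def lifecycle_reward(engagement: float, exposures: np.ndarray, phase: str) -> float:
    """Low-level reward: engagement minus a phase-weighted unfairness penalty."""
    lam = PHASE_WEIGHTS[phase]
    return engagement - lam * gini(exposures)


# Example: identical engagement, but the penalty tightens in the decay phase.
counts = np.array([120, 80, 5, 3, 1])  # hypothetical per-item exposure counts
print(lifecycle_reward(1.0, counts, "stability"))
print(lifecycle_reward(1.0, counts, "decay"))
```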