This paper revisits fairness-aware interactive recommendation on short-video platforms (e.g., TikTok, KuaiShou) by introducing a novel control knob: the lifecycle of items. Our contributions are threefold. First, we conduct a comprehensive empirical analysis and uncover that item lifecycles on short-video platforms follow a compressed three-phase pattern, namely rapid growth, transient stability, and sharp decay, which deviates significantly from the classical four-stage model (introduction, growth, maturity, decline). Second, we introduce LHRL, a lifecycle-aware hierarchical reinforcement learning framework that dynamically harmonizes fairness and accuracy by exploiting phase-specific exposure dynamics. LHRL consists of two key components: (1) PhaseFormer, a lightweight encoder that combines STL decomposition with attention mechanisms for robust phase detection; and (2) a two-level HRL agent, in which the high-level policy imposes phase-aware fairness constraints while the low-level policy optimizes immediate user engagement. This decoupled optimization effectively reconciles long-term equity with short-term utility. Third, experiments on multiple real-world interactive recommendation datasets demonstrate that LHRL significantly improves both fairness and user engagement. Moreover, integrating lifecycle-aware rewards into existing RL-based models consistently yields performance gains, underscoring the generalizability and practical value of our approach.
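To make the PhaseFormer idea concrete, the following is a minimal sketch of how STL decomposition and attention could be combined for phase detection, assuming a daily per-item exposure series. The module names, hyperparameters, and the mean-pooled classification head are illustrative assumptions, not the paper's exact architecture.

```python
# A hedged sketch of PhaseFormer-style phase detection: STL splits an item's
# exposure series into trend/seasonal/residual components, and a single
# self-attention layer pools the trend into three phase logits
# (growth / stability / decay). Hyperparameters are assumptions.
import numpy as np
import torch
import torch.nn as nn
from statsmodels.tsa.seasonal import STL


class PhaseDetector(nn.Module):
    def __init__(self, d_model: int = 32, n_phases: int = 3):
        super().__init__()
        self.proj = nn.Linear(1, d_model)          # lift scalar trend to d_model
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, n_phases)   # growth / stability / decay

    def forward(self, trend: torch.Tensor) -> torch.Tensor:
        # trend: (batch, seq_len) STL trend component of exposure counts
        h = self.proj(trend.unsqueeze(-1))          # (batch, seq_len, d_model)
        h, _ = self.attn(h, h, h)                   # self-attention over time
        return self.head(h.mean(dim=1))             # pooled phase logits


def stl_trend(exposures, period: int = 7) -> torch.Tensor:
    """Extract the STL trend from a raw exposure series (e.g., daily plays)."""
    res = STL(exposures, period=period).fit()
    return torch.tensor(np.asarray(res.trend), dtype=torch.float32)


# Usage: classify the current lifecycle phase of one item's exposure history.
# exposures = [...]  # daily exposure counts for one item
# logits = PhaseDetector()(stl_trend(exposures).unsqueeze(0))
# phase = logits.argmax(dim=-1)  # assumed order: 0=growth, 1=stability, 2=decay
```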
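The decoupling between the two policy levels can likewise be sketched as a phase-conditioned reward. Below, the high-level policy is reduced to a lookup of a phase-dependent fairness weight, and the low-level reward mixes immediate engagement with an exposure-fairness term; the weight values and the Gini-based fairness measure are assumptions for illustration, not the paper's formulation.

```python
# A minimal sketch of a lifecycle-aware reward coupling the two HRL levels.
# PHASE_WEIGHTS stands in for the high-level policy's phase-aware fairness
# constraint; the values are hypothetical.
import numpy as np

# Assumed weights: items get stronger exposure protection in the short
# growth and decay phases, where exposure is scarce or collapsing.
PHASE_WEIGHTS = {"growth": 0.5, "stability": 0.2, "decay": 0.6}


def gini(exposures: np.ndarray) -> float:
    """Gini coefficient of per-item exposure counts (0 = perfectly even)."""
    x = np.sort(exposures.astype(float))
    n = x.size
    cum = np.cumsum(x)
    return float((n + 1 - 2 * (cum / cum[-1]).sum()) / n)


def lifecycle_reward(engagement: float, exposures: np.ndarray, phase: str) -> float:
    """Low-level reward: engagement minus a phase-weighted unfairness penalty."""
    lam = PHASE_WEIGHTS[phase]
    return engagement - lam * gini(exposures)


# Example: identical engagement, but the penalty tightens in the decay phase.
counts = np.array([120, 80, 5, 3, 1])  # hypothetical per-item exposure counts
print(lifecycle_reward(1.0, counts, "stability"))
print(lifecycle_reward(1.0, counts, "decay"))
```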