Clustering of Bandits (CB) methods enhance sequential decision-making by grouping bandits into clusters based on similarity and incorporating cluster-level contextual information, and have proven effective and adaptable in applications such as personalized streaming recommendations. However, when CB algorithms are extended to their neural versions (commonly referred to as Clustering of Neural Bandits, or CNB), they suffer from loss of plasticity: neural network parameters become rigid and less adaptable over time, which limits their ability to adapt to non-stationary environments (e.g., dynamic user preferences in recommendation). To address this challenge, we propose Selective Reinitialization (SeRe), a novel bandit learning framework that dynamically preserves the adaptability of CNB algorithms in evolving environments. SeRe leverages a contribution utility metric to identify and selectively reset underutilized units, mitigating loss of plasticity while maintaining stable knowledge retention. Furthermore, when SeRe is combined with CNB algorithms, an adaptive change detection mechanism adjusts the reinitialization frequency according to the degree of non-stationarity, ensuring effective adaptation without unnecessary resets. Theoretically, we prove that SeRe enables sublinear cumulative regret in piecewise-stationary environments, outperforming traditional CNB approaches in long-term performance. Extensive experiments on six real-world recommendation datasets demonstrate that SeRe-enhanced CNB algorithms effectively mitigate the loss of plasticity and achieve lower regret, improving adaptability and robustness in dynamic settings.
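The idea of scoring hidden units by a contribution utility and resetting only the underutilized ones can be illustrated with a minimal sketch. The abstract does not define the metric, so the code assumes a simple proxy: each hidden unit's mean absolute activation multiplied by the total magnitude of its outgoing weights. The function names, the `reset_fraction` parameter, and the toy network sizes are all hypothetical, and the change detection mechanism that sets the reset frequency is not modeled.

```python
import random

def contribution_utility(activations, out_weights):
    """Assumed proxy (not taken from the source): mean |activation| of each
    hidden unit times the total magnitude of its outgoing weights."""
    utilities = []
    for j in range(len(out_weights)):
        mean_act = sum(abs(row[j]) for row in activations) / len(activations)
        out_mag = sum(abs(w) for w in out_weights[j])
        utilities.append(mean_act * out_mag)
    return utilities

def selective_reinit(in_weights, out_weights, utilities, reset_fraction, rng):
    """Reset the lowest-utility hidden units in place; return their indices."""
    n_units = len(utilities)
    n_reset = int(reset_fraction * n_units)
    order = sorted(range(n_units), key=lambda j: utilities[j])
    reset_idx = order[:n_reset]
    for j in reset_idx:
        # Fresh incoming weights give the unit a new chance to learn.
        in_weights[j] = [rng.uniform(-0.1, 0.1) for _ in in_weights[j]]
        # Zeroed outgoing weights keep the reset from changing current outputs.
        out_weights[j] = [0.0 for _ in out_weights[j]]
    return reset_idx

rng = random.Random(0)
# Toy network: 3 inputs, 4 hidden units, 1 output; rows are indexed by hidden unit.
in_weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
out_weights = [[0.9], [0.0], [0.5], [0.7]]
# Hidden activations recorded on two inputs; unit 2 never activates.
activations = [[1.0, 0.8, 0.0, 0.4], [0.6, 1.2, 0.0, 0.2]]

utilities = contribution_utility(activations, out_weights)
reset_idx = selective_reinit(in_weights, out_weights, utilities, 0.5, rng)
print(sorted(reset_idx))
```

In this toy setup, unit 1 has no outgoing weight and unit 2 never activates, so both have zero utility and are the ones reset, while the two units that contribute to the output keep their weights. Zeroing the outgoing weights of a reset unit is one design choice for retaining existing knowledge: the network's current predictions are unchanged immediately after the reset.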