RAVEN++: Pinpointing Fine-Grained Violations in Advertisement Videos with Active Reinforcement Reasoning

Advertising is a cornerstone of the digital economy, yet moderating video advertisements remains a significant challenge because of their complexity and the need to localize violations precisely. While recent work such as the RAVEN model has improved coarse-grained violation detection, critical gaps persist in fine-grained understanding, explainability, and generalization. To address these limitations, we propose RAVEN++, a novel framework that introduces three key innovations: 1) Active Reinforcement Learning (RL), which dynamically adapts training to samples of varying difficulty; 2) Fine-Grained Violation Understanding, achieved through hierarchical reward functions and reasoning distillation; and 3) Progressive Multi-Stage Training, which systematically combines knowledge injection, curriculum-based passive RL, and active RL. Extensive experiments on public and proprietary datasets, covering both offline evaluation and online A/B testing in deployment, demonstrate that RAVEN++ outperforms general-purpose LLMs and specialized models such as RAVEN in fine-grained violation understanding, reasoning capability, and generalization.

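The abstract does not spell out the form of the hierarchical reward functions, so the following is only an illustrative sketch of what a coarse-to-fine reward could look like, not the RAVEN++ implementation. It assumes a prediction is scored in three tiers: coarse category, fine-grained subcategory, and temporal localization measured by interval IoU. The `Violation` type, the tier weights, and the label strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """Hypothetical record of one violation in an ad video."""
    category: str      # coarse label (assumed field)
    subcategory: str   # fine-grained label (assumed field)
    start: float       # violation start time, in seconds
    end: float         # violation end time, in seconds


def temporal_iou(a: Violation, b: Violation) -> float:
    """Intersection-over-union of the two time intervals."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def hierarchical_reward(
    pred: Violation,
    gold: Violation,
    w_coarse: float = 0.3,  # assumed weights, chosen only for illustration
    w_fine: float = 0.3,
    w_loc: float = 0.4,
) -> float:
    """Each finer tier only earns reward if the coarser tier is correct."""
    if pred.category != gold.category:
        return 0.0
    reward = w_coarse
    if pred.subcategory == gold.subcategory:
        reward += w_fine
        # Localization is credited only once both labels are right.
        reward += w_loc * temporal_iou(pred, gold)
    return reward
```

Under these assumptions, a prediction with the right coarse category but the wrong subcategory earns only the coarse tier, while a fully correct prediction with an exactly matching time span earns the sum of all three weights. Gating the finer tiers on the coarser ones is one design choice among several; an additive reward without gating would be an equally plausible reading of "hierarchical".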