Rapid advances in generative AI have led to increasingly realistic deepfakes, posing growing challenges for law enforcement and public trust. Existing passive deepfake detectors struggle to keep pace, largely due to their dependence on specific forgery artifacts, which limits their ability to generalize to new deepfake types. Proactive deepfake detection using watermarks has emerged to address the challenge of identifying high-quality synthetic media. However, these methods often struggle to balance robustness against benign distortions with sensitivity to malicious tampering. This paper introduces a novel deep learning framework that harnesses high-dimensional latent space representations and the Multi-Agent Adversarial Reinforcement Learning (MAARL) paradigm to develop a robust and adaptive watermarking approach. Specifically, we develop a learnable watermark embedder that operates in the latent space, capturing high-level image semantics, while offering precise control over message encoding and extraction. The MAARL paradigm empowers the learnable watermarking agent to pursue an optimal balance between robustness and fragility by interacting with a dynamic curriculum of benign and malicious image manipulations simulated by an adversarial attacker agent. Comprehensive evaluations on the CelebA and CelebA-HQ benchmarks reveal that our method consistently outperforms state-of-the-art approaches, achieving improvements of over 4.5% on CelebA and more than 5.3% on CelebA-HQ under challenging manipulation scenarios.
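To make the robustness-versus-fragility trade-off concrete, the following is a minimal sketch of one way a reward signal for the watermarking agent could be scored. It is not the paper's actual reward: the function names `bit_accuracy` and `watermark_reward`, the `fragility_weight` parameter, and the additive form of the reward are all assumptions made for illustration. The idea is only that the agent should be rewarded when the message survives a benign distortion and when it is destroyed by a malicious manipulation.

```python
from typing import Sequence


def bit_accuracy(sent: Sequence[int], recovered: Sequence[int]) -> float:
    """Fraction of message bits that were recovered correctly."""
    if len(sent) == 0 or len(sent) != len(recovered):
        raise ValueError("messages must be non-empty and of equal length")
    matches = sum(1 for s, r in zip(sent, recovered) if s == r)
    return matches / len(sent)


def watermark_reward(
    sent: Sequence[int],
    after_benign: Sequence[int],
    after_malicious: Sequence[int],
    fragility_weight: float = 1.0,  # hypothetical trade-off knob
) -> float:
    """Illustrative reward: high when the watermark survives a benign
    distortion and breaks under a malicious manipulation.

    Assumption: robustness and fragility are combined additively.
    """
    # Robustness term: bits should still be readable after benign edits.
    robustness = bit_accuracy(sent, after_benign)
    # Fragility term: bits should NOT be readable after malicious edits.
    fragility = 1.0 - bit_accuracy(sent, after_malicious)
    return robustness + fragility_weight * fragility


if __name__ == "__main__":
    message = [1, 0, 1, 1]
    benign = [1, 0, 1, 1]      # message intact after, e.g., compression
    malicious = [0, 1, 1, 0]   # message mostly destroyed after tampering
    print(watermark_reward(message, benign, malicious))
```

In an adversarial setup of the kind described above, the attacker agent would choose the manipulations that produce `after_benign` and `after_malicious`, while the watermarking agent would be trained to maximize a score of this general shape.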