The rapid development of Artificial Intelligence-Generated Content (AIGC) has raised public anxiety about the spread of false information on social media. Designing detectors to filter such content is an effective defense, but most detectors can be compromised by adversarial examples. Currently, most studies exposing AIGC security issues assume access to information about model structure and data distribution. In real applications, attackers query and interfere with models that provide services in the form of application programming interfaces (APIs), which constitutes the black-box decision-based attack paradigm. However, to the best of our knowledge, decision-based attacks on AIGC detectors remain unexplored. In this study, we propose \textbf{FBA$^2$D}, a frequency-based black-box attack method against AIGC detection, to fill this research gap. Motivated by frequency-domain discrepancies between generated and real images, we develop a decision-based attack that leverages the Discrete Cosine Transform (DCT) for fine-grained spectral partitioning and selects frequency bands as query subspaces, improving both query efficiency and image quality. Moreover, attacks on AIGC detectors should mitigate initialization failures, preserve image quality, and operate under strict query budgets. To address these issues, we adopt an ``adversarial example soup'' method, averaging candidates from successive surrogate iterations and using the result as the initialization to accelerate the query-based attack. Empirical studies on the Synthetic LSUN and GenImage datasets demonstrate the effectiveness of our proposed method. This study highlights the urgency of addressing practical AIGC security problems.
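To make the idea of DCT-based spectral partitioning concrete, the following is a minimal illustrative sketch (not the authors' implementation): it splits an image's 2-D DCT spectrum into radial bands and samples a perturbation restricted to one band, which could then serve as a query subspace in a decision-based attack. The band boundaries, perturbation budget, and helper names are assumptions for illustration only.

\begin{verbatim}
# Minimal sketch of band-restricted DCT perturbation sampling.
# Assumed parameters (low, high, eps) are illustrative, not from the paper.
import numpy as np
from scipy.fftpack import dct, idct

def dct2(x):
    """2-D type-II DCT with orthonormal scaling."""
    return dct(dct(x, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(X):
    """Inverse 2-D DCT, matching dct2."""
    return idct(idct(X, axis=0, norm="ortho"), axis=1, norm="ortho")

def band_mask(h, w, low, high):
    """Select DCT coefficients whose normalized radial frequency is in [low, high)."""
    u = np.arange(h)[:, None] / h
    v = np.arange(w)[None, :] / w
    r = np.sqrt(u ** 2 + v ** 2) / np.sqrt(2.0)   # 0 (DC) .. 1 (highest frequency)
    return (r >= low) & (r < high)

def sample_band_perturbation(img, low=0.25, high=0.5, eps=0.03, rng=None):
    """Draw random DCT coefficients inside one band and map them back to pixels."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = img.shape
    mask = band_mask(h, w, low, high)
    delta = np.zeros_like(img, dtype=np.float64)
    for ch in range(c):                            # perturb each channel independently
        coeffs = rng.standard_normal((h, w)) * mask
        delta[..., ch] = idct2(coeffs)
    delta *= eps / (np.abs(delta).max() + 1e-12)   # scale to the pixel-space budget
    return np.clip(img + delta, 0.0, 1.0)

# Usage: submit the candidate to the detector API and keep the band whose
# perturbations flip the decision with the fewest queries.
img = np.random.rand(224, 224, 3)                  # placeholder for a generated image
adv_candidate = sample_band_perturbation(img)
\end{verbatim}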