Progress in image generation raises significant public security concerns. We argue that fake image detection should not operate as a "black box". Instead, an ideal approach must ensure both strong generalization and transparency. Recent progress in Multi-modal Large Language Models (MLLMs) offers new opportunities for reasoning-based detection of AI-generated images. In this work, we evaluate the capabilities of MLLMs against traditional detection methods and human evaluators, highlighting their strengths and limitations. Furthermore, we design six distinct prompts and propose a framework that integrates them into a more robust, explainable, and reasoning-driven detection system. The code is available at https://github.com/Gennadiyev/mllm-defake.