Progress in image generation raises significant public security concerns. We argue that fake image detection should not operate as a "black box". Instead, an ideal approach must ensure both strong generalization and transparency. Recent progress in Multi-modal Large Language Models (MLLMs) offers new opportunities for reasoning-based detection of AI-generated images. In this work, we evaluate the capabilities of MLLMs against traditional detection methods and human evaluators, highlighting their strengths and limitations. Furthermore, we design six distinct prompts and propose a framework that integrates them into a more robust, explainable, and reasoning-driven detection system. The code is available at https://github.com/Gennadiyev/mllm-defake.