UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning


Image Generation · Image Understanding · Reinforcement Learning · Multimodal


Rui Tian, Mingfei Gao, Haiming Gang, Jiasen Lu, Zhe Gan, Yinfei Yang, Zuxuan Wu, Afshin Dehghan

We present UniGen-1.5, a unified multimodal large language model (MLLM) for advanced image understanding, generation, and editing. Building upon UniGen, we comprehensively enhance the model architecture and training pipeline to strengthen image understanding and generation while unlocking strong image editing ability. In particular, we propose a unified Reinforcement Learning (RL) strategy that jointly improves image generation and image editing via shared reward models. To further enhance image editing performance, we propose a lightweight Edit Instruction Alignment stage that significantly improves comprehension of editing instructions, which is essential for the success of RL training. Experimental results show that UniGen-1.5 delivers competitive understanding and generation performance. Specifically, UniGen-1.5 achieves overall scores of 0.89 on GenEval and 4.31 on ImgEdit, surpassing state-of-the-art models such as BAGEL and reaching performance comparable to proprietary models such as GPT-Image-1.
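
The abstract says a single RL stage improves generation and editing jointly through shared reward models, but gives no implementation details. The Python below is a minimal, hypothetical sketch of what such reward sharing could look like. Everything in it is an assumption for illustration, not the authors' method: the names `Sample`, `shared_reward`, and `group_advantages` are invented here; each editing sample is assumed to carry a text description of the desired output image, so the same text-image reward models can score both tasks; and advantages are standardized within groups of rollouts that share a prompt, a GRPO-style choice the abstract does not confirm.

```python
# Hypothetical sketch: one shared reward pipeline for generation and editing RL.
# Assumptions (not from the paper): every sample carries a text description of
# the desired output image, so the same text-image reward models can score both
# tasks; advantages are standardized within groups of rollouts that share a
# prompt (a GRPO-style choice made here for illustration only).
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Dict, List

# A reward model maps (target description, image) to a scalar score.
RewardFn = Callable[[str, object], float]


@dataclass
class Sample:
    task: str         # "generate" or "edit"
    prompt_id: str    # rollouts with the same id form one normalization group
    target_text: str  # prompt for generation; assumed target description for editing
    image: object     # placeholder for the generated or edited image


def shared_reward(sample: Sample, reward_models: List[RewardFn]) -> float:
    # The task label is ignored on purpose: both tasks use the same scorers.
    return mean(rm(sample.target_text, sample.image) for rm in reward_models)


def group_advantages(
    samples: List[Sample], reward_models: List[RewardFn], eps: float = 1e-6
) -> List[float]:
    rewards = [shared_reward(s, reward_models) for s in samples]
    groups: Dict[str, List[int]] = {}
    for i, s in enumerate(samples):
        groups.setdefault(s.prompt_id, []).append(i)
    advantages = [0.0] * len(samples)
    for indices in groups.values():
        group_rewards = [rewards[i] for i in indices]
        mu, sigma = mean(group_rewards), pstdev(group_rewards)
        for i in indices:
            # Standardize each reward against the other rollouts in its group.
            advantages[i] = (rewards[i] - mu) / (sigma + eps)
    return advantages


if __name__ == "__main__":
    # Toy reward model: each "image" is just a float quality score.
    toy_rm: RewardFn = lambda text, image: float(image)
    batch = [
        Sample("generate", "g1", "a red cube on a table", 0.2),
        Sample("generate", "g1", "a red cube on a table", 0.8),
        Sample("edit", "e1", "the same photo with the sky at sunset", 0.5),
        Sample("edit", "e1", "the same photo with the sky at sunset", 0.5),
    ]
    print(group_advantages(batch, [toy_rm]))
```

Because `shared_reward` never inspects `task`, a mixed batch of generation and editing rollouts receives one common optimization signal; plugging in real reward models would only change the `RewardFn` callables, not the advantage computation.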

