Offline reinforcement learning often requires a high-quality dataset on which to train a policy. In many situations, however, such a dataset is not available, and it is difficult to train a policy that performs well in the actual environment from the offline data alone. We propose using dataset distillation to synthesize a better dataset, which can then be used to train a better policy model. We show that our method synthesizes a dataset on which a trained model achieves performance comparable to a model trained on the full dataset or a model trained with percentile behavioral cloning. Our project site is available at https://datasetdistillation4rl.github.io. We also provide our implementation at https://github.com/ggflow123/DDRL.
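To make the pipeline concrete, below is a minimal, hypothetical sketch of dataset distillation for offline behavioral cloning via gradient matching. The state and action dimensions, the synthetic set size, the matching loss, and the choice of gradient matching itself are illustrative assumptions, not a description of the released DDRL implementation; see the repository above for the actual method.

```python
# Hypothetical sketch: distill a small synthetic state-action set whose
# behavioral-cloning gradients match those of the offline data.
# Dimensions, set size, and losses below are assumptions for illustration.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, SYN_SIZE = 17, 6, 64  # assumed problem sizes

def make_policy():
    # Small MLP policy used only to probe gradients.
    return nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                         nn.Linear(128, ACTION_DIM))

# The synthetic dataset is a set of learnable state-action pairs.
syn_states = torch.randn(SYN_SIZE, STATE_DIM, requires_grad=True)
syn_actions = torch.randn(SYN_SIZE, ACTION_DIM, requires_grad=True)
syn_opt = torch.optim.Adam([syn_states, syn_actions], lr=1e-3)

def bc_grads(policy, states, actions):
    """Gradients of a behavioral-cloning (MSE) loss w.r.t. policy parameters."""
    loss = nn.functional.mse_loss(policy(states), actions)
    return torch.autograd.grad(loss, list(policy.parameters()), create_graph=True)

for step in range(1000):
    policy = make_policy()  # fresh random initialization each outer step
    # real_states / real_actions would be a batch from the offline dataset;
    # random tensors stand in here so the sketch stays self-contained.
    real_states = torch.randn(256, STATE_DIM)
    real_actions = torch.randn(256, ACTION_DIM)

    g_real = bc_grads(policy, real_states, real_actions)
    g_syn = bc_grads(policy, syn_states, syn_actions)

    # Update the synthetic data so its induced gradients match the real ones.
    match_loss = sum(((gr.detach() - gs) ** 2).sum()
                     for gr, gs in zip(g_real, g_syn))
    syn_opt.zero_grad()
    match_loss.backward()
    syn_opt.step()

# Downstream, a policy would be trained on (syn_states, syn_actions)
# with plain behavioral cloning and evaluated in the environment.
```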