Most existing generative adversarial networks (GANs) for text generation suffer from the instability of reinforcement learning training algorithms such as policy gradient, leading to unstable performance. To tackle this problem, we propose a novel framework called Adversarial Reward Augmented Maximum Likelihood (ARAML). During adversarial training, the discriminator assigns rewards to samples acquired from a stationary distribution near the data rather than from the generator's distribution. The generator is optimized with maximum likelihood estimation augmented by the discriminator's rewards instead of policy gradient. Experiments show that our model can outperform state-of-the-art text GANs with a more stable training process.
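The training step described above can be summarized as: draw samples from a fixed distribution near the real data, score them with the discriminator, and weight the generator's maximum-likelihood loss by those rewards rather than back-propagating through a policy gradient. The following is a minimal sketch of that idea, not the authors' implementation: the LSTM models, the `perturb` token-substitution sampler, and the `araml_generator_step` helper are illustrative assumptions, and the paper's actual sampling strategy and architectures are more involved.

```python
# Sketch (assumptions, not the authors' code) of a reward-augmented MLE update:
# samples come from a stationary distribution near the data (here, random word
# substitution on real sentences), the discriminator scores them, and the
# generator is updated with reward-weighted maximum likelihood.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, HID, PAD = 5000, 64, 128, 0

class Generator(nn.Module):
    """Simple LSTM language model standing in for the generator."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB, padding_idx=PAD)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)  # (batch, seq, vocab) logits

class Discriminator(nn.Module):
    """Scores a sentence; the sigmoid output is used as the reward."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB, padding_idx=PAD)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)
        self.cls = nn.Linear(HID, 1)

    def forward(self, x):
        _, (h, _) = self.rnn(self.emb(x))
        return torch.sigmoid(self.cls(h[-1])).squeeze(-1)  # (batch,)

def perturb(sentences, n_edits=2):
    """Sample from a stationary distribution near the data by randomly
    substituting a few tokens of each real sentence (an illustrative choice)."""
    noisy = sentences.clone()
    for row in noisy:
        for _ in range(n_edits):
            pos = random.randrange(row.size(0))
            row[pos] = random.randrange(1, VOCAB)
    return noisy

def araml_generator_step(gen, disc, real, gen_opt, temperature=1.0):
    """One generator update: reward-weighted MLE on perturbed real samples."""
    samples = perturb(real)
    with torch.no_grad():
        # Discriminator rewards turned into normalized sample weights.
        weights = F.softmax(disc(samples) / temperature, dim=0)
    logits = gen(samples[:, :-1])
    nll = F.cross_entropy(
        logits.reshape(-1, VOCAB), samples[:, 1:].reshape(-1),
        ignore_index=PAD, reduction="none",
    ).view(samples.size(0), -1).sum(dim=1)
    loss = (weights * nll).sum()  # reward-augmented likelihood, no policy gradient
    gen_opt.zero_grad()
    loss.backward()
    gen_opt.step()
    return loss.item()

if __name__ == "__main__":
    gen, disc = Generator(), Discriminator()
    gen_opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
    real = torch.randint(1, VOCAB, (8, 20))  # stand-in batch of token ids
    print(araml_generator_step(gen, disc, real, gen_opt))
```

Because the generator gradient flows only through the log-likelihood of fixed samples, this update avoids the high-variance policy-gradient estimator, which is the source of the stability claimed in the abstract.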