We explore methods to improve the robustness of generative adversarial imitation learning (GAIL) algorithms to observation noise. Toward this objective, we study the effect of local Lipschitzness of the discriminator and the generator on the robustness of policies learned by GAIL. In many robotics applications, policies learned by GAIL suffer degraded performance at test time because observations from the environment may be corrupted by noise; robustifying the learned policies against observation noise is therefore of critical importance. To this end, we propose a regularization method that induces local Lipschitzness in both the generator and the discriminator of adversarial imitation learning methods. We show that the modified objective leads to significantly more robust policies. Moreover, we demonstrate, both theoretically and experimentally, that training a locally Lipschitz discriminator yields a locally Lipschitz generator, thereby improving the robustness of the resulting policy. Extensive experiments on simulated robot locomotion environments from the MuJoCo suite show that the proposed method learns policies that significantly outperform the state-of-the-art generative adversarial imitation learning algorithm when evaluated on test scenarios with noise-corrupted observations.
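The kind of regularizer the abstract refers to can be illustrated with a minimal, library-free sketch: a penalty that estimates the local Lipschitz constant of a scalar function (e.g., a discriminator) around an input by sampling small perturbations and penalizing difference quotients above a target. The function names, the finite-difference estimator, and all parameters below are illustrative assumptions, not the paper's exact formulation.

```python
import random

def local_lipschitz_penalty(f, x, radius=0.1, n_samples=8, target=1.0):
    """Hypothetical local-Lipschitz regularizer (not the paper's exact loss).

    Estimates local Lipschitzness of scalar function `f` at point `x`
    (a list of floats) by sampling perturbations within `radius` and
    penalizing difference quotients that exceed `target`.
    """
    fx = f(x)
    penalty = 0.0
    for _ in range(n_samples):
        # Random perturbation inside a box of half-width `radius`.
        delta = [random.uniform(-radius, radius) for _ in x]
        norm = sum(d * d for d in delta) ** 0.5
        if norm == 0.0:
            continue
        x_pert = [xi + di for xi, di in zip(x, delta)]
        # Difference quotient approximates the local Lipschitz ratio.
        ratio = abs(f(x_pert) - fx) / norm
        # Hinge-squared penalty: only ratios above `target` are punished.
        penalty += max(0.0, ratio - target) ** 2
    return penalty / n_samples
```

In training, a term like this (or a gradient-based analogue) would be added to the discriminator's adversarial loss, so that functions whose local variation stays below the target Lipschitz constant incur no penalty.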