Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3)

Source: arXiv. This is the Author Accepted Manuscript version of a paper accepted for publication; the final published version is available via IEEE Xplore.

This paper proposes a reinforcement learning (RL) framework for stabilizing the Twin Rotor Aerodynamic System (TRAS) at specified pitch and azimuth angles and for tracking a given trajectory. The complex, non-linear dynamics of the TRAS make it difficult to control with traditional control algorithms, while recent developments in RL have attracted interest for their potential in multirotor control. The RL agent in this paper is trained with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. TD3 is suited to environments with continuous state and action spaces, such as the TRAS, and it does not require a model of the system. Simulation results illustrate the effectiveness of the RL control method. Wind disturbances are then applied as external disturbances to compare the controller against conventional PID controllers. Lastly, experiments on a laboratory setup confirm the controller's effectiveness in real-world conditions.
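For readers unfamiliar with TD3, the following is a minimal sketch of the three ideas the algorithm's name refers to: twin critics (the target uses the smaller of two Q-estimates), target policy smoothing (clipped noise added to the target action), and delayed actor updates. It is not the paper's implementation. The function names, the hyperparameter defaults, and the normalized action range of [-1, 1] are assumptions chosen for illustration, and the actor and critics are passed in as plain callables rather than neural networks.

```python
import random


def clip(x, lo, hi):
    """Clamp a scalar to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Compute the TD3 critic target for a single transition.

    target_actor, target_q1 and target_q2 are hypothetical callables that
    stand in for the target networks; all defaults are illustrative.
    """
    # Target policy smoothing: perturb the target action with clipped
    # Gaussian noise, then clip the result to the valid action range.
    next_action = [
        clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip),
             action_low, action_high)
        for a in target_actor(next_state)
    ]
    # Clipped double-Q learning: take the smaller of the two critic
    # estimates to reduce overestimation bias.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    # No bootstrapping from terminal states.
    return reward + gamma * (0.0 if done else 1.0) * q_min


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: update the actor every policy_delay steps."""
    return step % policy_delay == 0


if __name__ == "__main__":
    # Toy stand-ins: a two-dimensional action (assumed here to be the two
    # rotor commands) and constant critics.
    actor = lambda s: [0.5, -0.5]
    q1 = lambda s, a: 1.0
    q2 = lambda s, a: 2.0
    y = td3_target(1.0, [0.0, 0.0], False, actor, q1, q2, gamma=0.9)
    print(y)
```

In a full training loop, both critics would be regressed toward this target at every step, while the actor and the target networks would be updated only when `should_update_actor` returns true.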
