Analysing learning behaviour in Multi-Agent Reinforcement Learning (MARL) environments is challenging, particularly with respect to \textit{individual} decision-making. Practitioners often study or compare MARL algorithms only qualitatively, largely due to the inherent stochasticity of practical algorithms, which arises from random dithering exploration strategies, environment transition noise, and stochastic gradient updates, among other sources. Traditional analytical approaches, such as replicator dynamics, often rely on mean-field approximations to remove stochastic effects; whilst this simplification can capture general trends, it may lead to dissonance between analytical predictions and the actual realisations of individual trajectories. In this paper, we propose a novel perspective on MARL systems by modelling them as \textit{coupled stochastic dynamical systems}, capturing both agent interactions and environmental characteristics. Leveraging tools from dynamical systems theory, we analyse the stability and sensitivity of agent behaviour at the individual level, which are key dimensions for practical deployment, for example in the presence of strict safety requirements. This framework allows us, for the first time, to rigorously study MARL dynamics while accounting for their inherent stochasticity, providing a deeper understanding of system behaviour and practical insights for the design and control of multi-agent learning processes.