Multi-agent multi-objective systems (MAMOS) have emerged as powerful frameworks for modelling complex decision-making problems across various real-world domains, such as robotic exploration, autonomous traffic management, and sensor network optimisation. MAMOS offers enhanced scalability and robustness through decentralised control and captures the inherent trade-offs between conflicting objectives. In MAMOS, each agent uses a utility function that maps return vectors to scalar values. Existing MAMOS optimisation methods struggle with heterogeneous objective and utility-function settings, where private utility functions and their associated policies intensify training non-stationarity. In this paper, we first theoretically prove that direct access to, or structured modelling of, global utility functions is necessary for reaching a Bayesian Nash equilibrium (BNE) under decentralised execution constraints. To provide access to global utility functions while preserving decentralised execution, we propose an Agent-Attention Multi-Agent Multi-Objective Reinforcement Learning (AA-MAMORL) framework. Our approach implicitly learns a joint belief over other agents' utility functions and their associated policies during centralised training, effectively mapping global states and utilities to each agent's policy. During execution, each agent independently selects actions based on its local observations and private utility function to approximate a BNE, without relying on inter-agent communication. We conduct comprehensive experiments in both a custom-designed MAMO Particle environment and the standard MOMALand benchmark. The results demonstrate that both access to global preferences and the proposed AA-MAMORL framework significantly improve performance, consistently outperforming state-of-the-art methods.
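To make the two core ingredients of the abstract concrete, the following minimal Python sketch illustrates (i) a private utility function that scalarises a vector-valued return and (ii) a scaled dot-product attention over agent embeddings, which a centralised critic could use to condition each agent's learning on the other agents' states and utilities. This is not the authors' implementation; all names (`linear_utility`, `agent_attention`, the linear-scalarisation choice, and the embedding dimensions) are illustrative assumptions.

```python
# Minimal sketch, not the AA-MAMORL implementation: a linear-scalarisation
# utility and a dot-product attention over agent embeddings, as one way a
# centralised critic could aggregate other agents' utility information.
import numpy as np

def linear_utility(returns: np.ndarray, weights: np.ndarray) -> float:
    """Scalarise a return vector with a private preference-weight vector (assumed linear)."""
    return float(returns @ weights)

def agent_attention(queries: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention across agents (one row per agent)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # (n_agents, n_agents)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ values                              # utility-aware agent features

rng = np.random.default_rng(0)
n_agents, n_objectives, d_embed = 3, 2, 8

# Heterogeneous private utilities: each agent weights the objectives differently.
pref_weights = rng.dirichlet(np.ones(n_objectives), size=n_agents)
vector_returns = rng.uniform(size=(n_agents, n_objectives))
scalar_utilities = [linear_utility(r, w) for r, w in zip(vector_returns, pref_weights)]

# Centralised training: attend over per-agent embeddings of (state, utility)
# so that each agent's critic can account for the other agents' preferences.
embeddings = rng.normal(size=(n_agents, d_embed))
utility_aware_features = agent_attention(embeddings, embeddings, embeddings)
print(scalar_utilities, utility_aware_features.shape)
```

At execution time, only the utility function and the agent's own observation would be needed, matching the decentralised-execution constraint described above.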