Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning

Graph-based multi-agent reinforcement learning (MARL) enables coordinated behavior under partial observability by modeling agents as nodes and communication links as edges. While recent methods excel at learning sparse coordination graphs (determining who communicates with whom), they do not address what information should be transmitted under hard bandwidth constraints. We study this bandwidth-limited regime and show that naive dimensionality reduction consistently degrades coordination performance. Hard bandwidth constraints force selective encoding, but deterministic projections lack mechanisms to control how compression occurs. We introduce Bandwidth-constrained Variational Message Encoding (BVME), a lightweight module that treats messages as samples from learned Gaussian posteriors regularized via KL divergence to an uninformative prior. BVME's variational framework provides principled, tunable control over compression strength through interpretable hyperparameters, directly constraining the representations used for decision-making. Across SMACv1, SMACv2, and MPE benchmarks, BVME achieves comparable or superior performance while using 67-83% fewer message dimensions, with gains most pronounced on sparse graphs where message quality critically impacts coordination. Ablations reveal U-shaped sensitivity to bandwidth, with BVME excelling at extreme ratios while adding minimal overhead.
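
The encoding idea described above (messages sampled from a learned Gaussian posterior, with a KL penalty toward an uninformative prior) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a standard normal prior N(0, I), linear heads for the mean and log-variance, and a single penalty weight `beta`; the names `encode_message`, `kl_to_standard_normal`, `regularized_loss`, and `beta` are hypothetical and chosen only for illustration.

```python
import math
import random


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def encode_message(hidden, w_mu, w_logvar, rng):
    """Map hidden features to a low-dimensional Gaussian posterior and sample a message.

    The number of rows in w_mu / w_logvar is the message dimension,
    i.e. the bandwidth budget (an assumption of this sketch).
    """
    mu = linear(w_mu, hidden)
    logvar = linear(w_logvar, hidden)
    # Reparameterization: message = mu + sigma * eps, with eps ~ N(0, 1)
    message = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
               for m, lv in zip(mu, logvar)]
    return message, mu, logvar


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, logvar))


def regularized_loss(task_loss, mu, logvar, beta):
    """Task loss plus a beta-weighted KL penalty (beta is a hypothetical knob)."""
    return task_loss + beta * kl_to_standard_normal(mu, logvar)


if __name__ == "__main__":
    rng = random.Random(0)
    hidden_dim, message_dim = 6, 2  # illustrative sizes, not taken from the paper
    hidden = [rng.uniform(-1.0, 1.0) for _ in range(hidden_dim)]
    w_mu = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
            for _ in range(message_dim)]
    w_logvar = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                for _ in range(message_dim)]
    message, mu, logvar = encode_message(hidden, w_mu, w_logvar, rng)
    print(len(message), regularized_loss(1.0, mu, logvar, beta=0.01))
```

In this sketch, a larger `beta` pulls the posterior toward the prior and so compresses the message more strongly, while a very small posterior variance recovers a nearly deterministic projection. How BVME actually parameterizes and schedules these quantities is not specified in the abstract.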
