This paper addresses the problem of autonomous task allocation by a swarm of interactive drones in large-scale, dynamic spatio-temporal environments. When each drone independently selects among navigation, sensing, and recharging options so that system-wide sensing requirements are met, the collective decision-making becomes an NP-hard decentralized combinatorial optimization problem. Existing solutions face significant limitations: distributed optimization methods such as collective learning often lack long-term adaptability, while centralized deep reinforcement learning (DRL) suffers from high computational complexity, limited scalability, and privacy concerns. To overcome these challenges, we propose a hybrid optimization approach that combines long-term DRL with short-term collective learning. In this approach, each drone uses DRL to proactively determine high-level strategies, such as flight direction and recharging behavior, and uses collective learning to coordinate short-term sensing and navigation tasks with other drones in a decentralized manner. Extensive experiments using datasets derived from realistic urban mobility demonstrate that the proposed solution outperforms standalone state-of-the-art collective learning and DRL approaches by $27.83\%$ and $23.17\%$, respectively. Our findings highlight the complementary strengths of short-term and long-term decision-making, enabling energy-efficient, accurate, and sustainable traffic monitoring through swarms of drones.