We introduce DeepFleet, a suite of foundation models designed to support coordination and planning for large-scale mobile robot fleets. These models are trained on fleet movement data, including robot positions, goals, and interactions, from hundreds of thousands of robots in Amazon warehouses worldwide. DeepFleet consists of four architectures, each embodying a distinct inductive bias and together exploring key points in the design space for multi-agent foundation models: the robot-centric (RC) model is an autoregressive decision transformer operating on neighborhoods of individual robots; the robot-floor (RF) model uses a transformer with cross-attention between robots and the warehouse floor; the image-floor (IF) model applies convolutional encoding to a multi-channel image representation of the full fleet; and the graph-floor (GF) model combines temporal attention with graph neural networks for spatial relationships. In this paper, we describe these models and evaluate how these design choices affect performance on prediction tasks. We find that the robot-centric and graph-floor models, which both use asynchronous robot state updates and incorporate the localized structure of robot interactions, show the most promise. We also present experiments showing that these two models make effective use of larger warehouse operation datasets as they are scaled up.
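As a rough illustration of what "operating on neighborhoods of individual robots" can mean, the following Python sketch gathers, for each robot, the indices of its k nearest other robots from their floor coordinates. This is a minimal sketch under stated assumptions, not the DeepFleet implementation: the function name `robot_neighborhoods`, the (x, y) position format, the Euclidean distance metric, and the choice of k are all hypothetical and chosen only for illustration.

```python
import math


def robot_neighborhoods(positions, k):
    """Return, for each robot, the indices of its k nearest other robots.

    Hypothetical helper for illustration only.
    positions: list of (x, y) floor coordinates, one entry per robot (assumed format).
    k: neighborhood size (assumed parameter); fewer indices are returned
       if the fleet has fewer than k other robots.
    Ties in distance are broken by robot index, so the result is deterministic.
    """
    neighborhoods = []
    for i, (xi, yi) in enumerate(positions):
        # Euclidean distance (an assumption) from robot i to every other robot.
        others = [
            (math.hypot(xj - xi, yj - yi), j)
            for j, (xj, yj) in enumerate(positions)
            if j != i
        ]
        # Tuples sort by distance first, then by robot index.
        others.sort()
        neighborhoods.append([j for _, j in others[:k]])
    return neighborhoods


if __name__ == "__main__":
    positions = [(0, 0), (1, 0), (5, 5), (0, 2)]
    print(robot_neighborhoods(positions, 2))  # prints [[1, 3], [0, 3], [3, 1], [0, 1]]
```

Each robot's neighbor list would then define the local context a robot-centric model attends over; restricting attention to a small neighborhood reflects the localized structure of robot interactions that the abstract highlights.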