Large language models (LLMs) have opened up new possibilities for intelligent agents, endowing them with human-like thinking and cognitive abilities. In this work, we explore the potential of LLMs in autonomous driving (AD). We introduce DriveMLM, an LLM-based AD framework that can perform closed-loop autonomous driving in realistic simulators. To this end, (1) we bridge the gap between language decisions and vehicle control commands by standardizing the decision states according to an off-the-shelf motion planning module. (2) We employ a multimodal LLM (MLLM) to model the behavior planning module of a modular AD system; it takes driving rules, user commands, and inputs from various sensors (e.g., camera, lidar) as input, makes driving decisions, and provides explanations. This model can be plugged into existing AD systems such as Autopilot and Apollo for closed-loop driving. (3) We design an effective data engine to collect a dataset that includes decision states and the corresponding explanation annotations for model training and evaluation. We conduct extensive experiments and show that replacing the decision-making modules of Autopilot and Apollo with DriveMLM yields significant improvements of 3.2 and 4.7 points, respectively, on the CARLA Town05 Long benchmark, demonstrating the effectiveness of our model. We hope this work can serve as a baseline for autonomous driving with LLMs.
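To make point (1) concrete, the following is a minimal sketch of how free-form language output might be mapped onto a fixed set of decision states that a downstream motion planner can consume. Everything in it is an assumption made for illustration: the state names in `SpeedDecision` and `PathDecision`, the `speed=...; path=...; because=...` output format, and the `parse_decision` helper are hypothetical and are not taken from DriveMLM, Apollo, or Autopilot.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical decision states; the actual state set used by the paper is not
# given in the abstract, so these names are illustrative only.
class SpeedDecision(Enum):
    KEEP = "keep"
    ACCELERATE = "accelerate"
    DECELERATE = "decelerate"
    STOP = "stop"


class PathDecision(Enum):
    FOLLOW = "follow"
    LEFT_CHANGE = "left_change"
    RIGHT_CHANGE = "right_change"


@dataclass(frozen=True)
class DecisionState:
    speed: SpeedDecision
    path: PathDecision
    explanation: str


def parse_decision(llm_output: str) -> DecisionState:
    """Parse 'speed=<s>; path=<p>; because=<text>' into a DecisionState.

    The text format is an assumed convention. Malformed output falls back to
    the most conservative state (STOP, FOLLOW).
    """
    fields = {}
    # Split into at most three parts so semicolons inside the explanation survive.
    for part in llm_output.split(";", 2):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip().lower()] = value.strip()
    try:
        speed = SpeedDecision(fields["speed"].lower())
        path = PathDecision(fields["path"].lower())
    except (KeyError, ValueError):
        return DecisionState(
            SpeedDecision.STOP, PathDecision.FOLLOW, "unparseable model output"
        )
    return DecisionState(speed, path, fields.get("because", ""))


decision = parse_decision(
    "speed=decelerate; path=left_change; because=slow truck ahead"
)
```

Restricting the model's output to a closed set of states is what lets a planner treat the language model as a drop-in behavior-planning component; the explanation string is carried alongside the state but does not affect control.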