3D multi-object tracking is a critical and challenging task in autonomous driving. A common paradigm models individual object motion, e.g., with Kalman filters, to predict trajectories. While effective in simple scenarios, this approach often struggles in crowded environments or with inaccurate detections because it overlooks the rich geometric relationships between objects. This highlights the need to leverage spatial cues. However, existing geometry-aware methods are susceptible to interference from irrelevant objects, which leads to ambiguous features and incorrect associations. To address this, we propose focusing on cue consistency: identifying and matching stable spatial patterns over time. We introduce the Dynamic Scene Cue-Consistency Tracker (DSC-Track) to implement this principle. First, we design a unified spatiotemporal encoder based on Point Pair Features (PPF) that learns discriminative trajectory embeddings while suppressing interference. Second, our cue-consistency transformer module explicitly aligns consistent feature representations between historical tracks and current detections. Finally, a dynamic update mechanism preserves salient spatiotemporal information for stable online tracking. Extensive experiments on the nuScenes and Waymo Open Datasets validate the effectiveness and robustness of our approach. On the nuScenes benchmark, our method achieves state-of-the-art performance, reaching 73.2% and 70.3% AMOTA on the validation and test sets, respectively.
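For context, the abstract's encoder builds on Point Pair Features. While the paper's exact formulation is not given here, the classical PPF descriptor (Drost et al.) for two oriented points is a 4-tuple of the pair distance and three angles. A minimal sketch, assuming the standard definition rather than the paper's variant:

```python
import numpy as np

def point_pair_feature(p1, n1, p2, n2):
    """Classical 4D Point Pair Feature for two oriented points:
    (||d||, angle(n1, d), angle(n2, d), angle(n1, n2)), where d = p2 - p1.
    Inputs are 3D position and unit-normal vectors."""
    d = p2 - p1
    dist = np.linalg.norm(d)
    d_unit = d / dist

    def angle(u, v):
        # Clip to guard against floating-point overshoot outside [-1, 1].
        return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

    return np.array([dist,
                     angle(n1, d_unit),
                     angle(n2, d_unit),
                     angle(n1, n2)])
```

Because each component is invariant to rigid transformations of the pair, such features describe the relative geometry between objects independently of ego-vehicle pose, which is what makes them attractive for encoding inter-object spatial cues.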