The TrackNet series has established a strong baseline for tracking small, fast-moving objects in sports. However, existing iterations face significant limitations: V1-V3 struggle with occlusions due to their reliance on purely visual cues, while TrackNetV4, despite introducing motion inputs, suffers from directional ambiguity because its absolute-difference method discards motion polarity. To overcome these bottlenecks, we propose TrackNetV5, a robust architecture integrating two novel mechanisms. First, to recover the lost directional prior, we introduce the Motion Direction Decoupling (MDD) module. Unlike V4, MDD decomposes temporal dynamics into signed polarity fields, explicitly encoding both movement occurrence and trajectory direction. Second, we propose the Residual-Driven Spatio-Temporal Refinement (R-STR) head. Operating in a coarse-to-fine manner, this Transformer-based module leverages factorized spatio-temporal contexts to estimate a corrective residual, effectively recovering occluded targets. Extensive experiments on the TrackNetV2 dataset demonstrate that TrackNetV5 achieves a new state-of-the-art F1-score of 0.9859 and an accuracy of 0.9733, significantly outperforming previous versions. Notably, this performance leap is achieved with a marginal 3.7% increase in FLOPs over V4, maintaining real-time inference capability while delivering superior tracking precision.
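To make the polarity argument concrete, the following is a minimal sketch (not the paper's implementation; function name, tensor layout, and the simple clamp-based split are assumptions) of how a signed frame difference can be decoupled into two rectified polarity fields, so that direction of motion is retained rather than discarded as in an absolute difference:

```python
import torch

def motion_direction_decoupling(frames: torch.Tensor) -> torch.Tensor:
    """Illustrative sketch of signed-polarity motion decoupling.

    frames: (B, T, H, W) grayscale clip.
    Returns (B, 2*(T-1), H, W): for each consecutive frame pair, one
    channel keeps the positive (brightening) component and one keeps the
    negative (darkening) component, so both motion occurrence and its
    direction survive, unlike |frame_t - frame_{t-1}|.
    """
    diff = frames[:, 1:] - frames[:, :-1]   # signed temporal difference
    pos = torch.clamp(diff, min=0.0)        # positive-polarity field
    neg = torch.clamp(-diff, min=0.0)       # negative-polarity field
    return torch.cat([pos, neg], dim=1)
```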