We introduce a novel backbone architecture that improves the target-perception ability of feature representations for tracking. We observe that de facto frameworks perform feature matching for target localization using only the outputs of the backbone, so there is no direct feedback from the matching module to the backbone network, especially its shallow layers. More concretely, only the matching module can directly access the target information (in the reference frame), while representation learning for the candidate frame is blind to the reference target. As a consequence, target-irrelevant interference accumulated in the shallow stages may degrade the feature quality of deeper layers. In this paper, we approach the problem from a different angle by conducting multiple branch-wise interactions inside Siamese-like backbone networks (InBN). At the core of InBN is a general interaction modeler (GIM) that injects prior knowledge of the reference image into different stages of the backbone network, leading to better target perception and more robust distractor resistance in the candidate feature representation at negligible computation cost. The proposed GIM module and InBN mechanism are general and applicable to different backbone types, including CNN and Transformer, as evidenced by our extensive experiments on multiple benchmarks. In particular, the CNN version (based on SiamCAR) improves on the baseline by 3.2/6.9 absolute points of SUC on LaSOT/TNL2K, respectively. The Transformer version obtains SUC scores of 65.7/52.0 on LaSOT/TNL2K, which are on par with recent state-of-the-art trackers. Code and models will be released.
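The idea of branch-wise interaction inside a Siamese backbone can be illustrated with a toy sketch. The abstract does not specify the internals of GIM, so everything below is an assumption made for illustration: the `interact` function uses a residual cross-attention (candidate tokens attend to reference tokens) as a hypothetical stand-in for GIM, `stage` is a toy shared-weight layer, and the names `siamese_forward` and `gains` are invented here. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interact(candidate, reference):
    """Hypothetical stand-in for GIM (assumption, not the paper's design).

    Each candidate token attends to all reference tokens with scaled
    dot-product attention; the attended reference feature is added back
    to the candidate token as a residual.
    """
    dim = len(reference[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for query in candidate:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in reference]
        weights = softmax(scores)
        mixed = [sum(w * key[d] for w, key in zip(weights, reference)) for d in range(dim)]
        updated.append([q + m for q, m in zip(query, mixed)])
    return updated


def stage(tokens, gain):
    """Toy backbone stage shared by both branches: scale, then ReLU."""
    return [[max(0.0, gain * value) for value in token] for token in tokens]


def siamese_forward(reference, candidate, gains, with_interaction=True):
    """Run both branches through the same stages.

    With interaction enabled, reference information is injected into the
    candidate branch after every stage, not only at the final matching step.
    """
    for gain in gains:
        reference = stage(reference, gain)
        candidate = stage(candidate, gain)
        if with_interaction:
            candidate = interact(candidate, reference)
    return reference, candidate


reference_tokens = [[1.0, 0.0], [0.0, 1.0]]
candidate_tokens = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

ref_out, cand_out = siamese_forward(reference_tokens, candidate_tokens, gains=[1.0, 0.5])
plain_ref, plain_cand = siamese_forward(
    reference_tokens, candidate_tokens, gains=[1.0, 0.5], with_interaction=False
)
```

In this sketch the reference branch is unchanged by the interaction, while the candidate features differ from the plain Siamese pass at every stage. That mirrors the abstract's point that the candidate representation should no longer be blind to the reference target in the shallow layers.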