Visual Object Tracking (VOT) can be seen as an extension of Few-Shot Learning (FSL). The idea of applying FSL to tracking is not new, but most prior works are tailored to specific types of FSL algorithms and may sacrifice running speed. In this work, we propose a generalized two-stage framework that can employ a large variety of FSL algorithms while adapting faster. The first stage uses a Siamese Region Proposal Network to efficiently propose potential candidates, and the second stage reformulates the classification of these candidates as a few-shot classification problem. In this coarse-to-fine pipeline, the first stage supplies sparse, informative samples to the second stage, where a large variety of FSL algorithms can be applied more conveniently and efficiently. To instantiate the second stage, we systematically investigate several forms of optimization-based few-shot learners from previous works, which differ in their objective functions, optimization methods, or solution spaces. Beyond that, our framework allows most other FSL algorithms to be applied directly to visual tracking, enabling an exchange of ideas between researchers working on these two topics. Extensive experiments on the major benchmarks VOT2018, OTB2015, NFS, UAV123, TrackingNet, and GOT-10k demonstrate performance gains at real-time speed.
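As a minimal illustration of the coarse-to-fine pipeline described above, the following sketch treats the second stage as a few-shot classification problem over a handful of candidates. It is written under several assumptions that do not come from the abstract: the function `propose_candidates` is a hypothetical stand-in that returns synthetic features instead of running a Siamese proposal network, and a closed-form ridge regression is used as one simple example of an optimization-based few-shot learner. It is not the specific learner or implementation evaluated in this work.

```python
import numpy as np


def propose_candidates(rng, num_candidates=8, true_index=5, noise=0.1):
    """Hypothetical stand-in for stage one (the Siamese proposal network).

    Returns a (num_candidates, 4) feature matrix. One candidate resembles
    the target (large value on axis 0); the others resemble distractors
    (large value on axis 1). Real features would come from a network.
    """
    feats = noise * rng.normal(size=(num_candidates, 4))
    feats[:, 1] += 3.0            # every candidate starts distractor-like
    feats[true_index, 1] -= 3.0   # undo that for the true target ...
    feats[true_index, 0] += 3.0   # ... and make it target-like instead
    return feats


def fit_ridge_learner(support_feats, support_labels, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y."""
    dim = support_feats.shape[1]
    gram = support_feats.T @ support_feats + lam * np.eye(dim)
    return np.linalg.solve(gram, support_feats.T @ support_labels)


def classify_candidates(candidate_feats, weights):
    """Stage two: score each sparse candidate and pick the best one."""
    scores = candidate_feats @ weights
    return int(np.argmax(scores)), scores


def demo(seed=0):
    rng = np.random.default_rng(seed)

    # Few-shot support set (assumed): 5 target and 5 distractor samples
    # collected from earlier frames, labeled +1 and -1 respectively.
    positives = 0.1 * rng.normal(size=(5, 4))
    positives[:, 0] += 3.0
    negatives = 0.1 * rng.normal(size=(5, 4))
    negatives[:, 1] += 3.0
    support_feats = np.vstack([positives, negatives])
    support_labels = np.concatenate([np.ones(5), -np.ones(5)])

    weights = fit_ridge_learner(support_feats, support_labels)
    candidates = propose_candidates(rng)
    return classify_candidates(candidates, weights)


if __name__ == "__main__":
    best, scores = demo()
    print("selected candidate:", best)
    print("scores:", np.round(scores, 2))
```

Because the learner only has to fit a few support samples and score a few candidates, swapping in a different few-shot learner means replacing `fit_ridge_learner` while leaving the proposal stage untouched, which is the kind of modularity the two-stage design aims for.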