TAP-Vid: A Benchmark for Tracking Any Point in a Video

Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful to make inferences about 3D shape, physical properties and object interactions. While the problem of tracking arbitrary physical points on surfaces over longer video clips has received some attention, no dataset or benchmark for evaluation existed, until now. In this paper, we first formalize the problem, naming it tracking any point (TAP). We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks. Central to the construction of our benchmark is a novel semi-automatic crowdsourced pipeline which uses optical flow estimates to compensate for easier, short-term motion like camera shake, allowing annotators to focus on harder sections of video. We validate our pipeline on synthetic data and propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.
