Our work aims to obtain 3D reconstructions of hands and manipulated objects from monocular videos. Reconstructing hand-object manipulations holds great potential for robotics and learning from human demonstrations. The supervised learning approach to this problem, however, requires 3D supervision and remains limited to constrained laboratory settings and simulators for which 3D ground truth is available. In this paper, we propose a learning-free fitting approach for hand-object reconstruction that seamlessly handles two-hand object interactions. Our method relies on cues obtained with common methods for object detection, hand pose estimation, and instance segmentation. We quantitatively evaluate our approach and show that it can be applied to datasets with varying levels of difficulty for which training data is unavailable.