Localizing a person from a moving monocular camera is critical for Human-Robot Interaction (HRI). To estimate a person's 3D position from a 2D image, existing methods either rely on the geometric assumption of a fixed camera or use a position-regression model trained on datasets with little camera ego-motion. Both approaches are vulnerable to severe camera ego-motion and consequently localize the person inaccurately. We instead treat person localization as part of a pose estimation problem: representing the human with a four-point model, our method jointly estimates the 2D camera attitude and the person's 3D location through optimization. Evaluations on public datasets and in real-robot experiments show that our method outperforms baselines in person localization accuracy. We further integrate our method into a person-following system and deploy it on an agile quadruped robot.
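To make the joint-estimation idea concrete, below is a minimal sketch (not the authors' implementation) of optimizing camera attitude and person position together: given 2D detections of a four-point human model, it recovers the camera pitch/roll and the person's 3D foot position by minimizing reprojection error. The point layout, the fixed segment heights, the intrinsics `K`, and all function names are illustrative assumptions.

```python
# Hedged sketch: joint estimation of 2D camera attitude (pitch, roll)
# and a person's 3D position from four body keypoints.
import numpy as np
from scipy.optimize import least_squares

# Assumed four-point model: points on the body's vertical axis,
# given as heights (m) above the feet for a nominal person.
MODEL_HEIGHTS = np.array([0.0, 0.9, 1.4, 1.7])  # feet, hip, shoulder, head

K = np.array([[600.0,   0.0, 320.0],   # assumed pinhole intrinsics
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

def rot_pitch_roll(pitch, roll):
    """Rotation into the camera frame for a camera with unknown pitch/roll."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # pitch about x
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])   # roll about z
    return Rz @ Rx

def residuals(params, uv_obs):
    """Reprojection residuals for the stacked state [pitch, roll, x, y, z],
    where (x, y, z) is the foot point in a yaw-aligned frame, y pointing down."""
    pitch, roll, x, y, z = params
    R = rot_pitch_roll(pitch, roll)
    pts_world = np.stack([np.full(4, x),
                          y - MODEL_HEIGHTS,   # body points above the feet
                          np.full(4, z)], axis=1)
    pts_cam = pts_world @ R.T
    uv = pts_cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]                # perspective division
    return (uv - uv_obs).ravel()

# Toy check: synthesize observations from a ground-truth state, then recover it.
gt = np.array([0.1, -0.05, 0.5, 1.2, 4.0])
uv_obs = residuals(gt, np.zeros((4, 2))).reshape(4, 2)
x0 = np.array([0.0, 0.0, 0.0, 1.0, 3.0])      # rough initial guess
sol = least_squares(residuals, x0, args=(uv_obs,))
print("estimated [pitch, roll, x, y, z]:", np.round(sol.x, 3))
```

With four points and five unknowns the problem is only locally well-posed, so a reasonable initial guess (e.g., the previous frame's estimate) matters; the paper's actual formulation and parameterization may differ.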