In urban or crowded environments, humans rely on eye contact for fast and efficient communication with nearby people. Autonomous agents also need to detect eye contact to interact with pedestrians and safely navigate around them. In this paper, we focus on eye contact detection in the wild, i.e., real-world scenarios for autonomous vehicles with no control over the environment or the distance of pedestrians. We introduce a model that leverages semantic keypoints to detect eye contact and show that this high-level representation (i) achieves state-of-the-art results on the publicly available dataset JAAD, and (ii) exhibits better generalization properties than leveraging raw images in an end-to-end network. To study domain adaptation, we create LOOK: a large-scale dataset for eye contact detection in the wild, which focuses on diverse and unconstrained scenarios for real-world generalization. The source code and the LOOK dataset are publicly shared in the spirit of open science.
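To make the keypoint-based idea concrete, below is a minimal sketch (not the authors' released code) of a binary classifier that predicts eye contact from 2D pose keypoints rather than raw pixels. The 17 COCO-style keypoints per pedestrian, the (x, y, confidence) encoding, and the MLP architecture are all illustrative assumptions.

```python
# Minimal sketch, assuming 17 COCO-style keypoints per pedestrian,
# each encoded as (x, y, confidence). Not the authors' implementation.
import torch
import torch.nn as nn

class KeypointEyeContactNet(nn.Module):
    """Binary eye-contact classifier over a high-level keypoint representation."""

    def __init__(self, num_keypoints: int = 17, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_keypoints * 3, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # logit for "pedestrian is looking at the camera"
        )

    def forward(self, keypoints: torch.Tensor) -> torch.Tensor:
        # keypoints: (batch, num_keypoints, 3) -> flatten into one feature vector
        return self.mlp(keypoints.flatten(start_dim=1))

# Usage: one forward pass on a random batch of poses.
model = KeypointEyeContactNet()
poses = torch.rand(8, 17, 3)        # e.g. normalized (x, y, confidence) triplets
prob = torch.sigmoid(model(poses))  # per-pedestrian eye-contact probability
print(prob.shape)                   # torch.Size([8, 1])
```

Operating on keypoints rather than raw crops keeps the input low-dimensional and invariant to appearance, which is one plausible reason such a representation would generalize better across datasets than an end-to-end image network.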