Labelling human behavior analysis data is a complex and time-consuming task. In this paper, a fully automatic technique is proposed for labelling an image-based gaze behavior dataset for driver gaze zone estimation. Domain knowledge is added to the data recording paradigm, and labels are later generated automatically using Speech-To-Text (STT) conversion. To remove noise introduced into the STT process by the varying illumination and ethnicity of the subjects in our data, the speech frequency and energy are analysed. The resulting Driver Gaze in the Wild (DGW) dataset contains 586 recordings captured at different times of the day, including evenings. This large-scale dataset covers 338 subjects with an age range of 18-63 years. As the data is recorded under different lighting conditions, an illumination-robust layer is proposed for the Convolutional Neural Network (CNN). Extensive experiments show that the variance in the dataset resembles real-world conditions, and demonstrate the effectiveness of the proposed CNN pipeline. The proposed network is also fine-tuned for the eye gaze prediction task, which shows the discriminativeness of the representation learnt by our network on the proposed DGW dataset. Project Page: https://sites.google.com/view/drivergazeprediction/home
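The abstract mentions filtering STT noise by analysing speech frequency and energy. The paper does not give the exact procedure, so the following is only a minimal illustrative sketch, assuming a simple frame-level filter: short-time energy plus the dominant spectral frequency are computed per frame, and frames that fall outside a typical voiced-speech band or below an energy threshold are discarded before STT. All function names and thresholds here (`speech_features`, `keep_frame`, `e_thresh`, the 85-3000 Hz band) are hypothetical, not from the paper.

```python
# Illustrative sketch (NOT the paper's actual method): frame-level
# energy + dominant-frequency filtering of an audio signal, as a
# plausible pre-filter for Speech-To-Text label generation.
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def speech_features(x, sr, frame_len=400, hop=160):
    """Per-frame short-time energy and dominant frequency (Hz)."""
    frames = frame_signal(x, frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)                # short-time energy
    spec = np.abs(np.fft.rfft(frames, axis=1))          # magnitude spectrum
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    dominant = freqs[np.argmax(spec, axis=1)]           # peak frequency per frame
    return energy, dominant

def keep_frame(energy, dominant, e_thresh=1e-3, f_lo=85.0, f_hi=3000.0):
    """Keep frames whose energy and peak frequency look like voiced speech.
    Thresholds are hypothetical defaults, not values from the paper."""
    return (energy > e_thresh) & (dominant >= f_lo) & (dominant <= f_hi)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    voiced = 0.5 * np.sin(2 * np.pi * 440.0 * t)        # synthetic "speech-like" tone
    energy, dominant = speech_features(voiced, sr)
    print(keep_frame(energy, dominant).all())           # tone passes the filter
```

Frames rejected by such a filter would simply be excluded before running STT, so spurious transcriptions from silence or broadband noise never become labels.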