Visual emotion analysis (VEA) has attracted great attention recently, owing to the growing tendency to express and understand emotions through images on social networks. Unlike traditional vision tasks, VEA is inherently more challenging because it involves a much higher level of complexity and ambiguity in the human cognitive process. Most existing methods adopt deep learning techniques to extract general features from the whole image, disregarding the specific features evoked by various emotional stimuli. Inspired by the \textit{Stimuli-Organism-Response (S-O-R)} emotion model in psychological theory, we propose a stimuli-aware VEA method consisting of three stages, namely stimuli selection (S), feature extraction (O) and emotion prediction (R). First, specific emotional stimuli (i.e., color, object, face) are selected from images using off-the-shelf tools. To the best of our knowledge, this is the first work to introduce a stimuli selection process into VEA in an end-to-end network. Then, we design three specific networks, i.e., Global-Net, Semantic-Net and Expression-Net, to extract distinct emotional features from the different stimuli simultaneously. Finally, exploiting the inherent structure of Mikels' wheel, we design a novel hierarchical cross-entropy loss to distinguish hard false examples from easy ones in an emotion-specific manner. Experiments demonstrate that the proposed method consistently outperforms state-of-the-art approaches on four public visual emotion datasets. An ablation study and visualizations further support the validity and interpretability of our method.
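To illustrate the idea behind a hierarchical cross-entropy loss, the following is a minimal sketch and not the exact formulation used in the method. It assumes the eight emotion categories of the wheel are split into a positive and a negative polarity group, and it adds a polarity-level cross-entropy term to the usual emotion-level term, so that a mistake crossing polarity (a hard false example) costs more than a mistake within the same polarity. The names `hierarchical_ce`, `POSITIVE`, and the weight `lam` are hypothetical and chosen only for this example.

```python
import math

# Eight emotion categories; the polarity grouping below is an assumption
# made for this sketch (first four positive, last four negative).
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]
POSITIVE = {0, 1, 2, 3}
NEGATIVE = {4, 5, 6, 7}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_ce(logits, target, lam=1.0):
    """Emotion-level cross-entropy plus a weighted polarity-level term.

    `lam` (hypothetical) balances the two levels; lam=0 recovers the
    plain cross-entropy over the eight emotions.
    """
    probs = softmax(logits)
    # Fine level: standard cross-entropy on the target emotion.
    emotion_loss = -math.log(probs[target])
    # Coarse level: probability mass assigned to the target's polarity group.
    group = POSITIVE if target in POSITIVE else NEGATIVE
    polarity_prob = sum(probs[i] for i in group)
    polarity_loss = -math.log(polarity_prob)
    return emotion_loss + lam * polarity_loss


if __name__ == "__main__":
    target = EMOTIONS.index("amusement")
    same_polarity = [0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # predicts awe
    cross_polarity = [0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]  # predicts anger
    print(hierarchical_ce(same_polarity, target))
    print(hierarchical_ce(cross_polarity, target))
```

In this sketch both wrong predictions give the target emotion the same probability, so their plain cross-entropy is identical; only the polarity term separates them, which is the behavior a hierarchy-aware loss is meant to provide.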