Visual Sentiment Analysis (VSA) is a challenging task due to the vast diversity of emotionally salient images and the inherent difficulty of acquiring enough data to capture this variability comprehensively. Key obstacles include building large-scale VSA datasets and developing effective methodologies that enable algorithms to identify emotionally significant elements within an image. These challenges are reflected in the limited generalization of VSA models when trained and tested across different datasets. Starting from a pool of existing data collections, our approach creates a new, larger dataset that not only contains a wider variety of images than the originals, but also supports training models that focus more effectively on emotionally relevant combinations of image elements. This is achieved by integrating the semiotic concept of isotopy into the dataset creation process, providing deeper insight into the emotional content of images. Empirical evaluations show that models trained on a dataset generated with our method consistently outperform those trained on the original data collections, achieving superior generalization across major VSA benchmarks.
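The abstract describes the approach only at a high level, but the cross-dataset evaluation it refers to (training on a dataset assembled from existing collections and testing on independent VSA benchmarks) can be illustrated with a minimal sketch. The snippet below is an assumption-laden stand-in, not the paper's implementation: synthetic feature vectors replace real image embeddings, a linear classifier replaces a full VSA model, the dataset names (`source_A`, `bench_1`, ...) are hypothetical, and a simple union of collections stands in for the isotopy-guided dataset construction.

```python
# Minimal sketch of a cross-dataset generalization protocol. NOT the paper's
# method: synthetic features stand in for image embeddings, a linear classifier
# for a VSA model, and a plain union of collections for the generated dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_fake_vsa_dataset(n_samples, shift, dim=128):
    """Generate stand-in (feature, sentiment) pairs for one data collection.
    `shift` mimics the domain gap between collections."""
    X = rng.normal(loc=shift, scale=1.0, size=(n_samples, dim))
    w = rng.normal(size=dim)
    y = (X @ w + rng.normal(scale=0.5, size=n_samples) > 0).astype(int)
    return X, y

# Hypothetical original collections and held-out benchmarks.
originals = {name: make_fake_vsa_dataset(2000, shift)
             for name, shift in [("source_A", 0.0), ("source_B", 0.3)]}
benchmarks = {name: make_fake_vsa_dataset(1000, shift)
              for name, shift in [("bench_1", 0.1), ("bench_2", 0.5)]}

# "Generated" dataset: here simply the union of the originals; the paper's
# isotopy-based construction would replace this step.
X_gen = np.vstack([X for X, _ in originals.values()])
y_gen = np.concatenate([y for _, y in originals.values()])

def train_and_eval(X_train, y_train, eval_sets):
    """Train a classifier and report accuracy on each held-out benchmark."""
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return {name: accuracy_score(y, clf.predict(X))
            for name, (X, y) in eval_sets.items()}

# Compare cross-dataset accuracy of models trained on each original
# collection against one trained on the merged/generated dataset.
for name, (X, y) in originals.items():
    print(f"trained on {name}:", train_and_eval(X, y, benchmarks))
print("trained on generated:", train_and_eval(X_gen, y_gen, benchmarks))
```

In a real setting the accuracy table produced by this loop is what "superior generalization across major VSA benchmarks" refers to: the row for the generated dataset should dominate the rows for the individual source collections.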