Sentiment analysis of Arabic dialects presents significant challenges due to linguistic diversity and the scarcity of annotated data. This paper describes our approach to the AHaSIS shared task, which focuses on sentiment analysis of Arabic dialects in the hospitality domain. The dataset comprises hotel reviews written in Moroccan and Saudi dialects, and the objective is to classify each reviewer's sentiment as positive, negative, or neutral. We employed SetFit (Sentence Transformer Fine-tuning), a data-efficient few-shot learning framework. On the official evaluation set, our system achieved an F1 score of 73%, ranking 12th among 26 participants. This work highlights the potential of few-shot learning to address data scarcity when processing nuanced dialectal Arabic text in specialized domains such as hotel reviews.