While recent developments in large language models have improved bias detection and classification, sensitive subjects such as religion remain challenging because even minor errors can lead to severe misunderstandings. In particular, multilingual models often misrepresent religions and struggle to remain accurate in religious contexts. To address this, we introduce BRAND: a Bilingual Religious Accountable Norm Dataset focused on the four major religions of South Asia: Buddhism, Christianity, Hinduism, and Islam. BRAND contains over 2,400 entries, and we evaluate models using three different prompt types in both English and Bengali. Our results indicate that models perform better in English than in Bengali and consistently display bias toward Islam, even when answering religion-neutral questions. These findings highlight persistent bias in multilingual models when similar questions are posed in different languages. We further connect our findings to broader issues in HCI concerning religion and spirituality.