Online Social Networks (OSNs) widely adopt content moderation to mitigate the spread of abusive and toxic discourse. Nonetheless, the real effectiveness of moderation interventions remains unclear due to the high cost of data collection and limited experimental control. The latest developments in Natural Language Processing pave the way for a new evaluation approach. Large Language Models (LLMs) can be successfully leveraged to enhance Agent-Based Modeling and to simulate human-like social behavior with an unprecedented degree of believability. Yet, existing tools do not support simulation-based evaluation of moderation strategies. We fill this gap by designing an LLM-powered simulator of OSN conversations that enables parallel, counterfactual simulations in which toxic behavior is influenced by moderation interventions while all else is kept equal. We conduct extensive experiments that unveil the psychological realism of OSN agents, the emergence of social contagion phenomena, and the superior effectiveness of personalized moderation strategies.
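To make the parallel counterfactual design concrete, the following is a minimal sketch, not the paper's implementation: two copies of the same conversation run from identical agent states and an identical random seed, with a moderation intervention toggled in exactly one branch, so any divergence in toxic output is attributable to the intervention alone. The `Agent` state, the `llm_reply` stub (standing in for an actual LLM call), and the `moderate` dampening rule are all hypothetical placeholders.

```python
import copy
import random
from dataclasses import dataclass

# Illustrative sketch only: agent states, llm_reply, and moderate()
# are hypothetical placeholders, not the simulator's actual components.

@dataclass
class Agent:
    name: str
    toxicity: float  # latent propensity to post toxic content, in [0, 1]

def llm_reply(agent: Agent, thread: list[str], rng: random.Random) -> str:
    """Stub for an LLM call: a toxicity-driven coin flip on the agent state."""
    toxic = rng.random() < agent.toxicity
    return f"{agent.name}: {'<toxic remark>' if toxic else '<neutral remark>'}"

def moderate(agent: Agent, post: str) -> None:
    """Hypothetical intervention: a warning that dampens future toxicity."""
    if "<toxic" in post:
        agent.toxicity *= 0.5

def run(agents: list[Agent], steps: int, seed: int, intervene: bool) -> list[str]:
    rng = random.Random(seed)  # shared seed keeps both branches aligned
    thread: list[str] = []
    for _ in range(steps):
        agent = rng.choice(agents)
        post = llm_reply(agent, thread, rng)
        if intervene:
            moderate(agent, post)  # the only difference between branches
        thread.append(post)
    return thread

# Counterfactual pair: identical initial agents and seed, moderation toggled.
base = [Agent("a", 0.8), Agent("b", 0.2)]
control = run(copy.deepcopy(base), steps=20, seed=42, intervene=False)
treated = run(copy.deepcopy(base), steps=20, seed=42, intervene=True)
effect = sum("<toxic" in p for p in control) - sum("<toxic" in p for p in treated)
print(f"toxic posts averted by moderation: {effect}")
```

Because `moderate` consumes no random draws, the two branches see the same sequence of agent selections and reply samples, which is what "keeping all else equal" amounts to in this sketch.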