Collective moderation of hate, toxicity, and extremity in online discussions

In the digital age, hate speech threatens the functioning of social media platforms as spaces for public discourse. Top-down approaches to moderating hate speech run into difficulties because they conflict with freedom of expression and scale poorly. Counter speech, a form of collective moderation by citizens, has emerged as a potential remedy. Here, we investigate which counter speech strategies are most effective in reducing the prevalence of hate, toxicity, and extremity on online platforms. We analyze more than 130,000 discussions on German Twitter, starting at the peak of the migrant crisis in 2015 and extending over four years. We use human annotation and machine learning classifiers to identify argumentation strategies, ingroup and outgroup references, emotional tone, and different measures of discourse quality. Using matching and time-series analyses, we assess the effectiveness of naturally observed counter speech strategies at the micro level (individual tweet pairs), the meso level (entire discussions), and the macro level (over days). We find that expressing straightforward opinions, even if not factual but devoid of insults, is followed by the least subsequent hate, toxicity, and extremity across all levels of analysis. This strategy complements currently recommended counter speech strategies and is easy for citizens to engage in. Sarcasm can also be effective in improving discourse quality, especially in the presence of organized extreme groups. Going beyond the one-shot analyses of smaller samples prevalent in most prior studies, our findings have implications for the successful management of public online spaces through collective civic moderation.
