Handling and Presenting Harmful Text in NLP Research

Text data can pose a risk of harm. However, the risks are not fully understood, and how to handle, present, and discuss harmful text safely remains an unresolved issue in the NLP community. We provide an analytical framework categorising harms on three axes: (1) the harm type (e.g., misinformation, hate speech, or racial stereotypes); (2) whether a harm is \textit{sought} as a feature of the research design when explicitly studying harmful content (e.g., training a hate speech classifier), or \textit{unsought} when harmful content is encountered while working on unrelated problems (e.g., language generation or part-of-speech tagging); and (3) who it affects, from the people (mis)represented in the data to those handling the data and those publishing on the data. We provide advice for practitioners, with concrete steps for mitigating harm in research and in publication. To assist implementation, we introduce \textsc{HarmCheck}, a documentation standard for handling and presenting harmful text in research.
