The meteoric rise in text generation capability has been accompanied by parallel growth in interest in machine-generated text detection: the task of identifying whether a given text was generated by a model or written by a person. While detection models show strong performance, their errors can cause significant harm. We explore potential biases in English machine-generated text detection systems. We curate a dataset of student essays and assess 16 different detection systems for bias across four attributes: gender, race/ethnicity, English-language learner (ELL) status, and economic status. We evaluate these attributes using regression-based models to determine the significance and strength of the effects, and we complement this with subgroup analysis. We find that although biases are generally inconsistent across systems, several key issues emerge: multiple models tend to classify essays from disadvantaged groups as machine-generated; ELL essays are more likely to be classified as machine-generated; essays from economically disadvantaged students are less likely to be classified as machine-generated; and essays by non-White ELL students are disproportionately classified as machine-generated relative to those of their White counterparts. Finally, we perform a human annotation study and find that although humans generally perform poorly at the detection task, they show no significant biases on the studied attributes.
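As a rough illustration of the regression-based analysis described above, the sketch below fits a logistic regression of a detector's binary verdict on the four studied attributes, with an interaction term probing the race-by-ELL disparity. The DataFrame, column names, and synthetic data are hypothetical assumptions for illustration only; this is not the paper's dataset or code.

```python
# A minimal sketch of a regression-based bias test, assuming a per-essay
# table of detector verdicts and demographic attributes. All names and
# data here are illustrative, not the paper's actual setup.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for the student-essay dataset: one row per essay,
# with the detector's binary verdict and the four studied attributes.
df = pd.DataFrame({
    "flagged_machine": rng.integers(0, 2, n),   # detector says "machine-generated"
    "gender": rng.choice(["female", "male"], n),
    "race": rng.choice(["white", "non_white"], n),
    "ell": rng.integers(0, 2, n),               # English-language learner status
    "econ_disadvantaged": rng.integers(0, 2, n),
})

# Logistic regression: does any attribute shift the odds that an essay
# is flagged as machine-generated? Significant coefficients suggest bias.
model = smf.logit(
    "flagged_machine ~ C(gender) + C(race) + ell + econ_disadvantaged",
    data=df,
).fit(disp=0)
print(model.summary())

# An interaction term probes whether the ELL effect differs by race,
# as in the reported non-White ELL disparity.
interaction = smf.logit("flagged_machine ~ C(race) * ell", data=df).fit(disp=0)
print(interaction.params)
```

In practice, such a model would be fit once per detection system, with subgroup analysis (e.g., comparing flag rates within each race-by-ELL cell) used to complement the regression coefficients.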