In this study, we evaluate our attack script $\textit{TraceTarnish}$ more rigorously. The script applies adversarial stylometry principles to anonymize the authorship of text-based messages. To assess the attack's efficacy and utility, we sourced, processed, and analyzed Reddit comments, which were then transformed into $\textit{TraceTarnish}$ data. We passed the transformed data through $\textit{StyloMetrix}$ to extract stylometric features and pruned those features with the Information Gain criterion, keeping only the most informative, predictive, and discriminative ones. Function words and function word types ($L\_FUNC\_A$ and $L\_FUNC\_T$), content words and content word types ($L\_CONT\_A$ and $L\_CONT\_T$), and the Type-Token Ratio ($ST\_TYPE\_TOKEN\_RATIO\_LEMMAS$) yielded significant Information Gain readings. These stylometric cues (function-word frequencies, content-word distributions, and the Type-Token Ratio) serve as reliable indicators of compromise (IoCs), revealing when a text has been deliberately altered to mask its true author. The same features could also function as forensic beacons that alert defenders to an adversarial stylometry attack; however, the signal appears to depend on comparing the text before and after transformation, so without the original message it may go largely unnoticed. "In trying to erase a trace, you often imprint a larger one." With this understanding, we framed $\textit{TraceTarnish}$'s operations and outputs around these five features, using them to conceptualize and implement enhancements that further strengthen the attack.
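To make the selection criterion concrete, the following is a minimal sketch of how Information Gain could rank a single numeric feature against an original-versus-transformed label, together with a plain Type-Token Ratio. It is not the $\textit{TraceTarnish}$ or $\textit{StyloMetrix}$ implementation: the function names, the equal-width binning, the bin count, and the toy feature values are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, bins=4):
    """IG = H(labels) - H(labels | binned feature).

    The feature is discretized into equal-width bins; this binning scheme
    is an assumption of the sketch, not something stated in the abstract.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant features
    groups = {}
    for value, label in zip(values, labels):
        index = min(int((value - lo) / width), bins - 1)
        groups.setdefault(index, []).append(label)
    conditional = sum(
        len(group) / len(labels) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - conditional


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical toy data: four original and four transformed messages.
labels = ["original"] * 4 + ["transformed"] * 4
separating = [0.10, 0.12, 0.15, 0.11, 0.40, 0.42, 0.45, 0.41]
uninformative = [0.10, 0.40, 0.10, 0.40, 0.10, 0.40, 0.10, 0.40]

print(information_gain(separating, labels))
print(information_gain(uninformative, labels))
print(type_token_ratio("the cat saw the dog".split()))
```

A feature whose values split cleanly between the two classes scores close to one bit on this balanced toy set, while a feature that is distributed identically in both classes scores close to zero. That ordering is what a ranking by Information Gain relies on when discarding weak features.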