Fake tweets are increasing steadily and demand immediate countermeasures to combat their spread. During COVID-19, tweets containing misinformation should be flagged and neutralized in their early stages to mitigate the damage. Most existing methods for early detection of fake news assume that sufficient propagation information is available for a large set of labeled tweets, which may not hold for cases like COVID-19, where both are largely absent. In this work, we present ENDEMIC, a novel early detection model that leverages exogenous and endogenous signals related to tweets while learning from limited labeled data. We first develop a novel dataset, called ECTF (Early COVID-19 Twitter Fake news), with additional behavioral test sets to validate early detection. We build a heterogeneous graph with follower-followee, user-tweet, and tweet-retweet connections and train a graph embedding model to aggregate propagation information. Graph embeddings and contextual features constitute the endogenous signals, while time-relative web-scraped information constitutes the exogenous signals. ENDEMIC is trained in a semi-supervised fashion, overcoming the challenge of limited labeled data. We propose a co-attention mechanism to fuse the signal representations optimally. Experimental results on ECTF, PolitiFact, and GossipCop show that ENDEMIC is highly reliable in detecting early fake tweets, significantly outperforming nine state-of-the-art methods.
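As an illustration of the heterogeneous graph described in the abstract, the following minimal sketch shows one way to store the three connection types (follower-followee, user-tweet, and tweet-retweet) as typed, directed edges. It is not the authors' implementation: the class `HeteroGraph`, the function `build_graph`, the edge-type names `follows`, `posted`, and `retweet_of`, and the toy identifiers are all hypothetical, and the graph embedding model itself is not shown.

```python
from collections import defaultdict


class HeteroGraph:
    """Hypothetical container: typed nodes and typed, directed edges."""

    def __init__(self):
        self.node_type = {}           # node id -> "user" or "tweet"
        self.adj = defaultdict(list)  # source id -> list of (edge type, target id)

    def add_node(self, node_id, ntype):
        self.node_type[node_id] = ntype

    def add_edge(self, src, etype, dst):
        # Refuse edges whose endpoints were never declared as nodes.
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError(f"unknown endpoint in edge {src} -> {dst}")
        self.adj[src].append((etype, dst))

    def neighbors(self, node_id, etype=None):
        # Return targets of outgoing edges, optionally filtered by edge type.
        edges = self.adj.get(node_id, [])
        return [dst for t, dst in edges if etype is None or t == etype]


def build_graph(follows, posts, retweets):
    """Assemble the graph from three edge lists (assumed input format).

    follows:  (follower_user, followee_user) pairs
    posts:    (user, tweet) pairs
    retweets: (retweet, original_tweet) pairs
    """
    g = HeteroGraph()
    for a, b in follows:
        g.add_node(a, "user")
        g.add_node(b, "user")
    for user, tweet in posts:
        g.add_node(user, "user")
        g.add_node(tweet, "tweet")
    for rt, original in retweets:
        g.add_node(rt, "tweet")
        g.add_node(original, "tweet")
    for a, b in follows:
        g.add_edge(a, "follows", b)
    for user, tweet in posts:
        g.add_edge(user, "posted", tweet)
    for rt, original in retweets:
        g.add_edge(rt, "retweet_of", original)
    return g


# Toy data: u1 follows u2; u2 posts t1; u1 posts t2, which retweets t1.
toy_graph = build_graph(
    follows=[("u1", "u2")],
    posts=[("u2", "t1"), ("u1", "t2")],
    retweets=[("t2", "t1")],
)
```

Keeping the edge type on every edge lets a downstream embedding model treat social links and propagation links differently, which is the main reason to use a heterogeneous rather than a homogeneous graph here.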