Large Language Models (LLMs) have recently revolutionized machine learning on text-attributed graphs, but the application of LLMs to graph outlier detection, particularly in the context of fake news detection, remains significantly underexplored. One of the key challenges is the scarcity of large-scale, realistic, and well-annotated datasets that can serve as reliable benchmarks for outlier detection. To bridge this gap, we introduce TAGFN, a large-scale, real-world text-attributed graph dataset for outlier detection, specifically fake news detection. TAGFN enables rigorous evaluation of both traditional and LLM-based graph outlier detection methods. Furthermore, it facilitates the development of misinformation detection capabilities in LLMs through fine-tuning. We anticipate that TAGFN will be a valuable resource for the community, fostering progress in robust graph-based outlier detection and trustworthy AI. The dataset is publicly available at https://huggingface.co/datasets/kayzliu/TAGFN and our code is available at https://github.com/kayzliu/tagfn.