Graph Neural Networks (GNNs) are a highly effective neural network architecture for processing graph-structured data. Unlike traditional neural networks, which rely solely on data features as input, GNNs leverage both the graph structure, which encodes the relationships between data points, and the feature matrix to optimize their feature representations. This capability enables GNNs to achieve superior performance across a variety of tasks. However, it also makes GNNs more susceptible to noise from both the graph structure and the data features, which can significantly increase training difficulty and degrade performance. To address this issue, this paper proposes a novel method that selects noise-sensitive training samples from the original training set to construct a smaller yet more effective training set. These samples help improve the model's ability to process data correctly in noisy environments. We evaluate our approach on three classical GNN models (GCN, GAT, and GraphSAGE) and three widely used benchmark datasets (Cora, Citeseer, and PubMed). Our experiments demonstrate that the proposed method substantially improves GNN training compared to both randomly sampled training sets of the same size and the full original training set. We further propose a robust-node-based hypergraph partitioning method, an adversarial-robustness-based graph pruning method for GNN defense, and a related spectral edge attack method.
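Since the abstract only names the idea of selecting noise-sensitive samples, the following is a minimal sketch of one plausible selection rule, assuming sensitivity is measured as the prediction shift (KL divergence) of a trained model between clean and Gaussian-perturbed node features. The function name `noise_sensitive_subset`, the `model(x, adj)` interface, and the parameters `keep_ratio`, `noise_std`, and `n_trials` are illustrative assumptions, not the paper's actual procedure.

```python
import torch
import torch.nn.functional as F

def noise_sensitive_subset(model, x, adj, train_idx, keep_ratio=0.5,
                           noise_std=0.1, n_trials=10):
    """Return the most noise-sensitive training node indices.

    Hypothetical sketch: the sensitivity criterion (KL divergence between
    predictions on clean vs. Gaussian-perturbed features) is assumed for
    illustration. `model` is any GNN callable as model(features, adjacency);
    `train_idx` is a LongTensor of training node indices.
    """
    model.eval()
    with torch.no_grad():
        clean_logits = model(x, adj)[train_idx]            # predictions on clean input
        shift = torch.zeros(len(train_idx), device=x.device)
        for _ in range(n_trials):
            noisy_x = x + noise_std * torch.randn_like(x)  # inject feature noise
            noisy_logits = model(noisy_x, adj)[train_idx]
            # accumulate per-node KL divergence between clean and noisy predictions
            shift += F.kl_div(F.log_softmax(noisy_logits, dim=-1),
                              F.softmax(clean_logits, dim=-1),
                              reduction="none").sum(-1)
    k = max(1, int(keep_ratio * len(train_idx)))
    top = torch.topk(shift, k).indices                     # largest shift = most sensitive
    return train_idx[top]
```

Under these assumptions, the returned subset would replace the full training index set in the standard training loop, so the model trains only on the nodes whose predictions are least stable under feature noise.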