Grouping the nodes of a graph into clusters is a standard technique for studying networks. We study a problem where we are given a directed network and are asked to partition the graph into a sequence of coherent groups. We assume that nodes in the network have features, and we measure the group coherence by comparing these features. Furthermore, we incorporate the cross edges by penalizing the forward cross edges and backward cross edges with different weights. If the weights are set to 0, then the problem is equivalent to clustering. However, if we penalize the backward edges, the order of discovered groups matters, and we can view our problem as a generalization of a classic segmentation problem. We consider a common iterative approach where we solve the groups given the centroids, and then find the centroids given the groups. We show that, unlike in clustering, the first subproblem is NP-hard. However, we show that we can solve the subproblem exactly if the underlying graph is a tree or if the number of groups is 2. For a general case, we propose an approximation algorithm based on linear programming. We propose 3 additional heuristics: (1) optimizing each pair of groups separately while keeping the remaining groups intact, (2) computing a spanning tree and then optimizing using only the edges in that, and (3) a greedy search moving nodes between the groups while optimizing the overall loss. We demonstrate with our experiments that the algorithms are practical and yield interpretable results.
翻译:将图的节点划分为簇是研究网络的标准技术。我们研究了一个问题:给定一个有向网络,要求将图划分为一系列相干组。我们假设网络中的节点具有特征,并通过比较这些特征来度量组的相干性。此外,我们通过以不同权重惩罚前向交叉边和后向交叉边来纳入交叉边的影响。若权重设为0,则该问题等价于聚类。然而,若惩罚后向边,则发现组的顺序至关重要,我们可以将本问题视为经典分割问题的推广。我们考虑一种常见的迭代方法:给定质心求解组,然后给定组求解质心。我们证明,与聚类不同,第一个子问题是NP难的。然而,我们表明若底层图是树或组数为2,则可精确求解该子问题。对于一般情况,我们提出一种基于线性规划的近似算法。我们额外提出三种启发式方法:(1) 在保持其余组不变的情况下分别优化每对组,(2) 计算生成树后仅使用该树中的边进行优化,以及(3) 通过节点在组间移动的贪婪搜索来优化总体损失。实验表明,这些算法具有实用性并能产生可解释的结果。