Multiple clustering aims at discovering diverse ways of organizing data into clusters. Despite the progress made, it remains challenging for users to analyze and understand the distinctive structure of each output clustering. To ease this process, we consider diverse clusterings embedded in different subspaces, and analyze the embedding subspaces to shed light on the structure of each clustering. To this end, we propose a two-stage approach called MISC (Multiple Independent Subspace Clusterings). In the first stage, MISC uses independent subspace analysis to seek multiple statistically independent (i.e., non-redundant) subspaces, and determines the number of subspaces via the minimum description length principle. In the second stage, to account for the intrinsic geometric structure of the samples embedded in each subspace, MISC performs graph regularized semi-nonnegative matrix factorization to explore clusters. It additionally integrates the kernel trick into the matrix factorization to handle clusters that are not linearly separable. Experimental results on synthetic datasets show that MISC can find different interesting clusterings in the independent subspaces it identifies, and on real-world datasets it outperforms other related and competitive approaches.
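To make the second stage more concrete, the following is a minimal sketch of graph regularized semi-nonnegative matrix factorization applied to the samples of a single subspace. It is an illustration under stated assumptions, not the authors' implementation: it uses the standard linear multiplicative updates for semi-NMF with an added graph Laplacian term, omits the kernel extension and the first-stage subspace search, and the names `knn_graph`, `graph_semi_nmf`, `lam`, and `k` are hypothetical choices made for this example.

```python
import numpy as np


def knn_graph(X, k=5):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix (samples are rows of X)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                    # exclude self-neighbours
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                       # symmetrize


def _pos(M):
    return (np.abs(M) + M) / 2.0


def _neg(M):
    return (np.abs(M) - M) / 2.0


def graph_semi_nmf(X, n_clusters, lam=1.0, k=5, n_iter=200, seed=0):
    """Approximately minimize ||X^T - F G^T||^2 + lam * tr(G^T L G) with G >= 0.

    X is (n_samples, n_features); F holds unconstrained basis vectors
    (n_features, n_clusters); G holds nonnegative cluster indicators
    (n_samples, n_clusters); L = D - W is the Laplacian of the kNN graph.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = knn_graph(X, k)
    D = np.diag(W.sum(axis=1))
    G = rng.random((n, n_clusters)) + 0.1           # strictly positive start
    eps = 1e-10
    for _ in range(n_iter):
        # Closed-form least-squares update of the unconstrained factor F.
        F = X.T @ G @ np.linalg.pinv(G.T @ G)
        A = X @ F                                   # (n_samples, n_clusters)
        B = F.T @ F                                 # (n_clusters, n_clusters)
        # Multiplicative update keeps G nonnegative; the W and D terms
        # pull neighbouring samples towards similar indicator rows.
        num = _pos(A) + G @ _neg(B) + lam * (W @ G)
        den = _neg(A) + G @ _pos(B) + lam * (D @ G) + eps
        G *= np.sqrt(num / den)
    return G.argmax(axis=1), F, G
```

Calling `graph_semi_nmf(X_sub, n_clusters)` on the projection of the data onto one subspace returns a hard cluster label per sample together with the two factors. Running it once per subspace would yield one clustering per subspace, which is the role the second stage plays in the description above.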