Nonnegative matrix factorization (NMF) is a linear dimensionality reduction technique for nonnegative data, with applications such as hyperspectral unmixing and topic modeling. NMF is a difficult problem in general (NP-hard), and its solutions are typically not unique. To address these two issues, additional constraints or assumptions are often used. In particular, separability assumes that the basis vectors in the NMF are equal to some columns of the input matrix. In that case, the problem is referred to as separable NMF (SNMF) and can be solved in polynomial-time with robustness guarantees, while identifying a unique solution. However, in real-world scenarios, due to noise or variability, multiple data points may lie near the basis vectors, which SNMF does not leverage. In this work, we rely on the smooth separability assumption, which assumes that each basis vector is close to multiple data points. We explore the properties of the corresponding problem, referred to as smooth SNMF (SSNMF), and examine how it relates to SNMF and orthogonal NMF. We then propose a convex model for SSNMF and show that it provably recovers the sought-after factors, even in the presence of noise. We finally adapt an existing fast gradient method to solve this convex model for SSNMF, and show that it compares favorably with state-of-the-art methods on both synthetic and hyperspectral datasets.
翻译:非负矩阵分解(NMF)是一种针对非负数据的线性降维技术,广泛应用于高光谱解混和主题建模等领域。NMF通常是一个困难问题(NP难),且其解通常不唯一。为解决这两个问题,常引入额外的约束或假设。特别地,可分离性假设NMF中的基向量等于输入矩阵的某些列。此时,该问题被称为可分离NMF(SNMF),可在多项式时间内求解并具有鲁棒性保证,同时能识别唯一解。然而,在实际场景中,由于噪声或数据变异,多个数据点可能分布在基向量附近,而SNMF未能利用这一特性。本研究基于平滑可分离性假设,即假设每个基向量接近多个数据点。我们探讨了对应问题(称为平滑SNMF,SSNMF)的性质,并分析其与SNMF及正交NMF的关联。随后,我们提出了一种SSNMF的凸模型,证明即使在噪声存在下该模型仍能可证地恢复目标因子。最后,我们采用现有快速梯度方法求解该SSNMF凸模型,并在合成数据与高光谱数据集上验证其性能优于当前主流方法。