Bayesian Non-Negative Matrix Factorization (NMF) is a method of interest in fields including genomics, neuroscience, and audio and image processing. Bayesian Poisson NMF is of particular importance for count data, for example in cancer mutational signature analysis. However, MCMC methods for Bayesian Poisson NMF require a computationally intensive augmentation. Further, identifying the latent rank is necessary, but commonly used heuristic approaches are slow and potentially subjective, while methods that learn the rank automatically are unable to provide posterior uncertainties. In this paper, we introduce bayesNMF, a computationally efficient Gibbs sampler for Bayesian Poisson NMF. Metropolis-Hastings steps are used to avoid augmentation, with full conditionals from a Normal-likelihood NMF serving as geometry-informed, high-overlap proposals. We additionally define sparse Bayesian factor inclusion (SBFI) as a method to identify the rank automatically while providing posterior uncertainty quantification. We provide an open-source R software package, available on GitHub at jennalandy/bayesNMF, with all of the models and plotting capabilities demonstrated in this paper; supplemental materials are available online. Although our applications focus on cancer mutational signatures, our software and results can be extended to any use of Bayesian Poisson NMF.
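To make the proposal mechanism concrete, the following is a minimal illustrative sketch in Python of one Metropolis-Hastings update for a single exposure entry. It is not the bayesNMF implementation (the package is written in R), and every name in it is hypothetical. It assumes an Exponential prior with rate `prior_rate` on each exposure, a Normal working likelihood with fixed per-row variances `sigma2`, and a simple inverse-CDF truncated-normal sampler that is only reliable when the truncation point is not far in the tail.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def log_target(value, k, m, P, e, prior_rate):
    """Log Poisson likelihood plus log Exponential prior (up to a constant),
    for one sample's counts m, with exposure e[k] replaced by `value`."""
    total = -prior_rate * value
    for g in range(len(m)):
        rate = sum(P[g][j] * (value if j == k else e[j]) for j in range(len(e)))
        if rate <= 0.0:
            if m[g] > 0:
                return -math.inf  # a positive count is impossible at rate zero
            continue
        total += m[g] * math.log(rate) - rate
    return total


def normal_conditional(k, m, P, e, sigma2, prior_rate):
    """Mean and sd (before truncation at zero) of the full conditional of e[k]
    under the assumed Normal likelihood m[g] ~ N((P e)[g], sigma2[g])."""
    precision, linear = 0.0, -prior_rate
    for g in range(len(m)):
        others = sum(P[g][j] * e[j] for j in range(len(e)) if j != k)
        precision += P[g][k] ** 2 / sigma2[g]
        linear += P[g][k] * (m[g] - others) / sigma2[g]
    return linear / precision, precision ** -0.5


def sample_truncated_normal(mu, sd, rng):
    """Inverse-CDF draw from N(mu, sd^2) restricted to [0, inf)."""
    lower = STD_NORMAL.cdf(-mu / sd)
    u = min(max(rng.uniform(lower, 1.0), 1e-12), 1.0 - 1e-12)
    return max(mu + sd * STD_NORMAL.inv_cdf(u), 0.0)


def mh_update_entry(k, m, P, e, sigma2, prior_rate, rng):
    """One MH step for e[k]; updates e in place and returns True if accepted."""
    mu, sd = normal_conditional(k, m, P, e, sigma2, prior_rate)
    current = e[k]
    proposal = sample_truncated_normal(mu, sd, rng)
    lt_prop = log_target(proposal, k, m, P, e, prior_rate)
    if lt_prop == -math.inf:
        return False
    lt_curr = log_target(current, k, m, P, e, prior_rate)

    def log_q(x):
        # Unnormalized proposal log density; the truncation constant is the
        # same for both points and cancels in the acceptance ratio.
        return -0.5 * ((x - mu) / sd) ** 2

    log_ratio = (lt_prop - lt_curr) + (log_q(current) - log_q(proposal))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        e[k] = proposal
        return True
    return False


if __name__ == "__main__":
    rng = random.Random(1)
    P = [[0.7, 0.1], [0.2, 0.3], [0.1, 0.6]]  # toy signatures (rows: categories)
    m = [12, 7, 9]                            # toy counts for one sample
    e = [5.0, 5.0]                            # initial exposures
    sigma2 = [max(x, 1) for x in m]           # assumed working variances
    accepted = sum(mh_update_entry(k, m, P, e, sigma2, 0.1, rng)
                   for _ in range(500) for k in range(len(e)))
    print("acceptance rate:", accepted / 1000, "final exposures:", e)
```

Because the proposal depends only on the other parameters and not on the current value of the entry being updated, this is an independence-type proposal: the acceptance ratio compares the Poisson target and the Normal-based proposal density at the current and proposed values, and no latent count augmentation is needed. Choosing `sigma2` close to the Poisson variance is one way to keep the proposal overlapping the target; the choice used here is only an assumption for the toy example.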