Goal: We consider the problem of automatically grouping logs of runs that failed for the same underlying reasons, so that these failures can be handled more effectively, and investigate the following questions: (1) Does an approach developed to identify problems in system logs generalize to identifying problems in continuous deployment logs? (2) How does dimensionality reduction affect the quality of automated log clustering? (3) How does the criterion used to merge clusters in the clustering algorithm affect clustering quality?

Method: We replicate and extend earlier work on clustering system log files to assess whether it generalizes to continuous deployment logs. We optionally include one of three dimensionality reduction techniques: Principal Component Analysis (PCA), Latent Semantic Indexing (LSI), or Non-negative Matrix Factorization (NMF). In addition to the Complete Linkage criterion used in earlier work, we consider three alternative cluster merge criteria: Single Linkage, Average Linkage, and Weighted Linkage. We empirically evaluate the 16 resulting configurations (four dimensionality reduction options, including none, combined with four merge criteria) on continuous deployment logs provided by our industrial collaborator.

Results: Our study shows that (1) identifying problems in continuous deployment logs via clustering is feasible, (2) including NMF significantly improves overall accuracy and robustness, and (3) Complete Linkage performs best of all merge criteria analyzed.

Conclusions: We conclude that problem identification via automated log clustering is improved by including dimensionality reduction, because it decreases the pipeline's sensitivity to parameter choices and thereby increases its robustness in handling different inputs.
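To make the difference between the four merge criteria concrete, the following is a minimal sketch of agglomerative hierarchical clustering, not the authors' implementation. It assumes each log has already been turned into a numeric vector (for example term counts, optionally reduced by PCA, LSI, or NMF; that step is omitted here), uses cosine distance, and stops merging once the closest pair of clusters is farther apart than a distance threshold. The function names `cosine_distance` and `agglomerate` and the `threshold` parameter are hypothetical. Each criterion is expressed as a Lance-Williams style update giving the distance from a cluster k to the newly merged cluster i∪j. In practice, SciPy's `scipy.cluster.hierarchy.linkage` accepts the same four criteria through its `method` argument (`"single"`, `"complete"`, `"average"`, `"weighted"`).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


# Distance from cluster k to merged cluster i∪j, given d(k,i), d(k,j), |i|, |j|.
UpdateFn = Callable[[float, float, int, int], float]
LINKAGES: Dict[str, UpdateFn] = {
    "single": lambda dki, dkj, ni, nj: min(dki, dkj),     # nearest members
    "complete": lambda dki, dkj, ni, nj: max(dki, dkj),   # farthest members
    "average": lambda dki, dkj, ni, nj: (ni * dki + nj * dkj) / (ni + nj),  # UPGMA
    "weighted": lambda dki, dkj, ni, nj: (dki + dkj) / 2.0,  # WPGMA
}


def agglomerate(vectors: List[Sequence[float]], linkage: str = "complete",
                threshold: float = 0.5) -> List[int]:
    """Cluster vectors bottom-up; return one cluster label per input vector."""
    update = LINKAGES[linkage]
    # Cluster id -> indices of the logs it contains; start with singletons.
    members: Dict[int, List[int]] = {i: [i] for i in range(len(vectors))}
    # Distances between current clusters, keyed by (smaller id, larger id).
    dist: Dict[Tuple[int, int], float] = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            dist[(i, j)] = cosine_distance(vectors[i], vectors[j])
    next_id = len(vectors)
    while len(members) > 1:
        (i, j), d = min(dist.items(), key=lambda kv: kv[1])
        if d > threshold:  # closest pair too far apart: stop merging
            break
        ni, nj = len(members[i]), len(members[j])
        merged = members.pop(i) + members.pop(j)
        del dist[(i, j)]
        # Replace distances to i and j by a single distance to the new cluster.
        for k in members:
            dki = dist.pop((min(i, k), max(i, k)))
            dkj = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = update(dki, dkj, ni, nj)
        members[next_id] = merged
        next_id += 1
    # Number clusters in order of their smallest member index.
    labels = [0] * len(vectors)
    for label, idxs in enumerate(sorted(members.values(), key=min)):
        for idx in idxs:
            labels[idx] = label
    return labels


if __name__ == "__main__":
    # Four toy "log vectors" forming a chain of small angular steps.
    def unit(deg: float) -> List[float]:
        return [math.cos(math.radians(deg)), math.sin(math.radians(deg))]

    logs = [unit(a) for a in (0, 20, 44, 70)]
    print(agglomerate(logs, "single", 0.15))    # -> [0, 0, 0, 0]
    print(agglomerate(logs, "complete", 0.15))  # -> [0, 0, 1, 1]
```

The toy run illustrates why the criterion matters: Single Linkage chains neighbouring vectors into one cluster even though the two ends are far apart, whereas Complete Linkage only merges clusters whose farthest members are all within the threshold, which tends to yield tighter groups.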