Generating thorough natural language explanations for threat detections remains an open problem in cybersecurity research, despite significant advances in automated malware detection systems. In this work, we present AutoMalDesc, an automated static analysis summarization framework that, after initial training on a small set of expert-curated examples, operates independently at scale. The approach leverages an iterative self-paced learning pipeline that progressively improves output quality through cycles of synthetic data generation and validation, eliminating the need for extensive manual data annotation. Evaluation on 3,600 diverse samples across five scripting languages demonstrates statistically significant improvements across iterations, with consistent gains in both summary quality and classification accuracy. Our validation approach combines quantitative metrics based on established malware labels with qualitative assessment by both human experts and LLM-based judges, confirming the technical precision and linguistic coherence of the generated summaries. To facilitate reproducibility and advance research in this domain, we publish our complete dataset of more than 100K script samples, including annotated seed (0.9K) and test (3.6K) sets, along with our methodology and evaluation framework.
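To make the iterative self-paced learning pipeline concrete, the sketch below shows one plausible shape of the loop: train on the expert-curated seed set, generate candidate summaries for unlabeled scripts, keep only those passing a validation threshold, and retrain on the enlarged pool. This is a minimal illustration under stated assumptions, not the paper's actual implementation; the `Example` type, the injected `train` and `judge` callables, and the `threshold` default are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Example:
    script: str   # raw script content
    summary: str  # natural language explanation of the detection
    label: str    # established malware label, e.g. "malicious" / "benign"

def self_paced_loop(
    seed: List[Example],                        # ~0.9K expert-curated pairs
    unlabeled: List[Tuple[str, str]],           # (script, label) pairs
    train: Callable[[List[Example]], Callable[[str], str]],  # hypothetical trainer
    judge: Callable[[str, str, str], float],    # hypothetical quality scorer in [0, 1]
    iterations: int = 3,
    threshold: float = 0.8,                     # assumed acceptance cutoff
) -> Callable[[str], str]:
    """Iteratively grow the training pool with validated synthetic summaries."""
    pool = list(seed)            # start from the expert-curated seed set only
    summarize = train(pool)      # initial supervised training

    for _ in range(iterations):
        synthetic = []
        for script, label in unlabeled:
            candidate = summarize(script)        # synthetic data generation
            # Validation cycle: accept only candidates that score above the
            # threshold (e.g. label agreement plus an LLM-judge assessment).
            if judge(script, candidate, label) >= threshold:
                synthetic.append(Example(script, candidate, label))
        pool.extend(synthetic)   # accepted samples enlarge the training pool
        summarize = train(pool)  # retrain; later iterations see harder samples
    return summarize
```

Because validation gates every synthetic pair before it enters the pool, each retraining round sees higher-quality data than the last, which is what drives the per-iteration gains reported in the evaluation.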