Large Language Models (LLMs) have transformed software development and automated code generation. Motivated by these advancements, this paper explores the feasibility of using LLMs to modify malware source code and generate variants. We introduce LLMalMorph, a semi-automated framework that leverages the semantic and syntactic code comprehension of LLMs to generate new malware variants. LLMalMorph extracts function-level information from the malware source code and employs custom-engineered prompts coupled with strategically defined code transformations to guide the LLM in generating variants without resource-intensive fine-tuning. To evaluate LLMalMorph, we collected 10 diverse Windows malware samples of varying type, complexity, and functionality and generated 618 variants. Our experiments demonstrate that LLMalMorph variants can effectively evade antivirus engines, achieving typical detection rate reductions of 10-15% across multiple complex samples. Furthermore, without explicitly targeting learning-based detectors, LLMalMorph attained attack success rates of up to 91% against a Machine Learning (ML)-based malware detector. We also discuss the limitations of current LLM capabilities in generating malware variants from source code and assess where this emerging technology stands in the broader context of malware variant generation.