One Detector Fits All: Robust and Adaptive Detection of Malicious Packages from PyPI to Enterprises

The rise of supply chain attacks via malicious Python packages demands robust detection solutions. Current approaches, however, overlook two critical challenges: robustness against adversarial source code transformations, and adaptability to the varying false positive rate (FPR) requirements of different actors, from repository maintainers (who require a low FPR) to enterprise security teams (who tolerate a higher FPR). We introduce a robust detector that can be integrated into both public repositories like PyPI and enterprise ecosystems. To ensure robustness, we propose a novel methodology for generating adversarial packages using fine-grained code obfuscation. Combining these packages with adversarial training (AT) enhances detector robustness by 2.5x. We comprehensively evaluate the effectiveness of AT by testing our detector against 122,398 packages collected daily from PyPI over 80 days, showing that AT needs careful application: it makes the detector more robust to obfuscations and allows finding 10% more obfuscated packages, but slightly decreases performance on non-obfuscated packages. We demonstrate the production adaptability of our detector via two case studies: (i) one for PyPI maintainers (tuned at 0.1% FPR) and (ii) one for enterprise teams (tuned at 10% FPR). In the former, we analyze 91,949 packages collected from PyPI over 37 days, achieving a daily detection rate of 2.48 malicious packages with only 2.18 false positives. In the latter, we analyze 1,596 packages adopted by a multinational software company, obtaining only 1.24 false positives daily. These results show that our detector can be seamlessly integrated into both settings, with the daily false positives taking only a few minutes to review. Overall, we uncovered 346 malicious packages, now reported to the community.
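
The abstract does not say how the detector is tuned to a given FPR. One common way to do it, shown here only as a minimal sketch and not as the authors' procedure, is to pick the decision threshold from the scores the detector assigns to a held-out set of known-benign packages. The function names `threshold_at_fpr` and `is_flagged`, the score range, and the synthetic scores are all hypothetical.

```python
import math
import random


def threshold_at_fpr(benign_scores, target_fpr):
    """Pick a threshold so that at most `target_fpr` of the benign
    scores lie strictly above it (assumed setup: higher = more suspicious)."""
    ranked = sorted(benign_scores, reverse=True)
    # Maximum number of benign packages we allow ourselves to flag.
    # The small epsilon guards against floating-point round-down.
    allowed = math.floor(target_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return float("-inf")  # every package may be flagged
    # Only scores strictly greater than the (allowed + 1)-th highest
    # benign score are flagged, so at most `allowed` benign ones are.
    return ranked[allowed]


def is_flagged(score, threshold):
    """A package is reported when its score exceeds the threshold."""
    return score > threshold


# Synthetic benign scores in [0, 1), purely illustrative.
rng = random.Random(0)
benign = [rng.random() for _ in range(10_000)]

# Two operating points mirroring the two case studies in the abstract.
thr_maintainer = threshold_at_fpr(benign, 0.001)  # 0.1% FPR
thr_enterprise = threshold_at_fpr(benign, 0.10)   # 10% FPR

fpr_maintainer = sum(is_flagged(s, thr_maintainer) for s in benign) / len(benign)
fpr_enterprise = sum(is_flagged(s, thr_enterprise) for s in benign) / len(benign)
print(f"maintainer threshold={thr_maintainer:.4f}, empirical FPR={fpr_maintainer:.4%}")
print(f"enterprise threshold={thr_enterprise:.4f}, empirical FPR={fpr_enterprise:.4%}")
```

Under this scheme the same trained model serves both audiences: only the threshold changes, and the stricter operating point always yields a threshold at least as high as the more permissive one.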
