Towards an Oracle for Binary Decomposition Under Compilation Variance

Third-Party Library (TPL) detection, which identifies reused libraries in binary code, is critical for software security analysis. At its core, TPL detection depends on binary decomposition: the process of partitioning a monolithic binary into cohesive modules. Existing decomposition methods, whether anchor-based or clustering-based, fundamentally assume that reused code exhibits similar function call relationships. However, this assumption is severely undermined by Function Call Graph (FCG) variations introduced by diverse compilation settings, particularly function inlining decisions, which drastically alter FCG structures. In this work, we conduct the first systematic empirical study to establish an oracle for optimal binary decomposition under compilation variance. We first develop a labeling method that creates precise FCG mappings on a comprehensive dataset compiled with 17 compilers, 6 optimization levels, and 4 architectures; we then identify the minimum semantic-equivalent function regions between FCG variants to derive the ground-truth decomposition. This oracle provides the first rigorous evaluation framework for quantitatively assessing decomposition algorithms under compilation variance. Using it, we evaluate existing methods and expose their critical limitations: each suffers from either under-aggregation or over-aggregation failures. Our findings reveal that current decomposition techniques are inadequate for robust TPL detection, highlighting the urgent need for compilation-aware approaches.
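
To make the idea of a "minimum semantic-equivalent function region" more concrete, here is a minimal Python sketch under loudly stated assumptions. It assumes each compiled variant is described by a hypothetical label map from binary function name to the set of source functions whose code it contains (the kind of mapping a labeling step might recover, for example, from debug information). It also assumes a region is simply a connected component that links source functions to binary functions across all variants. This is one plausible reading for illustration, not the authors' algorithm; `equivalent_regions`, `UnionFind`, and the input format are all hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest over hashable nodes."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def equivalent_regions(variants):
    """Return the finest grouping of source functions such that every binary
    function, in every variant, falls inside a single group.

    variants: list of dicts, one per compiled variant (hypothetical format),
    mapping a binary function name to the set of source functions whose code
    ended up in it (e.g. a callee inlined into its caller).
    """
    uf = UnionFind()
    for v_idx, variant in enumerate(variants):
        for bin_func, src_funcs in variant.items():
            bin_node = ("bin", v_idx, bin_func)  # binary names are per-variant
            uf.find(bin_node)
            for src in src_funcs:
                # Source functions that share a binary function in any variant
                # end up in the same component, hence the same region.
                uf.union(bin_node, ("src", src))

    groups = defaultdict(set)
    for node in list(uf.parent):
        if node[0] == "src":
            groups[uf.find(node)].add(node[1])
    return sorted(tuple(sorted(g)) for g in groups.values())


# Toy example: at -O0 nothing is inlined; at -O3 `log` is inlined into
# `main`, and `lex` is inlined into `parse` while a standalone copy remains.
o0 = {"main": {"main"}, "parse": {"parse"}, "lex": {"lex"}, "log": {"log"}}
o3 = {"main": {"main", "log"}, "parse": {"parse", "lex"}, "lex": {"lex"}}

print(equivalent_regions([o0, o3]))  # [('lex', 'parse'), ('log', 'main')]
```

Under this reading, `parse` and `lex` form one region because their code can no longer be separated at -O3, even though `lex` also survives as its own function. A decomposition that puts them in different modules therefore cannot yield modules that match across both variants. The union-find structure is used only because it computes such connected components in near-linear time.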