Smart contracts are small programs that run autonomously on a blockchain, using it as their persistent memory. The predominant platform for smart contracts is the Ethereum VM (EVM). For EVM smart contracts, a problem with significant applications is identifying data structures in blockchain state (a.k.a. "storage") given only the deployed smart contract code. The problem has been highly challenging and has often been considered nearly impossible to address satisfactorily. (For reference, the latest state-of-the-art research tool fails on nearly all complex data structures and scales to under 50% of contracts.) Much of the complication is that the locations of the main on-chain data structures (mappings and arrays) are derived dynamically through code execution. We propose sophisticated static analysis techniques that identify on-chain data structures with extremely high fidelity and completeness. Our analysis scales nearly universally and recovers deep data structures. Our techniques identify the exact types of data structures with 98.6% precision and at least 92.6% recall, compared to 80.8% and 68.2%, respectively, for a state-of-the-art tool. Strikingly, the analysis is often more complete than the storage description that the compiler itself produces with full access to the source code.
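To make concrete what "locations derived dynamically through code execution" means, the following minimal sketch follows the storage layout rules documented for Solidity: a mapping value lives at the hash of the padded key concatenated with the mapping's base slot, and the elements of a dynamic array start at the hash of the array's base slot. It is not the analysis proposed here. The function names, the slot numbers, and the one-word-per-element simplification are assumptions made for illustration. The EVM hashes with Keccak-256, which is not in Python's standard library, so the sketch takes the hash as a parameter and defaults to `hashlib.sha3_256` purely as a stand-in; the resulting slot numbers therefore differ from real on-chain ones.

```python
import hashlib
from typing import Callable

Hash32 = Callable[[bytes], bytes]
WORD_MODULUS = 2**256  # EVM storage slots are 256-bit indices


def standin_hash(data: bytes) -> bytes:
    # ASSUMPTION: stand-in only. The EVM uses Keccak-256; sha3_256 is the
    # NIST-standardized variant and produces different digests.
    return hashlib.sha3_256(data).digest()


def word(value: int) -> bytes:
    # Encode an unsigned integer as a 32-byte big-endian word.
    return value.to_bytes(32, "big")


def mapping_slot(key: int, base_slot: int, h: Hash32 = standin_hash) -> int:
    # Value-type key: slot = hash(pad32(key) ++ pad32(base_slot)).
    return int.from_bytes(h(word(key) + word(base_slot)), "big")


def array_element_slot(base_slot: int, index: int, h: Hash32 = standin_hash) -> int:
    # The array length is stored at base_slot; the data region starts at
    # hash(pad32(base_slot)). ASSUMPTION: each element fills one full word.
    start = int.from_bytes(h(word(base_slot)), "big")
    return (start + index) % WORD_MODULUS


# Hypothetical layout: `mapping(uint256 => uint256[]) m` declared at slot 3.
# Locating m[7][2] takes two chained hash computations.
inner_array_slot = mapping_slot(key=7, base_slot=3)
element_slot = array_element_slot(inner_array_slot, index=2)
```

Because each slot is the output of a hash over a runtime value, nothing in the deployed bytecode names `m` or its type directly. A recovery tool has to infer the nesting from how these hash computations are chained.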