An Explainable AI Model for Detecting Malicious Smart Contracts Based on EVM Opcode Features

Hackers may create malicious Solidity programs and deploy them on the Ethereum blockchain. These malicious smart contracts attack legitimate contracts by exploiting vulnerabilities such as reentrancy, tx.origin attacks, bad randomness, and delegatecall. Such attacks can drain funds, cause denial of service, and so on. Hence, it is necessary to identify malicious smart contracts and block them before they are deployed to the blockchain. In this paper, we propose an ML-based mechanism that detects malicious smart contracts by analyzing their EVM opcodes. After balancing the opcode frequency dataset with the SMOTE algorithm, we transform the opcode frequencies into binary values (0, 1) using an entropy-based supervised binning method. An explainable AI model is then trained on the proposed binary opcode-based features. In our experiments, the proposed mechanism detects 99% of malicious smart contracts with a false positive rate of only 0.01. Finally, we incorporate the LIME algorithm into our classifier to justify its predictions. We find that LIME can explain, in terms of the binary values of EVM opcodes, why our ML classifier declares a particular smart contract malicious.
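As a rough illustration of the feature-construction step described above, the sketch below counts opcodes in a contract's bytecode and then binarizes one opcode's frequency with a single entropy-based (information gain) cut point. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify the exact binning procedure, the opcode table is a small illustrative subset, the function names are hypothetical, and the toy counts and labels are made up. SMOTE balancing, classifier training, and LIME explanations are omitted.

```python
import math
from collections import Counter

# Partial opcode table, an illustrative subset only (assumption: a full
# pipeline would map every EVM opcode, not just these).
OPCODES = {0x00: "STOP", 0x01: "ADD", 0x32: "ORIGIN", 0x42: "TIMESTAMP",
           0x54: "SLOAD", 0x55: "SSTORE", 0xF1: "CALL",
           0xF4: "DELEGATECALL", 0xFF: "SELFDESTRUCT"}


def opcode_counts(bytecode_hex):
    """Count opcodes in hex-encoded bytecode, skipping PUSH immediates."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    counts = Counter()
    i = 0
    while i < len(code):
        op = code[i]
        if 0x60 <= op <= 0x7F:       # PUSH1..PUSH32 carry 1..32 data bytes
            counts["PUSH"] += 1
            i += op - 0x5F           # skip the immediate data bytes
        else:
            counts[OPCODES.get(op, "OTHER")] += 1
        i += 1
    return counts


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_split(values, labels):
    """Return (threshold, gain) for the single cut with maximal information gain."""
    base = entropy(labels)
    best_thr, best_gain = None, 0.0
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2          # candidate cut between adjacent values
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain


def binarize(values, thr):
    """Map each frequency to 1 if it exceeds the threshold, else 0."""
    return [1 if x > thr else 0 for x in values]


# Toy data (made up): DELEGATECALL counts per contract, 1 = malicious label.
delegatecall_freq = [0, 0, 1, 0, 3, 4, 5, 2]
is_malicious = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, gain = best_entropy_split(delegatecall_freq, is_malicious)
binary_feature = binarize(delegatecall_freq, threshold)
print(opcode_counts("0x6001600101f400"))
print(threshold, gain, binary_feature)
```

Repeating the split search for every opcode would give one binary feature per opcode, which is the kind of input a tabular explainer such as LIME can report on directly (for example, "DELEGATECALL = 1 pushed the prediction toward malicious").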
