In data-driven applications on tabular data where interpretability is key, practitioners typically rely on machine learning models such as decision trees and linear regression. Although neural networks can deliver higher predictive performance, their black-box nature often rules them out. In this work, we present XNNTab, a neural architecture that combines the expressive power of neural networks with interpretability. XNNTab first learns highly non-linear feature representations, which are then decomposed into monosemantic features using a sparse autoencoder (SAE). These features are assigned human-interpretable concepts, making the model's predictions intrinsically interpretable. XNNTab outperforms interpretable predictive models and achieves performance comparable to that of its non-interpretable counterparts.
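The abstract gives no implementation details, so the following PyTorch-style sketch only illustrates the general idea of the SAE decomposition step: dense backbone activations are factored into a sparse, overcomplete code whose dimensions can later be inspected and labeled with concepts. All names and hyperparameters here (`SparseAutoencoder`, `dict_size`, `l1_coef`) are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Decomposes dense backbone activations h into a sparse,
    overcomplete code z. A dictionary larger than hidden_dim and an
    L1 penalty encourage individual code dimensions to respond to
    single, interpretable factors (assumed setup, not the paper's)."""

    def __init__(self, hidden_dim: int, dict_size: int):
        super().__init__()
        self.encoder = nn.Linear(hidden_dim, dict_size)
        self.decoder = nn.Linear(dict_size, hidden_dim)

    def forward(self, h: torch.Tensor):
        z = torch.relu(self.encoder(h))   # non-negative sparse code
        h_hat = self.decoder(z)           # reconstruction of activations
        return z, h_hat

def sae_loss(h, h_hat, z, l1_coef: float = 1e-3):
    # Reconstruction fidelity plus an L1 sparsity penalty on the code.
    return F.mse_loss(h_hat, h) + l1_coef * z.abs().mean()

# Illustrative usage: decompose backbone activations, then fit a
# linear head on z so each prediction is a sum of per-dimension
# (per-concept) contributions.
sae = SparseAutoencoder(hidden_dim=64, dict_size=512)
h = torch.randn(32, 64)                   # stand-in backbone activations
z, h_hat = sae(h)
loss = sae_loss(h, h_hat, z)
loss.backward()
```

Under this reading, intrinsic interpretability comes from the final linear map over z: once each active code dimension is assigned a human-interpretable concept, a prediction decomposes into a weighted sum of concept activations.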