Despite recent advances in fairness-aware machine learning, predictive models often exhibit discriminatory behavior toward marginalized groups. Such unfairness may arise from biased training data, model design, or representational disparities across groups, posing significant challenges in high-stakes decision-making domains such as college admissions. While existing fair learning models aim to mitigate bias, achieving an optimal trade-off between fairness and accuracy remains a challenge. Moreover, reliance on black-box models hinders interpretability, limiting their applicability in socially sensitive domains. To address these issues, we propose integrating Kolmogorov-Arnold Networks (KANs) into a fair adversarial learning framework. Leveraging the adversarial robustness and interpretability of KANs, our approach facilitates stable adversarial learning. We derive theoretical insights into the spline-based KAN architecture that ensure stability during adversarial optimization. Additionally, we propose an adaptive fairness penalty update mechanism to balance fairness and accuracy. We support these findings with empirical evidence on two real-world admissions datasets, demonstrating the proposed framework's effectiveness in achieving fairness across sensitive attributes while preserving predictive performance.
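To make the adaptive fairness penalty concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm: it measures a demographic-parity gap between two groups and nudges a penalty weight toward a target gap. All names, the target gap `target`, the step size `step`, and the clipping bound `lam_max` are assumptions chosen for illustration.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute gap in positive-prediction rates between two groups.

    y_pred: array of 0/1 predictions; group: boolean group membership.
    (Illustrative metric; the paper does not specify its fairness measure.)
    """
    g = np.asarray(group, dtype=bool)
    p = np.asarray(y_pred, dtype=float)
    return abs(p[g].mean() - p[~g].mean())

def update_fairness_penalty(lam, gap, target=0.05, step=0.1, lam_max=10.0):
    """One adaptive update of the fairness penalty weight (hypothetical rule).

    Raise the penalty when the observed gap exceeds the target, relax it
    otherwise, and clip to [0, lam_max] to keep the adversarial game stable.
    """
    lam = lam + step * (gap - target)
    return float(np.clip(lam, 0.0, lam_max))

# Usage: a large gap pushes the penalty up; clipping bounds the weight.
gap = demographic_parity_gap([1, 1, 0, 0], [True, True, False, False])
lam = update_fairness_penalty(1.0, gap)
```

In a full training loop, `lam` would scale a fairness loss term inside the predictor-versus-adversary objective, so the trade-off between accuracy and fairness is tuned automatically rather than fixed in advance.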