Traditional Support Vector Machine (SVM) classification finds the maximum-margin classifier for the training data, dividing the margin space into two equal sub-spaces. This study demonstrates the limitations of performing Support Vector Classification in non-Euclidean spaces by establishing that the principle of maximum-margin classification and the Karush-Kuhn-Tucker (KKT) conditions hold only in Euclidean vector spaces; in non-Euclidean spaces, the maximum margin is a function of the intra-class data covariance. The study establishes a methodology for Support Vector Classification in non-Euclidean spaces that incorporates data covariance into the optimization problem via the transformation matrix obtained from the Cholesky decomposition of the respective class covariance matrices, and shows that the resulting classifier separates the margin space in the ratio of the respective class population covariances. The study also proposes an algorithm to iteratively estimate the population-covariance-adjusted SVM classifier in non-Euclidean space from the sample covariance matrices of the training data. The effectiveness of this approach is demonstrated by applying the classifier to multiple datasets and comparing its performance with traditional SVM kernels and whitening algorithms. The Cholesky-SVM model shows marked improvement in accuracy, precision, F1 score, and ROC performance compared to linear and other kernel SVMs.
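To make the core idea concrete, the following is a minimal sketch of covariance-adjusted SVM classification via a Cholesky-based transformation. It is an illustration only: it whitens the data with the Cholesky factor of the *pooled* within-class sample covariance before fitting a linear SVM, whereas the paper's method uses per-class covariance matrices and an iterative estimate of the population covariance. The function name `cholesky_whiten`, the jitter term, and the toy Gaussian data are all assumptions for this example, not part of the original work.

```python
import numpy as np
from sklearn.svm import SVC

def cholesky_whiten(X, y):
    """Whiten features with the Cholesky factor of the pooled
    within-class sample covariance (a simplified stand-in for the
    paper's per-class, iteratively estimated transformation)."""
    classes = np.unique(y)
    # Pooled within-class sample covariance
    cov = sum(np.cov(X[y == c], rowvar=False) * (np.sum(y == c) - 1)
              for c in classes) / (len(y) - len(classes))
    # Small diagonal jitter keeps the factorization numerically stable
    L = np.linalg.cholesky(cov + 1e-6 * np.eye(X.shape[1]))
    # Map each sample x to L^{-1} x, so within-class covariance ~ identity
    return np.linalg.solve(L, X.T).T, L

# Toy example: two anisotropic Gaussian classes sharing a covariance
rng = np.random.default_rng(0)
X0 = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], 200)
X1 = rng.multivariate_normal([3, 3], [[4.0, 1.5], [1.5, 1.0]], 200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

Xw, L = cholesky_whiten(X, y)
clf = SVC(kernel="linear").fit(Xw, y)
print("training accuracy:", clf.score(Xw, y))
```

After the transformation, the margin computed by the standard linear SVM corresponds to a covariance-scaled margin in the original space, which is the intuition behind incorporating class covariance into the optimization problem.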