Data dimension reduction (DDR) maps data from a high-dimensional space to a low-dimensional one. Various DDR techniques are used for image dimension reduction, including Random Projections, Principal Component Analysis (PCA), the Variance approach, LSA-Transform, the Combined and Direct approaches, and the New Random Approach; auto-encoders (AE) are used to learn an end-to-end mapping. In this paper, we demonstrate that pre-processing not only speeds up the algorithms but also improves accuracy in both supervised and unsupervised learning. For pre-processing, PCA-based DDR is applied to supervised learning, and AE-based DDR is explored for unsupervised learning. In PCA-based DDR, we compare the accuracy and running time of supervised learning algorithms before and after applying PCA; similarly, in AE-based DDR, we compare the accuracy and running time of an unsupervised learning algorithm before and after AE representation learning. The supervised learning algorithms used for classification are support-vector machines (SVM), Decision Tree with the Gini index, Decision Tree with entropy, and the Stochastic Gradient Descent classifier (SGDC); the unsupervised learning algorithm is K-means clustering. We use two datasets, MNIST and FashionMNIST. Our experiments show a substantial improvement in accuracy and a reduction in time after pre-processing in both supervised and unsupervised learning.
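The PCA-based comparison described above can be sketched as follows. This is a minimal illustration, not the paper's actual experimental setup: it uses scikit-learn's small `load_digits` dataset as a stand-in for MNIST, an SVM classifier, and an assumed reduction to 20 principal components.

```python
import time
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Small stand-in for MNIST; the paper itself uses MNIST and FashionMNIST.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

def fit_score(Xtr, Xte):
    """Train an SVM and return (test accuracy, training time in seconds)."""
    clf = SVC()
    t0 = time.perf_counter()
    clf.fit(Xtr, y_train)
    elapsed = time.perf_counter() - t0
    acc = accuracy_score(y_test, clf.predict(Xte))
    return acc, elapsed

# Baseline: train on the raw 64-dimensional pixel vectors.
acc_raw, t_raw = fit_score(X_train, X_test)

# PCA-based DDR as pre-processing: 64 -> 20 dimensions (assumed value),
# fit on the training split only, then applied to both splits.
pca = PCA(n_components=20).fit(X_train)
acc_pca, t_pca = fit_score(pca.transform(X_train), pca.transform(X_test))

print(f"raw: accuracy={acc_raw:.3f}, train time={t_raw:.4f}s")
print(f"pca: accuracy={acc_pca:.3f}, train time={t_pca:.4f}s")
```

The same before/after pattern extends to the other classifiers and, with an auto-encoder's bottleneck representation in place of `pca.transform`, to the unsupervised K-means comparison.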