The rise of deep learning in recent years has come at a serious cost: a growing carbon footprint, driven by an insatiable demand for computational resources and power. Text analytics has been transformed along with other fields as deep learning came to dominate its methodology. This paper modifies the original TF-IDF algorithm and proposes Clement Term Frequency-Inverse Document Frequency (CTF-IDF) for data preprocessing. It primarily examines the effectiveness of classical machine learning techniques in text analytics when combined with CTF-IDF and a faster IRLBA algorithm for dimensionality reduction. Introducing both techniques into the conventional text analytics pipeline yields a faster, more efficient, and less computationally intensive application with a smaller carbon footprint than deep learning methods, at a minor cost in accuracy. The experimental results also show a manifold reduction in time complexity and an improvement in model accuracy for the classical machine learning methods discussed in this paper.
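For orientation, the sketch below computes the classic TF-IDF weighting that the paper takes as its starting point. It is not CTF-IDF, whose definition is not given in this abstract. The function name `tf_idf`, the whitespace tokenizer, the unsmoothed `log(N / df)` form of IDF, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF (illustrative baseline, NOT the paper's CTF-IDF).

    tf  = term count / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, used only to exercise the function.
docs = [
    "deep learning needs compute",
    "classical learning needs less compute",
    "text analytics with classical methods",
]
weights = tf_idf(docs)
```

In this weighting, a term that occurs in every document receives a weight of zero, while a term confined to one document is weighted most heavily. In a pipeline like the one the abstract describes, the resulting sparse document-term matrix would then go through a truncated SVD step such as IRLBA before a classical classifier is trained.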


