Intrusion detection is a critical component of cybersecurity, responsible for identifying unauthorized access or anomalous behavior in computer networks. This paper presents a comprehensive study of network intrusion detection using classical machine learning algorithms applied to the multiclass version of the NSL-KDD dataset (Normal, DoS, Probe, R2L, and U2R classes). The characteristics of NSL-KDD are described in detail, including its variants and class distribution, and the data preprocessing pipeline (cleaning, encoding, and normalization) is documented. Four supervised classification models are implemented: Logistic Regression, Decision Tree, Random Forest, and XGBoost. Their performance is evaluated using standard metrics (accuracy, recall, F1 score, confusion matrix, and area under the ROC curve). Experiments show that the tree-ensemble models (Random Forest and XGBoost) achieve the best performance, with accuracies approaching 99%, clearly outperforming Logistic Regression and the single Decision Tree. The ability of each model to detect each attack category is also analyzed, highlighting the difficulty of identifying rare attacks (R2L and U2R). Finally, the implications of the results are discussed and compared with the state of the art, and avenues for future research are proposed, such as applying class-balancing techniques and deep learning models to improve intrusion detection.
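The preprocessing steps and per-class evaluation summarized above can be sketched in plain Python. This is a minimal illustrative sketch, not the paper's implementation. The label-to-category mapping is a partial subset of NSL-KDD attack names. The helper names (`one_hot`, `min_max`, `confusion_matrix`, `per_class_report`) are hypothetical, and the toy labels and predictions are invented for illustration. The four classifiers themselves are omitted. The sketch shows how overall accuracy can stay high while a rare class such as U2R receives zero recall.

```python
# Partial mapping of NSL-KDD attack labels to the five classes
# (illustrative subset only; the dataset contains more attack names).
ATTACK_CATEGORY = {
    "normal": "Normal",
    "neptune": "DoS", "smurf": "DoS", "back": "DoS", "teardrop": "DoS",
    "satan": "Probe", "ipsweep": "Probe", "portsweep": "Probe", "nmap": "Probe",
    "guess_passwd": "R2L", "warezclient": "R2L", "ftp_write": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R", "loadmodule": "U2R",
}
CLASSES = ["Normal", "DoS", "Probe", "R2L", "U2R"]


def one_hot(values):
    """Encode a categorical column (e.g. protocol_type) as 0/1 indicator vectors.

    Returns (vectors, categories), with categories in sorted order.
    """
    categories = sorted(set(values))
    vectors = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return vectors, categories


def min_max(column):
    """Scale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def per_class_report(y_true, y_pred, classes):
    """Return overall accuracy and a dict of per-class (precision, recall, F1).

    Precision is computed only because F1 needs it.
    """
    m = confusion_matrix(y_true, y_pred, classes)
    report = {}
    for i, c in enumerate(classes):
        tp = m[i][i]
        predicted = sum(row[i] for row in m)  # column sum: predicted as c
        actual = sum(m[i])                    # row sum: truly c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    accuracy = sum(m[i][i] for i in range(len(classes))) / len(y_true)
    return accuracy, report


# Toy example: map raw labels to categories, then score hypothetical predictions
# in which every record is correct except the single U2R record (predicted Normal).
raw_labels = ["normal"] * 4 + ["neptune"] * 3 + ["satan", "guess_passwd", "rootkit"]
y_true = [ATTACK_CATEGORY[label] for label in raw_labels]
y_pred = y_true[:-1] + ["Normal"]

acc, report = per_class_report(y_true, y_pred, CLASSES)
print(f"accuracy = {acc:.2f}")
for c in CLASSES:
    print(c, "P=%.2f R=%.2f F1=%.2f" % report[c])
```

In this toy case the overall accuracy is 0.90, yet U2R recall is 0. This mirrors the rare-class problem raised above. In practice, scikit-learn's `confusion_matrix` and `classification_report` compute the same quantities.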