Machine learning (ML) is increasingly adopted in scientific research, yet the quality and reliability of results often depend on how experiments are designed and documented. Poor baselines, inconsistent preprocessing, or insufficient validation can lead to misleading conclusions about model performance. This paper presents a practical, structured guide for conducting ML experiments in scientific applications, focusing on reproducibility, fair comparison, and transparent reporting. We outline a step-by-step workflow, from dataset preparation to model selection and evaluation, and propose metrics that account for overfitting and instability across validation folds, including the Logarithmic Overfitting Ratio (LOR) and the Composite Overfitting Score (COS). Through recommended practices and example reporting formats, this work aims to support researchers in establishing robust baselines and drawing valid, evidence-based insights from ML models applied to scientific problems.
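To make the idea of fold-aware overfitting metrics concrete, the sketch below computes an illustrative log-based overfitting ratio per cross-validation fold and a composite score that adds a fold-instability term. The specific formulas, the `log_overfitting_ratio` and `composite_overfitting_score` helpers, and the choice of dataset and model are assumptions for illustration only, not the paper's exact definitions of LOR and COS.

```python
# Minimal sketch (assumed formulas, not the paper's definitions of LOR/COS).
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold
from sklearn.metrics import r2_score

def log_overfitting_ratio(train_score, val_score, eps=1e-9):
    # Hypothetical LOR: log of the train-to-validation score ratio.
    # 0 means no train/validation gap; larger values mean stronger overfitting.
    # Assumes both scores are positive (e.g., R^2 well above zero).
    return float(np.log((train_score + eps) / (val_score + eps)))

def composite_overfitting_score(lors, val_scores):
    # Hypothetical COS: mean overfitting ratio plus the standard deviation
    # of validation scores across folds as an instability penalty.
    return float(np.mean(lors) + np.std(val_scores))

X, y = load_diabetes(return_X_y=True)
lors, val_scores = [], []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestRegressor(random_state=0).fit(X[train_idx], y[train_idx])
    train_score = r2_score(y[train_idx], model.predict(X[train_idx]))
    val_score = r2_score(y[val_idx], model.predict(X[val_idx]))
    lors.append(log_overfitting_ratio(train_score, val_score))
    val_scores.append(val_score)

print("per-fold LOR:", np.round(lors, 3))
print("COS:", round(composite_overfitting_score(lors, val_scores), 3))
```

Reporting such per-fold values alongside the headline metric is one way to expose both the train/validation gap and its variability, in the spirit of the workflow described above.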