Large language model (LLM)-powered agents have emerged as effective planners for Automated Machine Learning (AutoML) systems. Most existing AutoML approaches focus on automating feature engineering and model architecture search, yet recent studies in time series forecasting suggest that lightweight models can often achieve state-of-the-art performance. This observation led us to explore improving data quality, rather than model architecture, as a potentially fruitful direction for AutoML on time series data. We propose DCATS, a Data-Centric Agent for Time Series. DCATS uses the metadata that accompanies time series to clean the data while optimizing forecasting performance. We evaluated DCATS with four time series forecasting models on a large-scale traffic volume forecasting dataset. The results show that DCATS achieves an average error reduction of 6% across all tested models and time horizons, highlighting the potential of data-centric approaches in AutoML for time series forecasting.