Large language models (LLMs) are becoming increasingly integrated into data science workflows for automated system design. However, these LLM-driven data science systems rely solely on the internal reasoning of LLMs and lack guidance from scientific and theoretical principles. This limits their trustworthiness and robustness, especially when dealing with noisy and complex real-world datasets. This paper presents VDSAgents, a multi-agent system grounded in the Predictability-Computability-Stability (PCS) principles proposed in the Veridical Data Science (VDS) framework. Guided by PCS principles, the system implements a modular workflow for data cleaning, feature engineering, modeling, and evaluation. Each phase is handled by a dedicated agent that incorporates perturbation analysis, unit testing, and model validation to ensure both functionality and scientific auditability. We evaluate VDSAgents on nine datasets with diverse characteristics, comparing it with state-of-the-art end-to-end data science systems, such as AutoKaggle and DataInterpreter, using DeepSeek-V3 and GPT-4o as backends. VDSAgents consistently outperforms AutoKaggle and DataInterpreter, which validates the feasibility of embedding PCS principles into LLM-driven data science automation.
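To make the perturbation analysis mentioned above concrete, the following is a minimal sketch of one possible stability check in the spirit of PCS: a model is refit on bootstrap resamples of the data, and the spread of the fitted parameter is compared with a tolerance. It is not the VDSAgents implementation. The function names (`fit_slope`, `bootstrap_slopes`, `is_stable`), the choice of bootstrap resampling as the perturbation, and the `tolerance` threshold are all assumptions made for illustration.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Ordinary least-squares slope for a single feature (illustrative model)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_rounds=200, seed=0):
    """Refit on bootstrap resamples, one simple kind of data perturbation."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_rounds):
        # Sample n row indices with replacement and refit on that resample.
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(fit_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes


def is_stable(slopes, tolerance=0.1):
    """Hypothetical rule: the estimate's standard deviation stays within tolerance."""
    return statistics.stdev(slopes) <= tolerance


if __name__ == "__main__":
    # Synthetic data: y = 2x + Gaussian noise (assumed only for this demo).
    rng = random.Random(42)
    xs = [float(i) for i in range(50)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    slopes = bootstrap_slopes(xs, ys)
    print("mean slope:", statistics.fmean(slopes))
    print("stable:", is_stable(slopes))
```

A conclusion that survives such perturbations is stronger evidence than a single fit. In practice the perturbations would also cover cleaning and feature-engineering choices, not only resampling.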