Workflow automation promises substantial productivity gains in everyday document-related tasks. While prior agentic systems can execute isolated instructions, they struggle to automate multi-step, session-level workflows because they have limited control over the operational process. To address this, we introduce AutoDW, a novel execution framework that enables stepwise, rollback-enabled operation orchestration. AutoDW incrementally plans API actions conditioned on user instructions, intent-filtered API candidates, and the evolving state of the document. It further employs robust rollback mechanisms at both the argument and API levels, enabling dynamic correction and fault tolerance. Together, these designs keep AutoDW's execution trajectory aligned with user intent and document context across long-horizon workflows. To assess its effectiveness, we construct a comprehensive benchmark of 250 sessions and 1,708 human-annotated instructions, reflecting realistic document processing scenarios with interdependent instructions. AutoDW achieves 90% and 62% completion rates on instruction- and session-level tasks, respectively, outperforming strong baselines by 40% and 76%. AutoDW also remains robust to the choice of backbone LLM and across tasks of varying difficulty. Code and data will be open-sourced at https://github.com/YJett/AutoDW
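To make the idea of stepwise, rollback-enabled orchestration concrete, the following is a minimal Python sketch of one execution step with rollback at the argument and API levels. It is an illustration under stated assumptions, not the AutoDW implementation: the names `execute_step`, `toy_planner`, `set_heading`, `insert_table`, `ArgumentError`, and `APIError` are hypothetical, the document is a plain dictionary, and a deterministic function stands in for the LLM planner.

```python
import copy


class ArgumentError(Exception):
    """Hypothetical signal: the API fits, but the arguments were invalid."""


class APIError(Exception):
    """Hypothetical signal: this API cannot serve the instruction at all."""


def set_heading(doc, index, level):
    # Toy API that mutates state before validating, so a failure leaves
    # the document dirty unless the caller rolls back.
    doc["edits"] += 1
    if not 0 <= index < len(doc["paragraphs"]):
        raise ArgumentError(f"no paragraph {index}")
    doc["paragraphs"][index] = "#" * level + " " + doc["paragraphs"][index]


def insert_table(doc, index, level):
    # Toy API that is simply the wrong choice for a heading instruction.
    doc["edits"] += 1
    raise APIError("insert_table does not apply to headings")


def restore(doc, snapshot):
    # Restore the document in place from a deep-copied snapshot.
    doc.clear()
    doc.update(snapshot)


def execute_step(doc, candidates, propose_args, max_arg_retries=2):
    """Try candidate APIs in order, rolling back the document on failure."""
    for name, api in candidates:
        feedback = None
        for _ in range(max_arg_retries + 1):
            snapshot = copy.deepcopy(doc)
            args = propose_args(name, doc, feedback)
            try:
                api(doc, **args)
                return name, args
            except ArgumentError as err:
                # Argument-level rollback: keep the API, re-plan arguments.
                restore(doc, snapshot)
                feedback = str(err)
            except APIError:
                # API-level rollback: abandon this API, try the next one.
                restore(doc, snapshot)
                break
    return None


def toy_planner(name, doc, feedback):
    # Stand-in for an LLM planner: the first guess uses a 1-based index,
    # and error feedback switches it to a 0-based index.
    index = 1 if feedback is None else 0
    return {"index": index, "level": 2}


doc = {"paragraphs": ["Introduction"], "edits": 0}
candidates = [("insert_table", insert_table), ("set_heading", set_heading)]
print(execute_step(doc, candidates, toy_planner), doc)
```

In this sketch the snapshot is taken before every attempt, so a failed call never leaks partial edits into the next attempt. The error message is passed back to the planner, which is one simple way to condition re-planning on the observed failure.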