We present AIOS 1.0, a novel platform designed to advance computer-use agent (CUA) capabilities through environmental contextualization. While existing approaches primarily focus on building more powerful agent frameworks or enhancing agent models, we identify a fundamental limitation: the semantic disconnect between how language models understand the world and how computer interfaces are structured. AIOS 1.0 addresses this challenge by transforming computers into contextual environments that language models can natively comprehend, implementing a Model Context Protocol (MCP) server architecture to abstract computer states and actions. This approach effectively decouples interface complexity from decision complexity, enabling agents to reason more effectively about computing environments. To demonstrate our platform's effectiveness, we introduce LiteCUA, a lightweight computer-use agent built on AIOS 1.0 that achieves a 14.66% success rate on the OSWorld benchmark, outperforming several specialized agent frameworks despite its simple architecture. Our results suggest that contextualizing computer environments for language models represents a promising direction for developing more capable computer-use agents and advancing toward AI that can interact with digital systems.
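To make the idea of abstracting computer states and actions behind a server more concrete, the following is a minimal sketch of the general pattern, not the AIOS 1.0 implementation. It does not use any real MCP SDK; the class `ToyContextServer`, the `UIElement` type, and the tool names `click` and `type_text` are all hypothetical and chosen only for illustration. The point is that the agent sees a textual state description and a small set of named tools, while interface details stay inside the server.

```python
from dataclasses import dataclass


@dataclass
class UIElement:
    """One interactive element, described semantically rather than by pixels."""
    element_id: str
    role: str   # e.g. "button" or "text_field"
    label: str  # human-readable name shown to the model


class ToyContextServer:
    """Hypothetical stand-in for an MCP-style server (illustrative assumption).

    It exposes the environment state as plain text and actions as named tools,
    so an agent never has to deal with coordinates or widget trees directly.
    """

    def __init__(self, elements):
        self._elements = {e.element_id: e for e in elements}
        self._tools = {"click": self._click, "type_text": self._type_text}
        self.log = []  # record of actions that were performed

    def describe_state(self):
        """Return a text description of the state that a language model can read."""
        return "\n".join(
            f"[{e.element_id}] {e.role}: {e.label}"
            for e in self._elements.values()
        )

    def list_tools(self):
        """Return the names of the available actions."""
        return sorted(self._tools)

    def call_tool(self, name, **kwargs):
        """Dispatch a named action; the interface details stay inside the server."""
        if name not in self._tools:
            raise ValueError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)

    def _click(self, element_id):
        element = self._elements[element_id]
        self.log.append(("click", element_id))
        return f"clicked {element.label}"

    def _type_text(self, element_id, text):
        element = self._elements[element_id]
        self.log.append(("type_text", element_id, text))
        return f"typed {text!r} into {element.label}"


if __name__ == "__main__":
    server = ToyContextServer([
        UIElement("e1", "text_field", "Search"),
        UIElement("e2", "button", "Go"),
    ])
    print(server.describe_state())
    print(server.list_tools())
    print(server.call_tool("type_text", element_id="e1", text="cats"))
    print(server.call_tool("click", element_id="e2"))
```

In this sketch the agent's decision problem reduces to picking a tool name and arguments from a short text description, which is one way to read the claim that interface complexity is decoupled from decision complexity.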